lp - Literate Programming*
==========================
 
Chr. Clemens Lahme (clemens.lahme@techinvest.li)
 
2023-12-20
 
(* because who really needs an effing line printer.)
 
Literate programming in 2023: not-invented-here syndrome at its fullest.
 
So we (which is just me and myself) want to start from scratch. Why? Because why not. Why follow someone else's ideas
and use cases?
 
It shouldn't be so hard either, with today's modern languages and the basic idea (embed code in text and then just
write a book or paper) floating around since the 80s. So what are the main ideas and tools to use here?
 
Scripting languages that easily transform text files. So regular expressions and in particular Ruby. So skipping the
whole flex lex bison grammar garbage.
 
On the text front, take notice of md - markdown. Also formatting in HTML _and_ PDF, using TeX or asciidoc if they bring
anything to the table, especially on the font handling side, and leaving them out if they just pollute and complicate
the whole process, like the whole XML world and its unnecessary garbage escape sequences.
 
The main point maybe is just: do the opposite of embedding everything inside XML structure.
 
Instead, everything should be already readable just in its original format, and then with tools the text can be just
beautified as a bonus and on top of it. After all, you don't need any markdown tool at all to read a markdown file!
 
Last but not least, after doing the first iteration, it became clear that the
Rails philosophy is also an inspiration. Everything, especially the text
formatting, should just be the way it is to be expected, and then actually
nothing is needed on top of it. That means headlines in the text are simply
written the way they should look, without any reformatting in mind; so far not
a single marker or escape code is needed. Scripts have always used EOF or EOT
markers for file redirection, and these can be used here just as well and
appropriately. All the 'magic' is then delegated to processing in separate
scripts, but without any surprises or actual real magic. At least, that's how
it should appear and work in practice.
 
Table of Contents
-----------------
  1. First Two Rules
  2. Down to the Metal
  3. First Example
  4. First Release
  5. Statistic
  6. Extract Named Files
  7. Evolving Scripts
  8. Testing
  9. Adding A Table of Contents
 
1. First Two Rules
------------------
 
So here is already the second rule. Use #! to start the program code, and any line without whitespace at its beginning
that matches /^exit[\(\s]/ indicates the end of the program code. Everything before and afterwards is
just text!
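
The second rule can be sketched in a few lines of Ruby. This is only an illustration, reading the end pattern as
/^exit[\(\s]/ with the bracket closed. The helper name split_text_and_code is hypothetical and not part of lp.

```ruby
# Sketch only: split_text_and_code is a hypothetical helper, not part of lp.
# It returns two arrays: the text lines and the program lines.
def split_text_and_code( lines )
  text = []
  code = []
  in_code = false
  lines.each do |line|
    # A line starting with the shebang opens the program code.
    in_code = true if !in_code && line.start_with?( "#!" )
    if in_code
      code << line
      # An unindented exit followed by '(' or whitespace closes the program code.
      in_code = false if line =~ /^exit[\(\s]/
    else
      text << line
    end
  end
  [ text, code ]
end

sample = [
  "Some introduction.",
  "#! /usr/bin/env ruby",
  "puts 'hello'",
  "exit( 0 )",
  "Closing remarks."
]
text, code = split_text_and_code( sample )
puts "text lines: #{text.length}, code lines: #{code.length}"
```

Note that a bare 'exit' without an argument would not end the code here, because the pattern asks for a following
'(' or whitespace character.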
 
Now back to the first rule. Text should not be polluted with escape or markup language. Even less than markdown. So we
do not have to invent our own markdown language or specification; we just have to define a consistent and fixed rule for
everything that can be reformatted in another format.
 
So in this case the main headline is underlined by equal signs of the same length as the headline itself. On the
second level, headlines are just underlined by the minus sign '-'. Should we allow sub chapters further down? Maybe,
but let's keep it open for now.
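
As a minimal sketch of this headline rule, assuming the underline must have exactly the same length as the headline:
the helper headline_level below is hypothetical and only classifies one pair of lines.

```ruby
# Sketch only: headline_level is a hypothetical helper, not part of lp.
# Returns 1 for a main headline, 2 for a sub headline, nil otherwise.
def headline_level( line, underline )
  return nil if line.strip.empty?
  return nil unless underline.length == line.length
  return 1 if underline =~ /\A=+\z/
  return 2 if underline =~ /\A-+\z/
  nil
end

title = "First Two Rules"
puts headline_level( title, "=" * title.length ).inspect
puts headline_level( title, "-" * title.length ).inspect
puts headline_level( title, "---" ).inspect
```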
 
2. Down to the Metal
--------------------
 
So now down to the metal. What should our program look like, how should everything start? We name the program 'lp', for
'literate programming'. What is the first feature we want to implement? Reformat everything in HTML? Reformat just the
text itself? Like reformat the lines, or even left right alignment?
 
Or just split out the program from the commentary?
 
So how would we use the program?
 
cd myproject
lp
 
Then what should happen? Should the program already be executed? No, that should be happening with:
 
lp run
 
And just 'lp'? What should that do? Maybe we need a second project, to use this project with? Maybe back to the
'afaf' project (don't ask, I am from the Cologne area)?
 
Yes, maybe we could still leave the original code as it is. Like, you actually can have both, pure text files, and also
pure code files, plus a mix of them. But then what is the purpose and the extra benefit of 'lp' itself? Maybe just the
help for me, to use a journal mode for myself, while programming?!
 
So then how to intermix commentary inside of existing files? Well, in Ruby you just use =begin and =end inside of it,
don't you?
 
Yes, and then later we extract the commentary to make it available in HTML, or strip it out of the code and insert it
into the PDF documentation.
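
Extracting such commentary could look roughly like the following. This is only a sketch under the assumption that
=begin and =end always start at the beginning of a line, as Ruby itself requires. The helper split_commentary is
hypothetical.

```ruby
# Sketch only: split_commentary is a hypothetical helper, not part of lp.
# It separates the lines between =begin and =end from the remaining code.
def split_commentary( lines )
  commentary = []
  code = []
  in_comment = false
  lines.each do |line|
    if line =~ /^=begin(\s|$)/
      in_comment = true
    elsif in_comment && line =~ /^=end(\s|$)/
      in_comment = false
    elsif in_comment
      commentary << line
    else
      code << line
    end
  end
  [ commentary, code ]
end

sample = [
  "puts 'before'",
  "=begin",
  "This paragraph is commentary for the journal.",
  "=end",
  "puts 'after'"
]
commentary, code = split_commentary( sample )
puts commentary
puts code
```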
 
So which fonts do we use? Monospace I guess. Why not. So we use TeX to create PDF files?
 
3. First Example
----------------
 
So we started documenting the afaf project, or rather brainstorming about it, and we have our first use case.
 
lp
 
This default command should just look in doc/ for the index.txt file, and create a corresponding index.html file, with
URLs also being converted.
 
#! /usr/bin/env ruby2
 
def process_text_file( input_filename )
  file = File.open( input_filename, "r" )
  content = file.read()
  file.close
 
  # Extract named scripts.
  lines = content.split( /\n/ )
  script_filename = nil
  script = ""
  mode = nil
  lines.each do |line|
    if line =~ /<<\s*EO(F|T)\s*$/
      mode = "w"
      if line =~ />>/
        mode = "a"
      end
      script_filename = line.sub( /^\s*cat\s+>>?\s+/, '' )
      script_filename.sub!( /\s.*$/, '' )
      script = ""
    elsif script_filename && (line =~ /^\s*EO(F|T)\s*$/)
      file = File.open( script_filename, mode )
      file.write( script )
      file.close
      system( "chmod 700 #{script_filename}" )
      puts script_filename
      script_filename = nil
      next
    elsif script_filename
      script << line
      script << "\n"
    end
  end
 
  output =  ""
  output << "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">"
  output << "<html>\n"
  output << "<head>\n"
  output << "<title>lp: #{File.realdirpath( '.' ).sub( /^.*\//, '' )}</title>\n"
  output << "</head>\n"
  output << "<body>\n"
  output << "<tt>\n"
  content.gsub!( /&/m, "&amp;" )
  content.gsub!( /</m, "&lt;" )
  content.gsub!( />/m, "&gt;" )
  content.gsub!( /\n/m, "<br />\n" )
  content.gsub!( /<br \/>\n<br \/>\n/m, "<br />\n&nbsp;<br />\n" )
 
  # Handle links, but treat local files just relative to the doc directory if necessary,
  # without the preceding 'file://'.
  content.gsub!( /(^|\s)(file):\/\/([^\s<]+)(\s|<)/m, "<a href=\"\\3\">\\2://\\3</a>\\4" )
  content.gsub!( /(^|\s)(https?):\/\/([^\s<]+)(\s|<)/m, "<a href=\"\\2://\\3\">\\2://\\3</a>\\4" )
 
  # Repeated spaces have to be respected absolutely.
  content.gsub!( /  /m, '&nbsp;&nbsp;' )
 
  output << content
  output << "</tt>\n"
  output << "</body>\n"
  output << "</html>\n"
 
  output_filename = input_filename.sub( /\.txt$/, '.html' )
  file = File.open( output_filename, "w" )
  file.write( output )
  file.close
  system( "chmod 600 '#{output_filename}'" )
  puts output_filename
  system "$HOME/machine/src/rb/lp/bin/add_toc.rb"
end
 
files = [ "./doc/index.txt" ]
if ARGV.length > 0
  files = ARGV
end
files.each do |input_filename|
  process_text_file( input_filename )
end
 
exit( 0 )
 
But this code inside this text document must also be extracted. So for this we have some bootstrap code
outside of this text document. Of course, we must later also incorporate this separate program into this document:
 
./bin/bootstrap.rb
 
The text parser should recognize that the line above has a corresponding local file, which is a program.
So it should include a link, instead of the whole file, maybe. Or extract the main comment to describe this
program.
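
One possible way to recognize such a line, as a sketch only: the helper link_local_file is hypothetical, and it
assumes that the path is relative to the project directory while the HTML file lives in doc/, so the link gets
a '../' prefix. The file check is passed in as a lambda, so the idea can be tried without real files.

```ruby
# Sketch only: link_local_file is a hypothetical helper, not part of lp.
# The exists argument is injected so the check can be tried without real files.
def link_local_file( line, exists = ->( path ) { File.file?( path ) } )
  path = line.strip
  # Only lines consisting of a single relative path like ./bin/foo.rb qualify.
  return line unless path =~ /\A\.\/[^\s<>"&]+\z/
  return line unless exists.call( path )
  # Assumption: the HTML page is one directory below the project directory.
  href = "../" + path.sub( /\A\.\//, '' )
  "<a href=\"#{href}\">#{path}</a>"
end

pretend_all_exist = ->( path ) { true }
puts link_local_file( "./bin/bootstrap.rb", pretend_all_exist )
puts link_local_file( "Just a sentence about ./bin/bootstrap.rb", pretend_all_exist )
```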
 
4. First Release
----------------
 
The first release (commit) had the following statistic.
 
Lines of code:    36    28.57%
Lines of text:    90    71.43%
Total            126   100.00%
 
5. Statistic
------------
 
For the statistic we use the following code:
 
cat > ./bin/statistic.sh <<EOF
#! /bin/bash
 
TEXTLINES=$(cat doc/index.txt | grep -E -v '^\s*$' | wc -l)
CODELINES=$(cat bin/lp.rb bin/bootstrap.rb | grep -E -v '^\s*$' | grep -E -v '^#' | wc -l)
EOFLINES=0
CODELINES=$(expr $CODELINES '+' $EOFLINES)
TEXTLINES=$(expr $TEXTLINES '-' $CODELINES)
TOTALLINES=$(expr $TEXTLINES '+' $CODELINES)
CODEPERCENTAGE=$(ruby -e "puts ${CODELINES}.0 / ${TOTALLINES}.0 * 100.0")
TEXTPERCENTAGE=$(ruby -e "puts ${TEXTLINES}.0 / ${TOTALLINES}.0 * 100.0")
printf "Lines of code: %8d   %6.2f\n" ${CODELINES} ${CODEPERCENTAGE}
printf "Lines of text: %8d   %6.2f\n" ${TEXTLINES} ${TEXTPERCENTAGE}
printf "Total:         %8d   100.00\n" ${TOTALLINES}
EOF
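
The percentage arithmetic of the script can be checked in plain Ruby against the numbers of the first release. This is
just a sketch, and the helper code_text_share is hypothetical.

```ruby
# Sketch only: code_text_share is a hypothetical helper, not part of lp.
# Returns the percentage of code lines and of text lines.
def code_text_share( code_lines, text_lines )
  total = code_lines + text_lines
  return [ 0.0, 0.0 ] if total == 0
  [ 100.0 * code_lines / total, 100.0 * text_lines / total ]
end

# Numbers taken from the first release above.
code_pct, text_pct = code_text_share( 36, 90 )
printf( "Lines of code: %8d   %6.2f%%\n", 36, code_pct )
printf( "Lines of text: %8d   %6.2f%%\n", 90, text_pct )
printf( "Total:         %8d   100.00%%\n", 36 + 90 )
```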
 
6. Extract Named Files
----------------------
 
As with the statistic.sh script example, we now need to create named script or
program files, in addition to the default project script, which is named as a
Ruby file in the bin directory, with the same name as the project directory,
e.g. ./bin/afaf.rb.
 
OK, we added that to the main script above, which now searches the lines for EOT or EOF.
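
To see what the main script reads out of such a redirection line, here is a small sketch that reuses the same
regular expressions in isolation. The helper parse_redirect is hypothetical and exists only for this illustration.

```ruby
# Sketch only: parse_redirect is a hypothetical helper mirroring the main script.
# Returns [ filename, mode ] for a redirection line, nil for any other line.
def parse_redirect( line )
  return nil unless line =~ /<<\s*EO(F|T)\s*$/
  # A double '>' means append, a single '>' means overwrite.
  mode = ( line =~ />>/ ) ? "a" : "w"
  filename = line.sub( /^\s*cat\s+>>?\s+/, '' ).sub( /\s.*$/, '' )
  [ filename, mode ]
end

p parse_redirect( "cat > ./bin/statistic.sh <<EOF" )
p parse_redirect( "cat >> ./bin/statistic.sh <<EOT" )
p parse_redirect( "Just a line of text." )
```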
 
7. Evolving Scripts
-------------------
 
Now, in order to preserve the development of a program and make it easier to
understand and follow its logic, how can we in later steps adapt such scripts?
 
One idea is to add placeholders and later insert new code at these insertion
points. Or we just add simple lines like 'source another_script.sh' into the
code and separate the logic that way, accompanied by another section of text.
For Ruby, use 'load "my/file.rb"' to add extra code later on, etc.
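
The placeholder idea could be sketched like this. Both the marker syntax
'# insert: name' and the helper insert_at_placeholder are hypothetical, nothing
of this exists in lp so far.

```ruby
# Sketch only: the marker syntax and insert_at_placeholder are hypothetical.
# Inserts new_code in front of the marker line, using the marker's indentation.
def insert_at_placeholder( script, name, new_code )
  marker = "# insert: #{name}"
  script.lines.flat_map do |line|
    if line.strip == marker
      indent = line[ /\A\s*/ ]
      # Keep the marker so later sections can insert at the same point again.
      new_code.lines.map { |l| indent + l.chomp + "\n" } + [ line ]
    else
      [ line ]
    end
  end.join
end

script = "def run\n  # insert: steps\nend\n"
puts insert_at_placeholder( script, "steps", "puts 'step one'" )
```

Keeping the marker in place means a later section of text can add a second
step at the same insertion point.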
 
8. Testing
----------
 
First, is the resulting HTML code compliant? Use 'tidy' for that. And use
the 'check' rule in the Makefile. Yes, we didn't mention the Makefile, but
we use one to invoke 'bootstrap.rb' and create the 'lp.rb' file as well as the
HTML page 'index.html'.
 
So now we add the test code into the test script, which will be invoked from the
'check' rule in the Makefile (not included in this document).
 
cat > ./bin/test_lp.sh <<EOF
#! /bin/bash
 
set -e
 
type tidy || { echo "ERROR: tidy command is not available."; exit 2; }
tidy ./doc/index.html 2>&1 | grep "No warnings or errors were found."
 
echo "SUCCESS: $0 - $?."
EOF
 
9. Adding A Table of Contents
-----------------------------
 
Before the first sub headline, and only for HTML, we want to automatically add
a table of contents.
 
For this we read the text file, and parse for sub headers. These are defined as
 
1. An empty line.
2. A line of text.
3. A line of '-' signs of the same length as the line above.
4. A final empty line.
 
And we number the chapters consecutively. We work directly on the HTML output,
as the text file will not be changed anyway.

For the anchors of these headings, we can just number them consecutively too.
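
The four-line rule and the numbered anchors can be sketched on plain text
lines like this. Note that this is only an illustration: the real script below
works on the HTML output instead, and the helpers find_sub_headlines and
toc_html are hypothetical.

```ruby
# Sketch only: find_sub_headlines and toc_html are hypothetical helpers.
# A sub headline is: empty line, text, dashes of the same length, empty line.
def find_sub_headlines( lines )
  found = []
  lines.each_cons( 4 ) do |before, title, underline, after|
    next unless before.strip.empty? && after.strip.empty?
    next if title.strip.empty?
    next unless underline =~ /\A-+\z/ && underline.length == title.length
    found << title
  end
  found
end

# Builds the list of links, with anchors simply numbered headline1, headline2, ...
def toc_html( headlines )
  items = headlines.each_with_index.map do |title, index|
    "<li><a href=\"#headline#{index + 1}\">#{title}</a></li>"
  end
  ( [ "<ol>" ] + items + [ "</ol>" ] ).join( "\n" )
end

first = "First Two Rules"
second = "Down to the Metal"
sample = [
  "Some text.", "", first, "-" * first.length, "",
  "More text.", "", second, "-" * second.length, ""
]
puts toc_html( find_sub_headlines( sample ) )
```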
 
cat > ./bin/add_toc.rb <<EOT
#! /usr/bin/env ruby2
 
file = File.open( "./doc/index.html", "r" )
content = file.read()
file.close
 
lines = content.split( /\n/ )
status = nil
headline = nil
toc = []
indexes = []
lines.each_with_index do |line, index|
  if status == nil
    if line =~ /^&nbsp;<br \/>$/
      status = "empty line"
      #puts "#{index}: #{status}"
      next
    end
  elsif status == "empty line"
    if line =~ /^&nbsp;<br \/>$/
      next
    elsif line =~ /^\s*-+\s*<br \/>$/
      status = nil
      next
    else
      headline = line.sub( /^\s+/, '' )
      headline.sub!( /\s*<br \/>$/, '' )
      #puts "#{index}: #{headline}"
      status = headline.length
      next
    end
  elsif status == "dashed line"
    if line =~ /^&nbsp;<br \/>$/
      # We should compare the length of the dashed line. But hey, we call it
      # Bingo anyway.
      toc << [ headline, index ]
      status = "empty line"
      #puts "Bingo: #{toc.inspect}"
      next
    else
      # No empty line after the dashes, so this was not a headline after all.
      status = nil
      next
    end
  else
    if line =~ /^\s*-+\s*<br \/>$/
      status = "dashed line"
      #puts status
      next
    elsif line =~ /^&nbsp;<br \/>$/
      status = "empty line"
      #puts "#{index}: #{status}"
      next
    else
      status = nil
      next
    end
  end
end
 
if toc.length > 0
  toc_content = ""
  toc_content << "&nbsp;<br />\n"
  toc_content << "Table of Contents<br />\n"
  toc_content << "-----------------<br />\n"
  toc_content << "<ol>\n"
  toc.each_with_index do |row, index|
    toc_content << "<li><a href=\"#headline#{index + 1}\">#{row[ 0 ]}</a></li>\n"
  end
  toc_content << "</ol>\n"
 
  # Insert ToC.
  output = ""
  headline_index = 0
  lines.each_with_index do |line, index|
    if index == toc[ 0 ][ 1 ] - 3
      output << toc_content
    end
    if (headline_index < toc.length) && (index == toc[ headline_index ][ 1 ] - 2)
      output << "<a name=\"headline#{headline_index + 1}\" />#{headline_index + 1}. "
    end
    if (headline_index < toc.length) && (index == toc[ headline_index ][ 1 ] - 1)
      output << '-' * ((headline_index + 1).to_s.length + 2)
      headline_index += 1
    end
    output << line
    output << "\n"
  end
  file = File.open( "./doc/index.html", "w" )
  file.write( output )
  file.close
  puts "./doc/index.html"
end
EOT
 
Now we need to make this script run every time an HTML file is generated.