lp - Literate Programming*
==========================

Chr. Clemens Lahme (clemens.lahme@techinvest.li)
2023-12-20

(* because who really needs an effing line printer.)

Literate programming in 2023, not invented here syndrome at its fullest. So
we (which is just me and myself) want to start from scratch. Why? Because why
not, why follow someone else's ideas and use cases? It shouldn't be so hard
either, with today's modern languages and the basic idea (embed code in text
and then just write a book or paper) floating around since the 80s.

So what are the main ideas and tools to use here? Scripting languages that
easily transform text files. So regular expressions and in particular Ruby.
So skipping the whole flex lex bison grammar garbage.

On the text front, take notice of md - markdown. Also formatting in HTML
_and_ PDF, use of TeX, asciidoc if it brings anything to the table,
especially on the font handling side, and not taking them if they just
pollute and complicate the whole process, like the whole XML world and its
unnecessary garbage escape sequences.

The main point maybe is just: do the opposite of embedding everything inside
an XML structure. Instead, everything should already be readable in its
original format, and then tools can beautify the text as a bonus on top of
it. After all, you don't need any markdown tool at all to read a markdown
file!

Last but not least, after doing the first iteration, it became clear that the
Rails philosophy is also an inspiration. Everything, especially the text
formatting, should just be the way one would expect it to be, and then
nothing is needed on top of it. That means headlines in the text are simply
written as headlines, without any reformatting in mind; so far not a single
marker or escape code is needed either. Shell scripts have always used EOF or
EOT markers for file redirection, and these can be used here just as well and
appropriately.

All the 'magic' is then delegated to processing in separate scripts, but
without any surprises or actual real magic. At least, that's how it should
appear and work in practice.

Table of Contents
-----------------

 1. First Two Rules
 2. Down to the Metal
 3. First Example
 4. First Release
 5. Statistic
 6. Extract Named Files
 7. Evolving Scripts
 8. Testing
 9. Adding a Table of Contents
10. Todo Items
11. Special Case Without index.txt
12. CSS
13. Inlining Images
14. Slides
15. Installation
16. Download
17. Usage

1. First Two Rules
------------------

So here is already the second rule. Use #! to start the program code, and any
line without whitespace at its beginning that matches /^exit[\(\s]/ indicates
the end of the program code. Everything before and afterwards is just text!

Now back to the first rule. Text should not be polluted with escape or markup
language. Even less than markdown. So we do not have to invent our own
markdown language or specification, we just have to define a consistent and
fixed rule for everything that can be reformatted into another format. So in
this case the main headline is underlined by equal signs of the same length
as the headline itself. On the second level, headlines are just underlined by
the minus sign '-'. Should we allow sub chapters further down? Maybe, but
let's keep it open for now.

2. Down to the Metal
--------------------

So now down to the metal. What should our program look like, how should
everything start? We name the program 'lp', for 'literate programming'.

What is the first feature we want to implement? Reformat everything in HTML?
Reformat just the text itself? Like reformat the lines, or even left right
alignment? Or just split out the program from the commentary?

So how would we use the program?

    cd myproject
    lp

Then what should happen? Should the program already be executed? No, that
should be happening with:

    lp run

And just 'lp'? What should that do?

Maybe we need a second project to use this project with? Maybe back to the
'afaf' project (don't ask, I am from the Cologne area)? Yes, maybe we could
still leave the original code as it is. Like, you actually can have both,
pure text files, and also pure code files, plus a mix of them. But then what
is the purpose and the extra benefit of 'lp' itself? Maybe just the help for
me, to use a journal mode for myself, while programming?!

So then how to intermix commentary inside of existing files? Well, in Ruby
you just use =begin and =end inside of it, right?
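
A minimal sketch of that =begin/=end idea, not part of lp itself: the
helper split_commentary (the name and the sample input are assumptions made
up for illustration) separates the commentary blocks from the surrounding
Ruby code, so that either part could be processed on its own.

```ruby
# Hypothetical helper, not part of lp: split a Ruby source string into the
# commentary found inside =begin/=end blocks and the remaining code.
def split_commentary( source )
  commentary = []
  code = []
  in_comment = false
  source.each_line do |line|
    if line =~ /^=begin(\s|$)/
      in_comment = true
    elsif in_comment && line =~ /^=end(\s|$)/
      in_comment = false
    elsif in_comment
      commentary << line
    else
      code << line
    end
  end
  return commentary.join, code.join
end

# Sample input, built from an array so that no =begin marker ends up at the
# start of a line of this sketch itself.
source = [
  "puts 'before'",
  "=begin",
  "Here goes the journal entry.",
  "=end",
  "puts 'after'",
  ""
].join( "\n" )

text, code = split_commentary( source )
puts "Commentary:"
puts text
puts "Code:"
puts code
```

The =begin and =end marker lines themselves are dropped from both parts, which
is a design choice of this sketch and not something lp prescribes.
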
Yes, and then later we extract the commentary to make it available in HTML,
or strip it out of the code and insert it into the PDF documentation.

So which fonts do we use? Monospace I guess. Why not. So we use TeX to create
PDF files?

3. First Example
----------------

So we started documenting the afaf project, or rather brainstorming about it,
and we have our first use case.

    lp

This default command should just look in doc/ for the index.txt file, and
create a corresponding index.html file, with URLs also being converted.

#! /usr/bin/env ruby2

def process_text_file( input_filename )
  file = File.open( input_filename, "r" )
  content = file.read()
  file.close

  # Extract named scripts.
  lines = content.split( /\n/ )
  script_filename = nil
  script = ""
  mode = nil
  f_or_t = nil
  lines.each do |line|
    if (f_or_t == nil) && (line =~ /<<\s*EO(F|T)\s*$/)
      mode = "w"
      if line =~ />>/
        mode = "a"
      end
      script_filename = line.sub( /^\s*cat\s+>>?\s+/, '' )
      script_filename.sub!( /\s.*$/, '' )
      script = ""
      f_or_t = line.sub( /^.*EO([FT])\s*$/, "\\1" )
    elsif f_or_t && script_filename && (line =~ /^\s*EO#{f_or_t}\s*$/)
      file = File.open( script_filename, mode )
      file.write( script )
      file.close
      system( "chmod 700 #{script_filename}" )
      puts script_filename
      f_or_t = nil
      script_filename = nil
      next
    elsif script_filename
      script << line
      script << "\n"
    end
  end

  output = ""
  output << "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">"
  output << "<html>\n"
  output << "<head>\n"
  output << "<title>lp: #{File.realdirpath( '.' ).sub( /^.*\//, '' )}</title>\n"
  output << "</head>\n"
  output << "<body>\n"
  output << "<tt>\n"
  # Escape the characters that have a special meaning in HTML.
  content.gsub!( /&/m, "&amp;" )
  content.gsub!( /</m, "&lt;" )
  content.gsub!( />/m, "&gt;" )
  content.gsub!( /\n/m, " \n" )
  content.gsub!( / \n \n/m, " \n  \n" )

  # Handle links, but treat local files just relative to the doc directory if necessary,
  # without the preceding 'file://'.
  content.gsub!( /(^|\s)(file):\/\/([^\s<]+)(\s|<)/m, "\\1<a href=\"\\3\">\\2://\\3</a>\\4" )
  content.gsub!( /(^|\s)(https?):\/\/([^\s<]+)(\s|<)/m, "\\1<a href=\"\\2://\\3\">\\2://\\3</a>\\4" )

  # Repeated spaces have to be respected absolutely.
  content.gsub!( / /m, '  ' )

  output << content
  output << "</tt>\n"
  output << "</body>\n"
  output << "</html>\n"

  output_filename = input_filename.sub( /\.txt$/, '.html' )
  file = File.open( output_filename, "w" )
  file.write( output )
  file.close
  system( "chmod 600 '#{output_filename}'" )
  puts output_filename

  system "$HOME/machine/src/rb/lp/bin/add_toc.rb"
end

files = [ "./doc/index.txt" ]
if ARGV.length > 0
  files = ARGV
end

files.each do |input_filename|
  process_text_file( input_filename )
end

exit( 0 )

But this code inside this text document must also be extracted. So for this
we have some bootstrap code outside of this text document. Of course, we must
later also incorporate this separate program into this document:

./bin/bootstrap.rb

The text parser should recognize that the line above has a corresponding
local file, which is a program. So it should include a link, instead of the
whole file, maybe. Or extract the main comment to describe this program.

4. First Release
----------------

The first release (commit) had the following statistic.

Lines of code:       36   28.57%
Lines of text:       90   71.43%
Total               126  100.00%

5. Statistic
------------

For the statistic we use the following code:

cat > ./bin/statistic.sh <<EOF
#! /bin/bash

TEXTLINES=$(cat doc/index.txt | grep -E -v '^\s*$' | wc -l)
CODELINES=$(cat bin/lp.rb bin/bootstrap.rb | grep -E -v '^\s*$' | grep -E -v '^#' | wc -l)
EOFLINES=0
CODELINES=$(expr $CODELINES '+' $EOFLINES)
TEXTLINES=$(expr $TEXTLINES '-' $CODELINES)
TOTALLINES=$(expr $TEXTLINES '+' $CODELINES)
CODEPERCENTAGE=$(ruby2 -e "puts ${CODELINES}.0 / ${TOTALLINES}.0 * 100.0")
TEXTPERCENTAGE=$(ruby2 -e "puts ${TEXTLINES}.0 / ${TOTALLINES}.0 * 100.0")

printf "Lines of code: %8d %6.2f\n" ${CODELINES} ${CODEPERCENTAGE}
printf "Lines of text: %8d %6.2f\n" ${TEXTLINES} ${TEXTPERCENTAGE}
printf "Total: %8d 100.00\n" ${TOTALLINES}
EOF

6. Extract Named Files
----------------------

As with the statistic.sh script example, we now need to create named script
or program files, in addition to the default project script, which is a Ruby
file in the bin directory with the same name as the project directory, e.g.
./bin/afaf.rb.

OK, we added that to the main script above, which searches the lines for EOT
or EOF.

7. Evolving Scripts
-------------------

Now, in order to preserve the development of a program and make it easier to
understand and follow its logic, how can we adapt such scripts in later
steps? One idea is to add placeholders and later insert new code at these
insertion points. Or we just add simple lines like 'source another_script.sh'
into the code and separate the logic that way, accompanied by another section
of text. For Ruby use 'load "my/file.rb"' to add extra code later on etc.

8. Testing
----------

First, is the resulting HTML code compliant? Use 'tidy' for that. And use the
'check' rule in the Makefile. Yes, we didn't mention the Makefile, but we use
one to invoke 'bootstrap.rb' and create the 'lp.rb' file as well as the HTML
page 'index.html'.

So now we add the test code into the test script, which will be invoked from
the 'check' rule in the Makefile (not included in this document).

cat > ./bin/test_lp.sh <<EOF
#! /bin/bash
# Do not edit this file directly, it is auto generated by lp.

set -e

grep -E "\\.<a " test/url/index.html && { echo "ERROR: wrong link found."; exit 1; }
type tidy || { echo "ERROR: tidy command is not available."; exit 2; }
tidy ./doc/index.html 2>&1 | grep "No warnings or errors were found."

echo "SUCCESS: $0 - $?."
EOF

9. Adding a Table of Contents
-----------------------------

Before the first sub headline, and only for HTML, we want to automatically
add a table of contents. For this we read the text file and parse it for sub
headers. These are defined as:

1. An empty line.
2. A line of text.
3. A line of '-' signs of the same length as the line above.
4. A final empty line.

And we number the chapters consecutively. We work directly on the HTML
output, as the text file will not be changed anyway. For the anchors of these
headings, we can just number them consecutively as well.

cat > ./bin/add_toc.rb <<EOT
#! /usr/bin/env ruby2

filename = "./doc/index.html"
if File.exists?( filename )
  file = File.open( filename, "r" )
  content = file.read()
  file.close
else
  printf "INFO: file '#{filename}' does not exist, which is fine, so no TOC"
  printf " needs to be added to any file.\n Exiting with status: 0\n"
  exit 0
end

lines = content.split( /\n/ )
status = nil
headline = nil
toc = []
indexes = []
lines.each_with_index do |line, index|
  if status == nil
    if line =~ /^  $/
      status = "empty line"
      #puts "#{index}: #{status}"
      next
    end
  elsif status == "empty line"
    if line =~ /^  $/
      next
    elsif line =~ /^\s*-+\s* $/
      status = nil
      next
    else
      headline = line.sub( /^\s+/, '' )
      headline.sub!( /\s* $/, '' )
      #puts "#{index}: #{headline}"
      status = headline.length
      next
    end
  elsif status == "dashed line"
    if line =~ /^  $/
      # We should compare the length of the dashed line. But hey, we call it
      # Bingo anyway.
      toc << [ headline, index ]
      status = "empty line"
      #puts "Bingo: #{toc.inspect}"
      next
    end
  else
    if line =~ /^\s*-+\s* $/
      status = "dashed line"
      #puts status
      next
    elsif line =~ /^  $/
      status = "empty line"
      #puts "#{index}: #{status}"
      next
    else
      status = nil
      next
    end
  end
end

if toc.length > 0
  toc_content = ""
  toc_content << "  \n"
  toc_content << "Table of Contents \n"
  toc_content << "----------------- \n"
  toc_content << "</tt><ol>\n"
  toc.each_with_index do |row, index|
    toc_content << "<li><a href=\"#headline#{index + 1}\">#{row[ 0 ]}</a></li>\n"
  end
  toc_content << "</ol><tt>\n"

  # Insert ToC.
  output = ""
  headline_index = 0
  lines.each_with_index do |line, index|
    if index == toc[ 0 ][ 1 ] - 3
      output << toc_content
    end
    if (headline_index < toc.length) && (index == toc[ headline_index ][ 1 ] - 2)
      output << "<a name=\"headline#{headline_index + 1}\" />#{headline_index + 1}. "
    end
    if (headline_index < toc.length) && (index == toc[ headline_index ][ 1 ] - 1)
      output << '-' * ((headline_index + 1).to_s.length + 2)
      headline_index += 1
    end
    output << line
    output << "\n"
  end

  file = File.open( "./doc/index.html", "w" )
  file.write( output )
  file.close
  puts "./doc/index.html"
end
EOT

Now we need to make this script run every time the HTML file is generated.

10. Todo Items
--------------

. Space between the end of the TOC and the next header is in practice two
  lines. Reduce this to a single line (2024-02-25).

11. Special Case Without index.txt
----------------------------------

So moving on to use lp with projects that already have a default
doc/index.html file, I don't want to move that around to make space for a new
index.txt file. So in the text file there should simply be no #! .. exit
section and then the index.html version does not need to be (re)created at
all. Does it work like this already?

No. This does not work, because a) the existing index.html gets overwritten,
and b) there is a complaint if doc/index.txt is missing.
How about if doc/index.html exists and index.txt is missing, then we skip the
creation process?

Ah, the solution exists already,

    lp doc/example.txt

creates doc/example.html and now we are fine.

12. CSS
-------

So one of our 'lp' principles is that the original text format can be used
unaltered without any escape codes, tags, or macro codes. But how can we
alter the formatting? With CSS, of course. Whether we like that standard or
not, why not use it, as we can "configure" our text output that way with a
separate standard. So here is what we will do: if a 'styles.css' file exists
in the same location as the processed text file, we add a link to it inside
the produced HTML file.

cat > ./lib/css.rb <<EOT
# Do not edit, file is generated.

$: << File.dirname( __FILE__ )
require 'lplib'

def insert_css_link( html_filename )
  lp_dir = File.dirname( html_filename )
  css_filename = lp_dir + "/" + "styles.css"
  if File.exists?( css_filename )
    css_link = " <link rel=stylesheet type=\"text/css\" href=\"styles.css\" title=\"css\">"
    html = Lplib.readfile( html_filename )
    html.sub!( /<\/head>/m, "#{css_link}\n</head>" )
    Lplib.writefile( html_filename, html )
    puts "Inserted CSS link into file: #{html_filename}"
  else
    puts "No CSS style sheet added."
  end
end

insert_css_link( ARGV[ 0 ] )
EOT

cat > ./lib/lplib.rb <<EOT
# Do not edit, file is generated.

class Lplib
  def self.readfile( filename )
    file = File.open( filename, "r" )
    ret_val = file.read
    file.close
    return ret_val
  end

  def self.writefile( filename, content )
    file = File.open( filename, "w" )
    file.write( content )
    file.close
  end
end
EOT

13. Inlining Images
-------------------

When a single line contains no spaces and the whole line names an existing
image file, the file will be inlined into the HTML output (as an img tag with
a src attribute). We can therefore keep the processing separate from the rest
by just post processing the resulting HTML file.

cat > ./bin/inline_images.rb <<EOT
#! /usr/bin/env ruby2
# Don't alter this file, it gets auto generated by lp!

$: << File.dirname( __FILE__ ) + "/lib"
require 'lplib'

html_filename = ARGV[ 0 ]
if html_filename == nil
  puts "ERROR: no HTML input file provided!"
  exit 2
end
if File.exists?( html_filename ) == false
  puts "ERROR: given HTML file name does not exist!"
  exit 2
end

#puts "inlining"
html = Lplib.readfile( html_filename )
#puts html.length
changed = false
while true
  regexp = /^[^\s<]+\.(jpg|png|webp) \n/m
  index = html.index( regexp )
  if index == nil
    break
  end
  #puts index
  output = html[ 0 ... index ]
  image_filename = html[ index .. -1 ].sub( / \n.*\z/m, '' )
  output << "<img src=\"#{image_filename}\" /> \n"
  html = html[ index .. -1 ]
  html.sub!( regexp, '' )
  output << html
  html = output
  changed = true
end

if changed
  bak_filename = html_filename + ".bak"
  File.rename( html_filename, bak_filename )
  Lplib.writefile( html_filename, html )
  puts html_filename
else
  puts "Info: HTML file has no images, no need to save file again: #{html_filename}"
end
EOT

14. Slides
----------

As with the table of contents, we need to split the whole input text into
chapters, which in this case are separate pages. Unlike there, this time we
don't use the HTML for this but the original input text.

cat > ./bin/slides.rb <<EOT
#! /usr/bin/env ruby2

txt_filename = ARGV[ 0 ]

# Returns the content, lines, and pages with headline, index start, index end.
def split_pages( txt_filename_param )
  #puts "split_pages.enter"
  content = nil
  if File.exists?( txt_filename_param )
    content = Lplib.readfile( txt_filename_param )
  else
    printf "ERROR: file '#{txt_filename_param}' does not exist."
    exit 2
  end

  # A new page is defined as beginning of text or empty line, followed by a head
  # line, followed by a '=' or '-' line with the same length as the headline,
  # followed by the end of the text or an empty line.
  lines = content.split( /\n/ )
  status = "empty line"
  headline = nil
  # Contains an array for each page which contains the headline and the index of
  # the headline in the lines array and the last text line index.
  pages = []
  lines.each_with_index do |line, index|
    #puts "line[#{status}]=#{line}"
    if status == "empty line"
      if line =~ /^\s*$/
        next
      elsif line =~ /^\s*(-+|=+)\s*$/
        status = "some text"
        next
      else
        headline = line.sub( /^\s+/, '' )
        headline.sub!( /\s*$/, '' )
        status = headline.length
        next
      end
    elsif status == "some text"
      if line =~ /^\s*$/
        status = "empty line"
        puts "empty line"
        next
      end
    elsif status == "dashed line"
      if line =~ /^\s*$/
        # We should compare the length of the dashed line. But hey, we call it
        # Bingo anyway.
        if pages.length > 0
          pages[ -1 ] << (index - 4)
        end
        pages << [ headline, index - 2 ]
        status = "empty line"
        next
      end
    elsif status.class == Integer
      # Previously we got a headline.
      if line =~ /^\s*(-+|=+)\s*$/
        status = "dashed line"
        next
      elsif line =~ /^\s*$/
        status = "empty line"
        next
      else
        # So it wasn't a headline, just the beginning of some text.
        status = "some text"
        next
      end
    end
  end
  if pages.length > 0
    pages[ -1 ] << lines.length - 1
    if status == "empty line"
      pages[ -1 ][ -1 ] = pages[ -1 ][ -1 ] - 1
    end
    if status == "dashed line"
      pages[ -1 ][ -1 ] = pages[ -1 ][ -1 ] - 3
    end
  end
  #puts "status=#{status}"
  if status == "dashed line"
    pages << [ headline, lines.length - 2, lines.length - 1 ]
  end
  return content, lines, pages
end

$: << 'lib'
require 'lplib'

content, lines, pages = split_pages( txt_filename )
puts pages.inspect

$: << "#{ENV[ 'HOME' ]}/machine/src/rb/stacky/bin"
$: << "#{ENV[ 'HOME' ]}/machine/src/rb/stacky/lib"
require 'stacky'

title = pages[ 0 ][ 0 ]
push lines[ pages[ 0 ][ 1 ] .. pages[ 0 ][ 2 ] ]
stacky "[ -1 slice: ] right :"
stacky "3 right .s"
stacky "first_page ! .s"

stacky "["
push ".AUTHOR \"lp slides\""
push ".TITLE \"#{title}\""
#push ".SUBTITLE \"2025-02-22\""
push ".PDF_TITLE \"\\*[\$TITLE]\""
push '.\"'
#push ".DOCTYPE SLIDES ASPECT 16:9"
push ".DOCTYPE SLIDES A4"
#push ".PAPERMEDIA A4 landscape"
push ".DOCTYPE SLIDES"
push ".START"
push ".PP"
push ""
push ".PP"
push ""
push ".PP"
push ""
push ".PP"
push ""
push ".PP"
push ".HEADING 2 \"#{title}\""
push ".PP"
push ""
stacky "first_page @"
local_lines = pop
local_lines.each do |line|
  push ".PP"
  push line
end

# The remaining pages.
normal_pages = pages[ 1..-1 ]
while normal_pages.length > 0
  headline = normal_pages[ 0 ][ 0 ]
  push ".NEWPAGE"
  push ".PP"
  push ".HEADING 3 \"#{headline}\""
  push ".PP"
  normal_lines = lines[ normal_pages[ 0 ][ 1 ] .. normal_pages[ 0 ][ 2 ] ]
  3.times { normal_lines.delete_at( 0 ) }
  normal_lines.each do |line|
    push ".PP"
    push line
  end
  normal_pages.delete_at( 0 )
end
stacky "] lf join"

mom_filename = txt_filename.sub( /\.txt$/, '.mom' )
pdf_filename = txt_filename.sub( /\.txt$/, '.pdf' )
puts mom_filename

=begin
.PP
.PP
Exchange with Josef Nguyen
.PP
.PP
Chr. Clemens Lahme
.PP
2025-02-22
.NEWPAGE
.B TechInvest Tech Stack - Tools
.PP
Open-Source
=end

#push title
push mom_filename
stacky ".s writefile errorexit .s"
stacky "'pdfmom ' #{mom_filename} << ' > ' #{pdf_filename} << << system errorexit drop"

puts "SUCCESS: #{__FILE__} - 0."
EOT

15. Installation
----------------

OK, finally we handle the installation process. Normally this is the first
step, together with the requirements. But as this document was written while
developing the program, this section ended up at the end (for now).

Requirements
''''''''''''

A Ruby version 2 must be in the environment path and named ruby2.

cat > ./configure <<EOT
#! /bin/bash
# Do not edit this file, it gets generated automatically by lp.

echo -e "\nChecking requirements for lp...\n"

# Obviously bash is installed or this script would not start at all.
type ruby2 || { echo "ERROR: ruby2 is missing."; exit 2; }
type make || { echo "ERROR: make is missing."; exit 2; }
type tidy || { echo "INFO: tidy command for checking HTML is missing, but it is not essential."; exit 2; }

echo -e "\nAll requirements for lp are fulfilled."
echo -e "\nIn order to build the project type: make\n"
EOT

So the user has to run the classical:

    ./configure
    make
    sudo make install

Afterwards make sure the 'lp' command is included in the environment PATH:

    export PATH=$PATH:/usr/local/lp/bin

16. Download
------------

And yeah, where to get it?

    git clone http://techinvest.li/git/lp.git lp
    cd lp

And then you do the above installation.

17. Usage
---------

Here is an example:

cat > ./test/example.txt <<EOT
Example Program
===============

This is an example to demonstrate the usage of the lp program.

cat > test/example_script.sh <<EOF
#! /bin/bash
echo "hello, world!"
EOF
EOT

And then to use lp on this file:

    lp test/example.txt

And to execute the script created by that lp document:

    ./test/example_script.sh
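
To see what happens to such an example without touching the file system, here
is a rough sketch of the named file extraction. It is not the lp code itself:
the function extract_named_files and the sample text are assumptions made up
for illustration. It collects the body of every 'cat > file <<EOF' or
'cat > file <<EOT' block in a hash instead of writing files. As in lp, only
the marker that opened a block can close it, so an EOF block nested inside an
EOT block stays part of the outer file.

```ruby
# Hypothetical sketch, not the lp implementation: collect the bodies of
# "cat > file <<EOF ... EOF" blocks in memory instead of writing files.
def extract_named_files( text )
  files = {}
  name = nil
  marker = nil
  text.each_line do |line|
    if marker.nil? && (m = line.match( /^\s*cat\s+>>?\s+(\S+)\s+<<\s*(EO[FT])\s*$/ ))
      # A new block starts, remember the file name and its end marker.
      name = m[ 1 ]
      marker = m[ 2 ]
      files[ name ] = String.new
    elsif marker && line =~ /^\s*#{marker}\s*$/
      # The matching end marker closes the current block.
      name = nil
      marker = nil
    elsif name
      files[ name ] << line
    end
  end
  files
end

example = [
  "Example Program",
  "===============",
  "",
  "cat > test/example_script.sh <<EOF",
  "#! /bin/bash",
  "echo \"hello, world!\"",
  "EOF",
  ""
].join( "\n" )

extract_named_files( example ).each do |filename, script|
  puts "#{filename}: #{script.lines.length} lines"
end
```

Returning a hash keeps the sketch free of side effects; lp itself writes each
file and makes it executable right away.
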