Recursion and the STP Macro Processor

 

This zipped file contains two Python versions of the macro processor developed in the classic textbook, Software Tools by Brian Kernighan and P.J. Plauger (1976), and in the later edition, Software Tools in Pascal (STP, from 1981). The Pascal text is currently available at the Internet Archive, and the source code has been preserved on Github by Shuo Chen. The macro processor is developed in Chapter 8, which I've made available here as a PDF (2.6 MB).

The textbook code is closely based on the m3 macro processor by Dennis Ritchie. Subsequently, Kernighan and Ritchie joined forces to develop m4, a macro processor tool for UNIX. In 1990, René Seindal released GNU m4, which has been maintained and improved by a variety of developers ever since.

 

Stacks vs. Recursion

My Python versions are macro3.py and rdMacro.py.

macro3.py follows the approach used by Kernighan and Plauger, and their Pascal code is available for comparison in macro3.pas. When a macro call is encountered in the input text, the name and definition are placed on a stack. Any macro arguments are also copied onto the stack, except when an argument contains another macro call which triggers the creation of a new stack frame. The inner macro is evaluated completely, and its translation pushed back into the input text, its frame is popped, and processing resumes on the outer macro. As a result, the outer macro never sees the inner one, only its translation. Of course, the inner macro might call other macros, so the rewriting process is recursive.

The two large changes I made to macro3.py were the reduction of the use of globals and the simplification of the data structures. For the former, this consisted of passing a Python deque containing the input text to the macro processing functions, and having them return the expanded text. In the latter case, I combined the original Pascal data structures into a single stack of frames.

frame = { 'name':token, 'kind':kind,
          'args':[], 'plev':0, 'current':'' }

Each frame holds the parser state for a macro call, and a stack of frames is required to deal with nested macro calls, such as:

foo(bar(x), baz(y))

During macro processing, foo() will need to 'wait' while bar() and baz() are invoked, and its details are stored in a frame in the meantime.

My changes to macro3.py made the recursive nature of the macro processing very evident. For example, macro() is called on the input text at the top-level, and called again to process the expansion of macro arguments. This observation led me to write rdMacro.py, which re-implements the top-level of the processor as a recursive-descent parser. It follows the BNF:

  mtext      ::= { element }

  element    ::=  macro-call
               | ident
               | quoted
               | other-char

  macro-call ::= ident '(' args ')'
  ident      ::= (letter | '_') { letter | digit | '_' }
  args       ::=  [ mtext {',' mtext } ]

  quoted     ::= lquote qbody rquote
  qbody      ::= { quoted | any-char }

  user-macro ::= text containing $0..$9 substitutions

This approach deals with nested calls such as foo(bar(x),baz(y)) by making the parser for mtext recursively call itself to process each macro-call. Instead of having to maintain a stack of frames, the execution of the parent is suspended until the recursive calls return their expanded text.

Although a frame stack isn't needed in rdMacro.py, it does reuse a lot of the other code from macro3.py, including the symbol table for built-in operations, the expression evaluator, and the quotes processing.

 

Why is the STP Code Not Recursive?

Ritchie, Kernighan, and Plauger (and many others involved in the early days of macro processing) are/were first-class programmers, so why didn't they use recursive-descent parsing to implement macro processing rather than stacks?

My guess is that it's due to the origins of the code. Software Tools is implemented in Ratfor, a structured programming flavor of FORTRAN IV, and recursive functions weren't added to that language until Fortran 90 (circa 1991). However, recursion was available in Pascal, so why the continued use of stacks in Software Tools in Pascal?

It's interesting to compare the utilization of recursion in the two editions. I found four procedures that employ recursion in STP – itoc() in chapter 2, finclude() in chapter 3, quicksort() in chapter 4, and amatch() in chapter 5. The first two are just as elegantly implemented using loops in Software Tools, but Kernighan and Plauger really have to jump through hoops to code up quicksort() and amatch() iteratively. amatch() is part of a grep-like utility, and the authors actually remark that recursion handles a lot of the book-keeping that has to be coded explicitly in the looping version.

What about GNU m4?

One of the design aims of GNU m4 was to remove the limitations of many of the older m4 implementations, such as maximum line lengths, macro sizes, or the number of macros. However, it seems that its source code also uses stacks (e.g. see macro.c).

 

Downloads

 

Dr. Andrew Davison
E-mail: ad@coe.psu.ac.th
Back to my home page