Last modified: Sun Dec 19 23:44:22 1999

WebCompile -- An HTML macro-processor

[Note: The home-grown macro processor I use to compile this page does not yet understand pages loaded with HTML examples. Thus, I have preserved the original plain text documentation for the web compiler in case something on this page doesn't look quite right.]

Overview:

WebCompile is a perl script which processes HTML-like source files and generates real HTML files. The script lets the user define macros which look (for all practical purposes) like HTML tags and use them in the source files, thus allowing commonly-used HTML sequences to be 'packaged' and re-used in a number of places.

The most basic and obvious use is to package the miscellaneous header and footer overhead which is applied to every page on a web site. The advantage of doing this is that changes to the standard header and footer information need only be made once in order to take effect on all pages that use that standard macro. Another advantage is that it reduces the clutter in the source file for each page, thus reducing the chance of confusion and error.

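For example, assume we define a macro, thus:

<define "document"> <html> <head><title>My Cool Page</title></head> <body> <contents> </body> </html> </define>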

The 'define' and 'contents' macros are two of the built-in macros in the script. What this sequence does is to define a new macro with the name 'document' and to instruct the script to insert the HTML tags as shown above into the source stream any time the <document> tag is used. Now, this would not be too useful if we had no way to customize the innards of the given HTML sequence. In order to pass the body of the page itself to the macro, we use a tag-pair like so:

<document> This is meant to be a really cool web page </document>

Whatever is found between the <document> and the </document> tags is called the 'contents' of the macro. When the macro definition is substituted for this tag pair, it is first processed in the context of the main source stream and the <contents> tag is automatically defined as all the stuff between the starting and ending tags of the macro instance.

Thus, in our example, the following would result from processing our source stream:

<html> <head><title>My Cool Page</title></head> <body> This is meant to be a really cool web page </body> </html>
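
The substitution just described can be modelled in a few lines of Python. This is only an illustrative sketch, not the script's actual perl implementation: `expand_container` and the `macros` dictionary are hypothetical names, and the sketch assumes a single, non-nested instance of each macro with no parameters.

```python
import re

def expand_container(source, macros):
    """Replace each <name>...</name> pair with the macro body,
    inserting the enclosed text wherever <contents> appears."""
    for name, body in macros.items():
        tag = re.escape(name)
        pattern = re.compile(rf"<{tag}>(.*?)</{tag}>", re.DOTALL)
        # A function is used as the replacement so that the macro body
        # is inserted literally (no backslash processing).
        source = pattern.sub(
            lambda m, body=body: body.replace("<contents>", m.group(1)),
            source,
        )
    return source

# Hypothetical stand-in for the <document> macro definition.
macros = {
    "document": "<html> <head><title>My Cool Page</title></head> "
                "<body><contents></body> </html>",
}
page = "<document> This is meant to be a really cool web page </document>"
print(expand_container(page, macros))
```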

Of course, there are two kinds of tags in HTML and, likewise, there are two kinds of macros understood by the script. The first are container-like tags such as <h1> which accept contents and an end-tag:

<h1>This is a heading</h1>

The <document> macro in the example above is a container-like macro. It has a <contents> tag as part of its definition (this is the only difference). The other kind of tag does not accept contents. The <hr> tag in HTML is an example of a stand-alone tag. You can define stand-alone macros in the same way as you define container-like macros:

<define "mypicture"> <img SRC="photo.jpg"> </define>
<document> Take a look at my picture: <mypicture> </document>

The script isn't too strict in its processing (both for simplicity's sake and to make the macro language as flexible as possible). Here are the rules:

If the source stream contains an end tag matching a beginning tag (for example: <tag> ... </tag>), then the stuff found between the two tags will become the 'contents' of the tag in question.
If <contents> is used in the macro, the contents of the tag will be inserted. If no contents exist (that is, no end tag was found) then the empty macro <undef> will be inserted. By default, this will result in nothing (literally) being inserted where the <contents> tag was found (see also below).
If an end tag is found in the source stream (thus making the tag into a container-like tag) but the <contents> tag is not used in the definition of the macro, the contents will be silently discarded. This allows the user to define tags that are meant to swallow their contents or perform some other kind of processing based on the contents (see also below).
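
The three rules above can be summarised in a small Python sketch. It is a model of the documented behaviour only, not the script's own code; the function name `substitute_contents` and the `undef_text` parameter (standing in for whatever the <undef> macro currently expands to) are assumptions made for illustration.

```python
def substitute_contents(body, contents=None, undef_text=""):
    """Apply the <contents> rules to one macro body.

    body       -- the text of the macro definition
    contents   -- text found between the start and end tags,
                  or None when no end tag was present
    undef_text -- expansion of the <undef> macro (empty by default)
    """
    if contents is None:
        # No end tag was found: <contents> expands to <undef>.
        contents = undef_text
    # If the body never mentions <contents>, replace() changes nothing,
    # so any contents supplied are silently discarded.
    return body.replace("<contents>", contents)

print(substitute_contents("<b><contents></b>", "bold text"))
print(substitute_contents("<b><contents></b>"))
print(substitute_contents("", "this text is swallowed"))
```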

If you want to be warned whenever the <contents> tag appears while processing a macro that has no apparent contents, just define the <undef> tag to have some message, thus:

This will insert the error comment into the source stream where the missing contents would have been. To define a <comment> tag that allows blocks of HTML code to be commented out en-masse, just do the following:

You can also remove your own stand-alone comments from the output stream, thus:

Include files:

Of course, if the same macros are used across several of your web pages, there needs to be a mechanism for defining these in a common place and re-using them over and over. For this we use include files. The built-in macro <include> will open the designated file and insert the contents of that file into the source stream at the time of processing (note well that statement).

For example, let's say we put our <document> macro (along with others) in a file called 'stuff.def'. To use these macros in a source file, type:

<include stuff.def> <document> ...things as usual... </document>

Of course, if the include file is truly common across all your web pages, you hardly want to keep typing it on every page. Include files can therefore also be named on the command line:

webcompile -i stuff.def source.html

Only <define> and <set> tags are of much use in a file included in this way, as any output-producing source code will not appear in the generated file (that is, unlike including the file in-line, including from the command line only allows the side-effects of the file contents to take place -- the file contents are not directly inserted into the source stream).

Variable substitution:

Another interesting feature of this script is that it allows the user, from either the source stream or a macro, to execute perl code in a clean namespace. Two built-in macros are used to execute perl code. The <set> macro executes the code it contains and returns nothing (useful for setting variables, printing messages, defining subroutines, etc.). An example of the set command would be:

<set $fred=1>

This sets the perl variable $fred to the value '1' in the user's namespace (that is, even if the script itself has a variable $fred, there will be no collision).

To use the set variable, the <get> macro is used. This tag returns (as text) the results of the execution of the given statement, thus:

The value of fred is <get $fred>

would result in:

The value of fred is 1

One disadvantage of the <get> macro is that it cannot be used inside tags or other macros (the tag-separation parser is not that smart). To get around this, another construct may be used. Anything enclosed in the character sequence:

&( ... );

is also executed as a perl fragment and the returned result is inserted into the source stream as text (even within a tag). The example above could also be written thus:

The value of fred is &($fred);

This is actually the preferred method for inserting variables into text, as the resulting source file becomes quite easy to read.
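
To make the &( ... ); construct concrete, here is a minimal Python sketch of the text substitution. It is an approximation under stated assumptions: the real script evaluates arbitrary perl fragments, whereas this model only looks up plain `$name` variables in a dictionary, and the names `substitute_vars` and `variables` are invented for the example.

```python
import re

def substitute_vars(text, variables):
    """Replace every &($name); in text with the value of that variable."""
    def lookup(match):
        expr = match.group(1).strip()
        if not expr.startswith("$"):
            # The real construct accepts any perl fragment; this sketch does not.
            raise ValueError("only plain $variables are supported here")
        return str(variables[expr[1:]])
    return re.sub(r"&\((.*?)\);", lookup, text)

print(substitute_vars("The value of fred is &($fred);", {"fred": 1}))
# The construct also works inside a tag, where <get> cannot be used.
print(substitute_vars('<body bgColor="&($color);">', {"color": "#ff0000"}))
```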

The following caveats should be kept in mind:

The <set> and <get> macros do no argument processing, so whatever appears after the 'set' or 'get' is sent intact to the perl processor. Do not wrap the argument in quotes (write <set $fred=1>, not <set "$fred=1">) unless you intend to pass or return a string.
Both <get> and &( ... ); can modify the variable space (ie: <get $fred=1> both sets $fred and returns 1).
Substitution of &( ... ); is done prior to processing a macro. Thus, this construct may be used to pass perl variables as parameters to macros (see the section on parameterized macros).
There are certain variables pre-defined by the script (see below).

Conditional compilation:

Another incredibly useful feature of this script is conditional compilation. This is the ability to select among two or more sub-sections of the source stream based on certain run-time criteria. Let's say, for example, that you have already defined a variable called '$language' which has the value 'en' for English and 'sw' for Swedish. The following source could be used in the generation of both the English and Swedish web pages:

<document>
  <if $language eq 'en'>
    This is my English page
  <else>
    Zeese ees moi Sveedish peig
  </if>
</document>

Like <set> and <get>, the argument of the <if> tag is passed intact to the perl processor. Whatever is returned as a result of evaluating the code is used as a yes/no value (non-zero being 'yes' and zero being 'no') and one of the two streams of tags and text is processed, the results being substituted for the entire <if> ... </if> block.

NOTE: The processing of this block differs from any others. The built-in <else> tag is a stand-alone tag that, by itself, returns nothing. The contents of the <if> extend to the first </if> which is not otherwise already matched to some embedded <if> block. The processing of the <if> scans for an <else> which is also not embedded in an interior <if> block and, if one is present, the <else> macro is used to break the contents of the <if> block into two parts. The <else> is optional -- if it is not present, the result of the <if> block will be nothing (literally) if the evaluation of the expression in the <if> tag results in a zero value.

If you nest <if> blocks, be very careful to close them properly and to get the tags in the right place, as there is no way for the script to detect improperly nested <if> blocks.
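
The scanning described in the NOTE above (find the first unmatched </if>, then split at an <else> that is not inside an interior <if>) can be sketched in Python as follows. This is an illustration of the idea rather than the script's actual algorithm. The names `split_if_block` and `process_if` are hypothetical, the condition is passed in as an already-evaluated boolean, and the sketch assumes the expression inside an <if> tag contains no '>' character. Unlike the script, the sketch raises an error when no closing </if> is found.

```python
import re

# Matches the three tags that matter when splitting an <if> block.
TOKEN = re.compile(r"<if\b[^>]*>|<else>|</if>")

def split_if_block(text):
    """Given the text that follows an opening <if ...> tag, return
    (then_part, else_part, rest_of_source)."""
    depth = 0          # number of interior <if> blocks currently open
    else_span = None   # position of the first top-level <else>, if any
    for match in TOKEN.finditer(text):
        tag = match.group(0)
        if tag.startswith("<if"):
            depth += 1
        elif tag == "</if>":
            if depth > 0:
                depth -= 1          # closes an interior block
                continue
            body, rest = text[:match.start()], text[match.end():]
            if else_span is None:
                return body, "", rest
            return body[:else_span[0]], body[else_span[1]:], rest
        elif depth == 0 and else_span is None:
            else_span = match.span()   # a top-level <else>
    raise ValueError("no closing </if> found")

def process_if(condition, text):
    """Keep one branch of the block and append the remaining source."""
    then_part, else_part, rest = split_if_block(text)
    return (then_part if condition else else_part) + rest

source = " A <if $x> B <else> C </if> D <else> E </if> tail"
print(split_if_block(source))
```

Note that the interior <else> stays with the interior <if> block; only the <else> seen at depth zero splits the outer block.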

Parameterized macros:

Macros can also be passed parameters. Like other HTML tags, the parameters take on the form of 'name=value'. If the 'value' field is something other than a simple name or number, quoting is recommended:

<document title="My cool web page" color="red"> ...things as usual... </document>

Nothing special has to be done when defining a tag in order to allow the tag to accept parameters. When the macro contents are processed, the parameters passed can be accessed with the following construct:

&(&getarg("argname"));

This is actually a perl call to a pre-defined subroutine in the user namespace which, in turn, calls a subroutine in the script that parses the parameters of the tag which called the macro in question and returns the value associated with the specified name. While all this sounds confusing, it's really not. Let's use our original example and build up a <document> macro that can accept its title and color from the macro instantiation:

<define name="colorselect">
  <if "&(&getarg("color"));" eq "red"><set $color = "#ff0000"></if>
  <if "&(&getarg("color"));" eq "grn"><set $color = "#00ff00"></if>
  <if "&(&getarg("color"));" eq "blu"><set $color = "#0000ff"></if>
</define>
<define name="document">
  <html>
  <head><title>&(&getarg("title"));</title></head>
  <colorselect color="&(&getarg("color"));">
  <body bgColor="&($color);">
  <contents>
  </body>
  </html>
</define>
<document title="My cool web page" color="red">
  This is meant to be a really cool web page
</document>

Notice that the series of <if> blocks transforms the descriptive color name into the HTML equivalent for that color. You could also define color schemes in the same way. When accessing the value of an argument, it is returned sans any quotes it may have had when passed in (in order to be able to use it as text if necessary) so we must re-quote it if we intend to pass it down to a further macro. Note also that a variable is used to pass the real color value to the <body> tag. All variables are global (except for the macro parameters) so special care should be taken not to collide with variables used by other macros.
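
The argument lookup itself, including the removal of quotes mentioned above, might look something like the following Python sketch. It is not the script's perl code: the regular expression, the function signature (the calling tag is passed in explicitly here), and the choice to return an empty string for a missing argument are all assumptions made for illustration.

```python
import re

# name=value pairs, where the value is either "quoted" or a bare word.
ARG = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|(\S+))')

def getarg(tag, name):
    """Return the value of argument `name` from the calling tag,
    with any surrounding quotes removed."""
    inner = tag.strip("<>")
    for match in ARG.finditer(inner):
        key, quoted, bare = match.groups()
        if key == name:
            return quoted if quoted is not None else bare
    return ""   # assumption: a missing argument yields nothing

tag = '<document title="My cool web page" color=red>'
print(getarg(tag, "title"))
print(getarg(tag, "color"))
```

Because the value comes back without quotes, a macro that forwards it to another macro has to re-quote it, as the <document> definition does when it calls <colorselect>.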

Another way the same <document> macro could have been written, without using variables:

<define name="colorbody">
  <if "&(&getarg("color"));" eq "red"> <body bgColor="#ff0000"><contents></body> </if>
  <if "&(&getarg("color"));" eq "grn"> <body bgColor="#00ff00"><contents></body> </if>
  <if "&(&getarg("color"));" eq "blu"> <body bgColor="#0000ff"><contents></body> </if>
</define>
<define name="document">
  <html>
  <head><title>&(&getarg("title"));</title></head>
  <colorbody color="&(&getarg("color"));">
  <contents>
  </colorbody>
  </html>
</define>
<document title="My cool web page" color="red">
  This is meant to be a really cool web page
</document>

The instructive thing here is to note that the <contents> tag knows how to handle nested macros. That is, the <colorbody> macro is called using the contents of the <document> macro which was, in turn, the text of the actual web page. The result in either case should be:

<html> <head><title>My cool web page</title></head> <body bgColor="#ff0000"> This is meant to be a really cool web page </body> </html>

Order of processing:

This section is not yet finished. But here are some miscellaneous notes that need to be added to this documentation:

Defines are processed in the order in which the compiler encounters them.
Macro definitions (contents of <define>) are processed at the time of use.
Undefined tags are passed to the output as-is.
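
As an illustration of the last note, the following Python sketch expands stand-alone macros and leaves every tag it does not recognise untouched. It is a simplified model, assuming macros take no parameters and have no contents; `expand_tags` and the `macros` dictionary are hypothetical names.

```python
import re

def expand_tags(source, macros):
    """Expand stand-alone macros; pass undefined tags through as-is."""
    def replace(match):
        name = match.group(1)
        # Unknown tags (ordinary HTML such as <hr>) are returned unchanged.
        return macros.get(name, match.group(0))
    return re.sub(r"<(\w+)[^>]*>", replace, source)

macros = {"mypicture": '<img SRC="photo.jpg">'}
print(expand_tags("Take a look at my picture: <mypicture> <hr>", macros))
```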

Invocation and command line arguments:

This section is not yet finished.

Pre-defined variables:

There are certain variables pre-defined in the user space by the script. This is to allow macros to access information about the run in question to which the macros would otherwise not have access. These variables are:

$Date ==> The date of the translation
$Path ==> Path to the target directory
$Name ==> Name of the source file
$Subd ==> Path to the source file w/in the target
$File ==> File name of the target file
$Base ==> Base name of the target file
$Extn ==> Extension of the target file

Suggested improvements:

This section is not yet finished.

Unprocessed blocks
Define a block which thwarts macro processing of its contents, and use it for the examples in the documentation web page.

This page is maintained as a public service by Joe Larabell.
Inquiries: http://larabell.org/MailTo.cgi/larabell/getForm