tblcvt -- A troffcvt Preprocessor

Paul DuBois
dubois@primate.wisc.edu

Wisconsin Regional Primate Research Center
Revision date: 20 May 1997

ABSTRACT
tblcvt reads troff input and converts the tbl-related parts to a format that troffcvt can understand more easily than raw tbl output.


Introduction


This document describes tblcvt ("tbl convert"), a program that assists the process of using troffcvt to convert troff documents into other formats. It's assumed here that you're familiar with tbl. If you don't have the standard tbl documentation (Tbl - A Program to Format Tables, by M. E. Lesk), check the archive site from which you obtained the troffcvt distribution.

tblcvt exists because tables written in the tbl input language present a problem for troffcvt. troffcvt understands only the troff language and knows nothing of the tbl language, so input files containing tables need to be run through some sort of preprocessor before being given to troffcvt. In theory, you could run your troff files through tbl (since tbl generates output written in troff), and feed the result to troffcvt for processing. In practice, tbl output is generally arcane and incomprehensible, and troffcvt doesn't do a very good job with it. The purpose of tblcvt is to convert the parts of troff input files that are intended for tbl into something that's easier for troffcvt to understand. This makes it more likely that troffcvt will generate output that its postprocessors will be able to put back together into something that looks like a table in the target format. Not every table will look great, but any tables in this document are simple enough that they should appear reasonably good if the document is formatted with troff2html, troff2rtf, or unroff.

tblcvt is intended as a drop-in replacement for tbl. Suppose you'd normally format a document using a command like this:

   % tbl file ... | troff [options]
The analogous command using tblcvt and troffcvt looks something like this:
   % tblcvt file ... | troffcvt [options] | postprocessor
Or, if you use one of the front ends like troff2html that invoke troffcvt and the appropriate postprocessor for you, the command might look like this:
   % tblcvt file ... | troff2html [options]
If it seems that troffcvt or a front end is not reading the output from tblcvt, specify - after the option list to explicitly tell it to read the standard input after processing its other options:
   % tblcvt file ... | troffcvt [options] - | postprocessor
   % tblcvt file ... | troff2html [options] -

tblcvt Output Format


tblcvt ignores its input except for those parts between corresponding pairs of .TS (table start) and .TE (table end) requests. For each table, tblcvt digests its specification, figures out the table structure, and produces troff output that indicates the structure using a special set of requests. The output format has the property that it explicitly indicates the beginning and end of each table, each row within a table, and each cell within a row. The general form of table information written by tblcvt looks like this:

   .T*table*begin [table options]
   .T*column*info [column 1 options]
   ...options for remaining columns...
   .T*row*begin
   .T*cell*info [cell layout options]
   ...options for remaining cells in row...
   .T*cell*begin [cell formatting options]
   ...cell contents...
   .T*cell*end
   ...remaining cells in row...
   .T*row*end
   ...remaining rows in table...
   .T*table*end
Shortcut requests are used in certain circumstances. If a cell is empty, tblcvt writes the single request .T*empty*cell rather than .T*cell*begin, .T*cell*end, and the cell data between them. Similarly, if a cell of the table matrix is part of the area spanned by an earlier cell, tblcvt writes .T*spanned*cell. If an entire row consists of a table-width line, tblcvt writes the single request .T*row*line rather than .T*row*begin, .T*row*end, and the cell information between them.

Note that since tblcvt output uses long request names, you can't use compatibility mode (-C option) with troffcvt or a troffcvt front end.
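
Because every table, row, and cell is explicitly delimited, the output is easy to read back mechanically. The following Python sketch groups the requests into tables, rows, and cells. It is only an illustration and is not part of the troffcvt distribution: the function name parse_tblcvt, the dictionary layout, and the argument values in the sample input are all made up for the example.

```python
SHORTCUT_CELLS = (".T*empty*cell", ".T*spanned*cell", ".T*cell*line")

def parse_tblcvt(lines):
    """Group tblcvt requests into a list of tables (hypothetical layout)."""
    tables, table, row, cell = [], None, None, None
    for line in lines:
        line = line.rstrip("\n")
        req, *args = line.split() or [""]
        if cell is not None and req != ".T*cell*end":
            cell["text"].append(line)      # anything inside a cell is content
        elif req == ".T*table*begin":
            table = {"options": args, "columns": [], "rows": []}
        elif table is None:
            continue                       # text outside tables is skipped here
        elif req == ".T*column*info":
            table["columns"].append(args)
        elif req == ".T*row*begin":
            row = {"info": [], "cells": []}
        elif req == ".T*row*line":
            table["rows"].append({"line": args})
        elif req == ".T*cell*info":
            row["info"].append(args)
        elif req == ".T*cell*begin":
            cell = {"kind": req, "format": args, "text": []}
        elif req == ".T*cell*end":
            row["cells"].append(cell)
            cell = None
        elif req in SHORTCUT_CELLS:
            row["cells"].append({"kind": req, "args": args})
        elif req == ".T*row*end":
            table["rows"].append(row)
            row = None
        elif req == ".T*table*end":
            tables.append(table)
            table = None
    return tables

# Made-up sample input; the argument values are illustrative only.
SAMPLE = """\
.T*table*begin 2 2 0 L n n n n
.T*column*info 0 3 n
.T*column*info 0 3 n
.T*row*begin
.T*cell*info L 1 1 T 0
.T*cell*info L 1 1 T 0
.T*cell*begin 0 0 0
apple
.T*cell*end
.T*empty*cell
.T*row*end
.T*row*line 1
.T*table*end
""".splitlines()

tables = parse_tblcvt(SAMPLE)
```

The sketch does no error checking; a real reader would also want to verify that the row and column counts match the values given on the .T*table*begin line.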

Table Beginning and Ending Requests


Each table begins with a .T*table*begin request, which has the following form:

   .T*table*begin rows cols header-rows align expand box allbox doublebox
rows and cols are the number of rows and columns in the table. (A row that draws a line is considered a data row.)

For tables that are specified to have a header (using .TS H and .TH), tblcvt writes a non-zero value for the header-rows value. Otherwise header-rows is 0.

align is L or C to indicate the table is left-justified or centered.

expand is y if the table is expanded to the full line width, n otherwise.

The box, allbox, and doublebox values are each y or n, depending on whether or not box, allbox, and doublebox were given in the table specification. (Note that allbox and doublebox both imply box.)

Each table is terminated by a .T*table*end request.
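
If you are writing a tool that reads this request, the eight arguments map naturally onto a small record. The Python sketch below shows one way to do that; the names TableBegin, parse_table_begin, and boxed are invented for the example. Since the text above does not say whether box is written as y when only allbox or doublebox was given, the boxed property checks all three values.

```python
from typing import NamedTuple

class TableBegin(NamedTuple):
    rows: int
    cols: int
    header_rows: int
    align: str          # "L" or "C"
    expand: bool
    box: bool
    allbox: bool
    doublebox: bool

    @property
    def boxed(self):
        # allbox and doublebox both imply box, so accept any of the three.
        return self.box or self.allbox or self.doublebox

def parse_table_begin(line):
    """Split a .T*table*begin line into named fields (hypothetical helper)."""
    req, *args = line.split()
    if req != ".T*table*begin" or len(args) != 8:
        raise ValueError("not a .T*table*begin request: %r" % line)
    rows, cols, header_rows = (int(a) for a in args[:3])
    flags = [a == "y" for a in args[4:]]
    return TableBegin(rows, cols, header_rows, args[3], *flags)

# Made-up example line: 5 rows, 3 columns, 1 header row, centered, allbox.
info = parse_table_begin(".T*table*begin 5 3 1 C n n y n")
```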

Column Information Requests


Following the .T*table*begin request, tblcvt writes one .T*column*info line for each column of the table, in the format:

   .T*column*info width sep equal
The column number is not specified; .T*column*info lines will appear in consecutive order.

width is the minimum required width of the column. The value is non-zero if any entry in the given column specified a w option. If more than one entry specified w, the last one is used. If width is 0, no entry in the column specified w and the width is determined from the data values in the column.

sep is the column separation value.

The equal value is y if any entry in the column specified the e option, and n otherwise. All columns with an equal value of y should be made the same width.
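
A postprocessor has to decide what "the same width" means for the columns marked equal. The Python sketch below takes one plausible approach, which is an assumption rather than something this document specifies: every equal column is widened to match the widest of them. The function name and the numeric widths (in whatever unit the postprocessor measures) are hypothetical.

```python
def equalize_widths(widths, equal_flags):
    """Give every column whose equal flag is set the width of the widest one.

    widths      -- measured column widths (numbers, any consistent unit)
    equal_flags -- True for columns whose .T*column*info equal value is y
    """
    shared = [w for w, eq in zip(widths, equal_flags) if eq]
    if not shared:
        return list(widths)
    widest = max(shared)
    return [widest if eq else w for w, eq in zip(widths, equal_flags)]

# Columns 1 and 3 are marked equal; column 2 keeps its own width.
adjusted = equalize_widths([10, 25, 15], [True, False, True])
```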

Row Beginning and Ending Requests


If a table row does not consist of a table-width line, the row begins and ends with .T*row*begin and .T*row*end requests. Information for the individual cells is written between these two requests (see "Cell Information Requests").

If a row consists of a table-width single or double line, the .T*row*begin and .T*row*end requests are not used. Instead, the row is specified completely by a single .T*row*line request, written using one of the following forms:

   .T*row*line 1       Table-width single line
   .T*row*line 2       Table-width double line

Cell Information Requests


Between each pair of .T*row*begin and .T*row*end requests, tblcvt writes out the information for each cell (column) in the row. First a set of .T*cell*info lines is written, one for each cell. These requests provide basic layout parameters. Then the contents of the cells are written. For the usual case, a cell is written using .T*cell*begin and .T*cell*end requests, with the cell data appearing between the requests. Empty, spanned, or line-drawing cells are written using .T*empty*cell, .T*spanned*cell, and .T*cell*line requests.

This means that cells begin with any of .T*cell*begin, .T*empty*cell, .T*spanned*cell, or .T*cell*line, and end with any of .T*cell*end, .T*empty*cell, .T*spanned*cell, or .T*cell*line.

The .T*cell*info request has the following form:

   .T*cell*info type vspan hspan vadjust border
The column number of the cell is not specified; .T*cell*info lines will appear in consecutive order.

type is the cell type:

   L         Left-justified
   R         Right-justified
   C         Centered
   N         Numeric (align to decimal point)
   A         Alphanumeric
vspan and hspan are the number of rows and columns spanned by the cell, including itself. Interpret these values as follows:
If all you want to know is whether or not a cell is spanned, the product of vspan and hspan is zero if and only if the cell is spanned. If you need to know whether spanning is in a particular direction, you need to examine vspan and hspan individually. This is summarized in the following table.


                  hspan = 0              hspan > 0
   vspan = 0      spanned both ways      spanned from above
   vspan > 0      spanned from left      not spanned
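
The four cases in the table above translate directly into code. The following Python sketch is one way to write the test; the function name span_state and the returned strings are chosen for the example only.

```python
def span_state(vspan, hspan):
    """Classify a cell from the vspan and hspan values of .T*cell*info."""
    if vspan > 0 and hspan > 0:
        return "not spanned"
    if vspan == 0 and hspan == 0:
        return "spanned both ways"
    if vspan == 0:
        return "spanned from above"
    return "spanned from left"

def is_spanned(vspan, hspan):
    # The product is zero if and only if the cell is spanned.
    return vspan * hspan == 0
```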


vadjust is T if the cell contents should be vertically adjusted from the top, C if the contents should be vertically centered. vadjust is meaningful only for multiple-line cells.

border is the border value. If the value is 0, there is no border. Otherwise, the value is a bitmap with the following fields:

   Bits      Value     Meaning
   0-1       1         Left border, single line
             2         Left border, double line
   2-3       1         Right border, single line
             2         Right border, double line
   4-5       1         Top border, single line
             2         Top border, double line
   6-7       1         Bottom border, single line
             2         Bottom border, double line
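
To unpack the border value, shift and mask each two-bit field in turn. The Python sketch below assumes that bit 0 is the least significant bit, which this document does not state explicitly; the function name decode_border and the "none", "single", and "double" labels are invented for the example.

```python
BORDER_SIDES = ("left", "right", "top", "bottom")    # bits 0-1, 2-3, 4-5, 6-7
BORDER_STYLES = {0: "none", 1: "single", 2: "double"}

def decode_border(border):
    """Return a dict mapping each side of the cell to its border style."""
    result = {}
    for index, side in enumerate(BORDER_SIDES):
        field = (border >> (2 * index)) & 0b11       # extract one 2-bit field
        result[side] = BORDER_STYLES.get(field, "unknown")
    return result

# 145 is binary 10 01 00 01: bottom double, top single, no right, left single.
sides = decode_border(145)
```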

The .T*cell*begin request has the following form:

   .T*cell*begin font ptsize vspace
font is the font to use for formatting the cell, 0 if no font was specified.

ptsize is the point size to use for formatting the cell, 0 if no size was specified.

vspace is the vertical spacing to use for formatting the cell, 0 if no spacing was specified.

The .T*cell*end request has no arguments:

   .T*cell*end
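If a cell is empty or spanned or draws a line, the .T*cell*begin and .T*cell*end requests are not used. Instead, the cell is specified using one of the following requests:

   .T*empty*cell            Empty cell
   .T*spanned*cell          Cell spanned by an earlier cell
   .T*cell*line argument    Line-drawing cell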

troffcvt Handling of tblcvt Output


The .T*xxx requests are defined in the default actions file that troffcvt reads when it starts up. The actions for the requests cause troffcvt to perform a relatively simple mapping:

   tblcvt output                 troffcvt output
   .T*table*begin arguments      \table-begin arguments
   .T*table*end                  \table-end
   .T*column*info arguments      \table-column-info arguments
   .T*row*begin                  \table-row-begin
   .T*row*end                    \table-row-end
   .T*cell*info arguments        \table-cell-info arguments
   .T*cell*begin arguments       \table-cell-begin
   .T*cell*end                   \table-cell-end
   .T*row*line argument          \table-row-line argument
   .T*cell*line argument         \table-cell-line argument
   .T*spanned*cell               \table-spanned-cell
   .T*empty*cell                 \table-empty-cell
When a request written by tblcvt has arguments, the corresponding control written by troffcvt has arguments that are similar to, but not necessarily exactly the same as, those of the request. The primary exception is that the font, ptsize, and vspace arguments to .T*cell*begin are converted directly by the troffcvt actions file into font and size troff directives, then translated into the troffcvt intermediate language. The font and size controls appear in troffcvt output immediately following the \table-cell-begin control. See troffcvt Output Format and PostProcessor Writing for the exact format of the \table- controls.

In addition to the .T*xxx request names used by tblcvt, troffcvt uses the register names T*cell*ft, T*cell*ps, and T*cell*vs for internal purposes.

Calculating Spans


Table specifications may indicate that a table element spans multiple rows or columns, or both. However, not all spanning specifications are legal, and tblcvt tries to catch those that are malformed. The spanning constraints enforced by tblcvt are:

The smallest illegal table specifications that include spans are shown below; each of them violates one of the first two spanning constraints:
   .TS                 .TS
   s .                 ^ .
   data                data
   .TE                 .TE
Assuming the first two constraints are satisfied, the smallest illegal table specifications that include spans are shown below (l is used here, but any non-spanning column type may be substituted):
   .TS                 .TS
   l l                 l s
   ^ s .               l ^ .
   data                data
   .TE                 .TE
The first table is illegal by the following reasoning. The cells in the first column form a single vertically-spanned element. The second column could be part of that element if both cells spanned to the left, since the resulting spanned area would be rectangular. However, since only one of the cells spans to the left, the spanned area is L-shaped, which is illegal. The second table is illegal by similar reasoning. The top two cells form a single element. The bottom two cells could be part of that element if they both spanned upward, but only one of them does.

tblcvt uses the strategy outlined below to determine the extent of spanned elements and to discover non-rectangularities in cell spanning. The strategy works by operating on a matrix with one column for each column specified in the format section of the table specification, and one row for each row of table data given in the data section of the specification. Working from left to right and top to bottom, each cell of the matrix is visited and the following checks are applied:

Some tables are examined below to illustrate the strategy just described.

Example 1: The table shown below is illegal.

   .TS
   l s s l
   ^ s s s .
   data
   .TE
Beginning at the upper left, we see that the vertical and horizontal spans are 2 and 3. The remaining cells in this 2 x 3 block are the second and third cells in the second row. They both span into the first cell of the second row, so they are part of the span block. Therefore, the upper 2 x 3 block is okay. The next unvisited cell is the fourth cell in the first row. This cell is a standalone cell, so it's okay. The last unvisited cell is the fourth cell of the second row. This cell is a spanned cell, but it can't span into the block to the left without forming a non-rectangular block. The table specification is bad.

Example 2: Here's a table that appears at first glance as though it may be illegal. Is it?

   .TS
   l s s s
   ^ s ^ ^
   ^ ^ s s .
   data
   .TE
Beginning at the upper left, we see that the vertical and horizontal spans are 3 and 4. This means 12 cells should be in the span block. We know that the three s cells to the right of the corner cell and the two ^ cells below the corner cell are part of the block, so the next step is to examine the remaining 2 x 3 block at the lower right. In the second row of the block, the second cell spans left into the first column (and is thus part of the span block), and the third and fourth cells span up into the first row (and are thus part of the span block). In the third row, the second cell spans up into the second row (and is thus part of the span block since that row has already been determined to be part of the block), and the third and fourth cells span into the third cell (which, since that cell has just been determined to be part of the block, makes the last two cells part of the block as well).

Therefore, in spite of its unusual specification, the table is legal. It consists of a single 3 x 4 spanned entry.

Example 3: Span calculations are performed with separate matrices for vertical and horizontal spans that initially assume all spans are 1. Suppose we have a table specification that looks like this:

   .TS
   l s l
   l s l
   ^ s l .
   a1   a2
   b1   b2
   c
   d
   .TE
There are three format columns. There are three format rows but four data rows, so the last format line is used for the third and fourth data rows. The vertical and horizontal span matrices are 4 x 3, and start out like this:
   1  1  1       1  1  1
   1  1  1       1  1  1
   1  1  1       1  1  1
   1  1  1       1  1  1
After calculating spans, the matrices end up like this:
   1  1  1       2  0  1
   3  3  1       2  0  1
   0  0  1       2  0  1
   0  0  1       2  0  1
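
The results of the three examples can be reproduced with a short Python sketch. Note that this is not the procedure tblcvt itself uses: instead of visiting cells block by block, it follows each s entry leftward and each ^ entry upward until it reaches a non-spanning entry (the "owner"), then checks that the cells belonging to each owner fill a rectangle. The function names are invented, and the format is assumed to have already been expanded to one row of key letters per data row.

```python
def resolve_owner(fmt, r, c):
    """Follow 's' leftward and '^' upward to the non-spanning owner cell."""
    while fmt[r][c] in ("s", "^"):
        if fmt[r][c] == "s":
            if c == 0:
                raise ValueError("'s' in first column")
            c -= 1
        else:
            if r == 0:
                raise ValueError("'^' in first row")
            r -= 1
    return r, c

def span_matrices(fmt):
    """Return (vspan, hspan) matrices, or raise ValueError if illegal."""
    rows, cols = len(fmt), len(fmt[0])
    blocks = {}
    for r in range(rows):
        for c in range(cols):
            blocks.setdefault(resolve_owner(fmt, r, c), []).append((r, c))
    vspan = [[1] * cols for _ in range(rows)]
    hspan = [[1] * cols for _ in range(rows)]
    for (r0, c0), cells in blocks.items():
        height = max(r for r, _ in cells) - r0 + 1
        width = max(c for _, c in cells) - c0 + 1
        if len(cells) != height * width:       # block has a hole or a notch
            raise ValueError("non-rectangular span at row %d, column %d"
                             % (r0 + 1, c0 + 1))
        for r, c in cells:
            vspan[r][c] = height if r == r0 else 0   # 0 below the top row
            hspan[r][c] = width if c == c0 else 0    # 0 right of first column
    return vspan, hspan

def is_legal(fmt):
    try:
        span_matrices(fmt)
    except ValueError:
        return False
    return True

example1 = [list("lssl"), list("^sss")]
example2 = [list("lsss"), list("^s^^"), list("^^ss")]
# Example 3: the last format line is repeated for the fourth data row.
example3 = [list("lsl"), list("lsl"), list("^sl"), list("^sl")]
vspan3, hspan3 = span_matrices(example3)
```

With these inputs, is_legal reports Example 1 as illegal and Example 2 as legal, and vspan3 and hspan3 hold the same values as the final matrices shown for Example 3.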