Re: <PRE tab-width=4>

Paul Grosso (pbg@texcel.no)
Fri, 27 Jan 95 14:06:06 EST

> Date: Fri, 27 Jan 1995 12:56:45 -0400
> From: kmc@specialform.com (Keith M. Corbett)
>
> I wonder, how do the big integrated SGML systems handle formatting of text
> with embedded tab characters? This must be a common problem with legacy
> data. How do applications like CALS cope? Are indentation and alignment
> covered in the spec?

Tab characters are not a concept in many typesetting systems. TeX,
for example, doesn't have that concept (it *does* have a concept
of alignment, and it does have several ways of doing tables, and
it even has something called \settabs, but if you look at it, it's
not really the same as tabs, and it certainly isn't tab characters).

I can't speak for all SGML systems, but most of the ones with which
I'm familiar don't do anything special with tab characters--they
usually treat them as a space character. Most SGML *composition*
systems, like most typesetting systems in general, treat multiple
spaces as a single word-separating space and would treat a tab
character in the input stream as just another space character to
be merged with other consecutive spaces into a single word-separating
space. After all, what composition systems are usually expected
to do is "H&J" (hyphenation and justification), and "justification"
in the composition sense generally only makes sense when the
composition engine determines the interword spacing, so multiple
spaces and tab characters have no place.
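
As a rough illustration of the space handling described above, here is
a minimal Python sketch. It is not taken from any particular SGML or
typesetting system; the function name collapse_spaces is made up for
this example, and real systems also deal with record ends, which this
sketch ignores.

```python
import re

def collapse_spaces(text):
    """Treat each tab as a space, then merge runs of spaces into one.

    Hypothetical helper: it mimics only the whitespace handling,
    not hyphenation or justification.
    """
    as_spaces = text.replace("\t", " ")       # a tab is just another space
    return re.sub(r" {2,}", " ", as_spaces)   # runs of spaces become one

print(collapse_spaces("one\t\ttwo   three \t four"))
# prints: one two three four
```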

In a PRE (aka verbatim or "spacing as is") region, the semantics
are usually different, and line breaks and multiple spaces have important
meaning and are usually maintained. However, tab characters are still
rarely used in this case. The CALS formatting application (the Output
Specification or OS) says nothing directly about tab characters, and I know
that they are generally treated as space characters by the OS-based CALS
composition systems of which I am aware. The latest version of the OS
has a section on "significant record ends" where it discusses how record
ends and multiple consecutive spaces are only preserved in "as is"
sections, and mentions how such sections are often most appropriately
set in a monospaced font, but even here tab characters are conspicuous
by their absence.
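
For contrast, the "as is" behavior described above could be sketched
as follows in Python. Again this is only an assumption-laden
illustration, not code from the OS or any CALS system; the function
name as_is_lines is hypothetical. Record ends and multiple spaces are
kept, and each tab simply becomes one space rather than advancing to a
tab stop.

```python
def as_is_lines(text):
    """Split on record ends and keep spacing exactly as given.

    Hypothetical helper: each tab is replaced by a single space,
    with no notion of tab stops or tab width.
    """
    return [line.replace("\t", " ") for line in text.split("\n")]

for line in as_is_lines("key\tvalue\n  indented   text"):
    print(repr(line))
# prints: 'key value'
# prints: '  indented   text'
```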

paul

Paul Grosso
VP Research        Chief Technical Officer    Chair
ArborText, Inc.    SGML Open                  CALS OS/FOSI/DSSSL committee

Email: paul@arbortext.com
or pbg@texcel.no