Re: Spaces and Tabs in HTML documents

Damian.Cugley@prg.ox.ac.uk
Date: Mon, 14 Jun 1993 17:35:17 +0200
From: Damian.Cugley@prg.ox.ac.uk
Message-id: <9306141535.AA00686@boothp1.ecs.ox.ac.uk>
To: www-talk@nxoc01.cern.ch
Subject: Re: SPaces and Tabs in HTML documents

> I would like to specify that multiple spaces be interpreted as such.
> Would this be a big problem for anyone?

As a general attitude to spacing, I still think that Knuth got it
right with TeX [TeXbook, p.46]: by default newlines, tabs and spaces
are treated as equivalent, multiple spaces are collapsed, spaces and
tabs at the starts of lines are ignored, and extra space after
punctuation is handled separately.  (TeX is a typesetter, and spacing
punctuation in one-space units would be too coarse.)  Knuth's thinking
was that if you can't see the difference on the screen then it should
not make a difference to the final output.
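As a rough illustration, the TeX conventions summarized above can be modelled by a small normalizing function. This is a simplified sketch, not TeX's actual algorithm: it ignores TeX's rule that an empty line starts a new paragraph, and it does not add extra space after sentence-ending punctuation. The function name `tex_like_spacing` is hypothetical.

```python
import re


def tex_like_spacing(text: str) -> str:
    """Simplified model of TeX's input spacing rules (an assumption-laden sketch).

    Not modelled: an empty line meaning "new paragraph", and the extra
    space TeX puts after sentence-ending punctuation.
    """
    # Spaces and tabs at the start of each line are ignored.
    lines = [line.lstrip(" \t") for line in text.splitlines()]
    # Newlines count as ordinary spaces, so join the lines with a space...
    joined = " ".join(lines)
    # ...then collapse every run of spaces/tabs into one inter-word space
    # and drop any leftover space at either end.
    return re.sub(r"[ \t]+", " ", joined).strip()


print(tex_like_spacing("Hello,   world.\n\tIndented  line."))
```

Under this model the input above prints `Hello, world. Indented line.`: what you cannot see on the screen (the tab, the tripled and doubled spaces) makes no difference to the output.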

On the other hand, there are two reasons not to adopt Knuth's
conventions verbatim: first, your hands are likely tied by SGML
compatibility; second, you are working in a fairly clunky character
medium rather than fine typesetting.  If the spacing is done in units
of a normal inter-word space, then indicating a double-width space
with two space characters *is* an expedient convention.  (Especially
as HTML has no way of distinguishing an end-of-sentence full stop from
an abbreviation-indicating full stop.)

That said, being able to rigidly indent the text of a document with
markup can greatly improve its legibility -- with TeX documents I
indent the text one tab stop, leaving the macros that generate
headings and delimit regions of text in column 0.  This makes the
structure of the document easier to follow.  (It also prevents UNIX
sendmail from inserting ">" before lines that begin with "From".)  The
same applies to HTML documents, except that if the text is indented,
only Mosaic can display such documents properly.

You could specify that any number of whitespace characters (SP, HT,
LF, CR) may follow the CR or LF marking the end of a line, and they're
all ignored (except in PRE elements, of course).  This would mean that
indentation of text lines is ignored, as well as any blank lines (even
ones with spaces on them).  Doubled spaces after sentences would be
preserved.  This should be easy to implement, and allows people like
me to format their text pretty much as they would like.
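A minimal sketch of the proposed rule, to show how little code it needs. Assumptions not stated in the proposal: the line break itself still separates words, so the break plus all whitespace following it is rendered as a single space; whitespace *before* the line break is left alone; and the caller is responsible for skipping PRE content. The names `EOL_RUN` and `normalize_line_breaks` are hypothetical.

```python
import re

# A CR or LF, followed by any run of SP, HT, LF or CR.  Per the proposal,
# everything after the end-of-line marker is ignored; here the whole run is
# assumed to act as one ordinary inter-word space.
EOL_RUN = re.compile(r"[\r\n][ \t\r\n]*")


def normalize_line_breaks(text: str) -> str:
    """Apply the proposed whitespace rule to non-PRE text.

    Spaces within a line (e.g. two spaces after a full stop) are untouched.
    """
    return EOL_RUN.sub(" ", text)


source = "First sentence.  Second\n    indented line.\n\n   \nNext para."
print(normalize_line_breaks(source))
```

This prints `First sentence.  Second indented line. Next para.`: the indentation and the blank lines (including the one containing only spaces) disappear, while the doubled space after the first sentence survives.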

Damian