SGML newline processing
Michael Leventhal <mleventh@us.oracle.com>
Message-id: <9301082035.AA08237@hqsun4.us.oracle.com>
Date: Fri, 8 Jan 93 12:35:54 PST
From: Michael Leventhal <mleventh@us.oracle.com>
To: connolly@pixel.convex.com
Subject: SGML newline processing
Cc: www-talk@nxoc01.cern.ch
>>>From what I can tell, a newline is ignored by the SGML parser
>>>if it's right after a start tag or right before an end tag.
>>
>>I haven't been following the discussion closely enough to know
>>if a suggestion for dealing with the problem will be helpful,
>>but...
>>
>>I use the SHORTREF feature to implicitly recognize an EMPTY
>><newline> tag wherever I want newlines after a start tag or
>>right before an end tag to be preserved. The parser will
>>generate the implicit tags which my processing engine then
>>converts back to actual newlines for output display.
>>
>>Although this seems like a pain I believe the behavior of
>>the parser is logically correct and the only way to be consistent.
>
>Your argument is sound, but we're trying to design a format
>that is defined completely in terms of SGML, but parsed by
>homebrew code.
>
>So the SGML declaration for HTML turns the SHORTTAG feature
>off, saving us some parsing hassles.
>
>If you're using a full-featured SGML parser, you can usually
>tweak the DTD to make the stuff parse how you like through
>shortrefs and the like. But we're using a bare-bones
>SGML parser, so we're just trying to get by without
>conflicting with the standard.
>
>Dan
I'm at home with the flu, without my copy of ISO 8879, and I
haven't become one who can quote clause and sub-clause from
memory (yet :-)), but ...
SHORTTAG is an optional feature, but SHORTREF is not, since
it is required in the SGML declaration. I think that, according
to the standard, a system which does not support SHORTREF
is not compliant, and therefore not even minimum SGML.
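For anyone else without the standard at hand, here is roughly where the two names sit in an SGML declaration. This is an excerpt, not a complete declaration: the CHARSET, CAPACITY and SCOPE sections and most of SYNTAX are left out, and the YES/NO values shown are only illustrative.

```sgml
-- Excerpt of an SGML declaration; other sections omitted --

-- In the concrete syntax, the DELIM section always carries a
   SHORTREF parameter naming the short reference delimiters --
DELIM    GENERAL  SGMLREF
         SHORTREF SGMLREF

-- SHORTTAG, by contrast, is one of the optional features that
   the declaration switches on or off --
FEATURES
  MINIMIZE DATATAG NO  OMITTAG YES  RANK NO  SHORTTAG NO
  LINK     SIMPLE NO   IMPLICIT NO  EXPLICIT NO
  OTHER    CONCUR NO   SUBDOC NO    FORMAL NO
```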
My solution only requires SHORTREF. I code:
<!ELEMENT newline - O EMPTY>
<!ENTITY nltag STARTTAG "newline">
<!SHORTREF nlmap "&#RS;" nltag>
<!USEMAP nlmap (verbatim)>
The use of OMITTAG in the newline element declaration is not
necessary. This code causes the parser to recognize
record starts as newline tags within verbatim elements.
My processor converts the newline tags back to record
starts.
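To show how the four declarations might sit in a complete document type, here is a minimal sketch. The memo and para element types and the sample text are hypothetical, and it assumes an SGML declaration with OMITTAG enabled, since the "- O" parameters require it. The one point it makes explicit is that verbatim's content model has to allow newline: the short reference inserts real start tags, which the parser checks against the content model like any others. Listing newline in the model is one way to do that; an inclusion exception such as +(newline) would be another.

```sgml
<!DOCTYPE memo [
<!-- Hypothetical document type; only newline, nltag, nlmap
     and verbatim come from the declarations above -->
<!ELEMENT memo     - - (para | verbatim)+>
<!ELEMENT para     - - (#PCDATA)>
<!-- verbatim must accept newline, or the start tags generated
     by the short reference would be invalid here -->
<!ELEMENT verbatim - - (#PCDATA | newline)*>
<!ELEMENT newline  - O EMPTY>
<!ENTITY  nltag    STARTTAG "newline">
<!SHORTREF nlmap   "&#RS;" nltag>
<!USEMAP  nlmap    verbatim>
]>
<memo>
<para>Record starts in this element are not mapped.</para>
<verbatim>
line one
    line two, indented
</verbatim>
</memo>
```

Because nlmap is only associated with verbatim, record starts inside para (and in memo's element content) are handled by the normal record boundary rules.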
Michael Leventhal
Oracle Corporation
mleventh@us.oracle.com