<, >, and & characters in HTML

Dan Connolly <connolly@pixel.convex.com>
Subject: <, >, and & characters in HTML
Date: Thu, 10 Dec 92 11:05:24 CST
From: Dan Connolly <connolly@pixel.convex.com>

I'm trying to write libHTML so that applications
can just deal with C-style character strings, and
the library handles all the SGML details.

The use of &lt; and &amp; to represent < and & never
seemed to fit cleanly into the SGML view of things.
So I posted to comp.text.sgml.

I think I'm a lot clearer on the matter now. The
&lt; and &gt; entities are meant to be used in typesetting
mathematics, where a less-than symbol is not necessarily
the same thing as a '<' character.

There's a mechanism for referencing characters in the
document character set so that they will not be treated
as markup: numeric character references, such as &#60;
for '<' and &#38; for '&'.
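A minimal C sketch of the escaping half of such a library, assuming 8-bit, ASCII-compatible strings: every '<', '>' and '&' is replaced by its decimal numeric character reference, so a parser sees it as data rather than markup. The name html_escape_data is hypothetical and not an existing libHTML call.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of any real libHTML API).
 * Returns a newly malloc'd copy of `s` in which '<', '>' and '&'
 * are replaced by decimal numeric character references, so an
 * SGML parser treats them as data. The caller frees the result.
 * Returns NULL if allocation fails. */
char *html_escape_data(const char *s)
{
    size_t len = 0;
    const char *p;
    char *out, *q;

    /* First pass: measure. Each special character becomes a
     * 5-byte reference such as "&#60;". */
    for (p = s; *p; p++)
        len += (*p == '<' || *p == '>' || *p == '&') ? 5 : 1;

    out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, substituting the references. */
    q = out;
    for (p = s; *p; p++) {
        switch (*p) {
        case '<': memcpy(q, "&#60;", 5); q += 5; break;
        case '>': memcpy(q, "&#62;", 5); q += 5; break;
        case '&': memcpy(q, "&#38;", 5); q += 5; break;
        default:  *q++ = *p;                     break;
        }
    }
    *q = '\0';
    return out;
}
```

For example, html_escape_data("if (a < b && c > d)") returns "if (a &#60; b &#38;&#38; c &#62; d)". Measuring first keeps it to a single allocation.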

I'd like to get rid of the &lt;, &gt;, and &amp; entities
from the HTML DTD. Granted, there will be a transition
period while providers adjust, but I think it will make
the spec cleaner.

Anyway, here's what the experts had to say...


From: Erik Naggum <SGML@ifi.uio.no>
Date: 10 Dec 1992 07:36:57 +0100
Subject: Re: hiding <, >, and &
[Dan Connolly]
|   There is a lot of need for a routine that represents an arbitrary string
|   of characters as SGML data -- a routine that hides <tags> etc. from
|   the parser.

The simplest would be to use character references for the characters
that you need to quote.  See 9.5 Character Reference, [357:10-13].


The way I see it, this is a completely failsafe technique.
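One way to see why it is failsafe: the mapping is trivially reversible. A simplified C sketch of the decoding side, with a hypothetical name (html_unescape_numeric), assuming only decimal references that end in ';' and name a single-byte character. That is narrower than what SGML actually permits (other reference-close forms, for instance, are not handled here).

```c
#include <ctype.h>

/* Hypothetical decoder: rewrites decimal references of the form
 * "&#NNN;" in place. Simplifying assumptions:
 *   - only decimal references with an explicit ';' are recognized;
 *   - the code must be in 1..255 (one byte);
 *   - anything else is copied through unchanged.
 * Decoding never lengthens the text, so `out` never passes `p`. */
void html_unescape_numeric(char *s)
{
    char *out = s;
    const char *p = s;

    while (*p) {
        if (p[0] == '&' && p[1] == '#' && isdigit((unsigned char)p[2])) {
            const char *d = p + 2;
            unsigned long code = 0;
            /* Stop accumulating once the value leaves byte range. */
            while (isdigit((unsigned char)*d) && code <= 255)
                code = code * 10 + (unsigned long)(*d++ - '0');
            if (*d == ';' && code >= 1 && code <= 255) {
                *out++ = (char)code;
                p = d + 1;
                continue;
            }
        }
        *out++ = *p++;   /* not a reference we handle: copy as-is */
    }
    *out = '\0';
}
```

Decoding "a &#60; b" in place yields "a < b"; malformed or out-of-range references such as "&#x;" or "&#300;" are left untouched.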

|   [It does bring up the question of representing " and ' characters in
|   attribute value literals. Hmm.. another situation I think I'll
|   just avoid.]

What's wrong with "'", '"', "&#34;", and '&#39;'?
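A C sketch of the choice Erik describes, using a made-up helper name (sgml_attr_literal): delimit the value with whichever quote character it does not contain, and only when it contains both, use "..." with each '"' written as &#34; (the '&#39;' variant would work equally well). It also writes '&' as &#38;, since an '&' inside an attribute value literal can otherwise begin a reference.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: wrap `v` in an SGML attribute value literal.
 *   - no '"' in v            -> "v"
 *   - '"' but no '\'' in v   -> 'v'
 *   - both quote characters  -> "v" with each '"' as &#34;
 * '&' is always written as &#38; so it cannot start a reference.
 * Returns a malloc'd string the caller frees, or NULL on failure. */
char *sgml_attr_literal(const char *v)
{
    char delim = '"';
    if (strchr(v, '"') != NULL && strchr(v, '\'') == NULL)
        delim = '\'';

    /* Worst case: every character grows to a 5-byte reference,
     * plus two delimiters and the terminating NUL. */
    char *out = malloc(5 * strlen(v) + 3);
    if (out == NULL)
        return NULL;

    char *q = out;
    *q++ = delim;
    for (const char *p = v; *p; p++) {
        if (*p == '&') {
            memcpy(q, "&#38;", 5);
            q += 5;
        } else if (*p == '"' && delim == '"') {
            /* Reached only when v contains both quote characters. */
            memcpy(q, "&#34;", 5);
            q += 5;
        } else {
            *q++ = *p;
        }
    }
    *q++ = delim;
    *q = '\0';
    return out;
}
```

So plain stays "plain", say "hi" becomes 'say "hi"', and a value containing both quotes and an ampersand falls back to double quotes with references.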


Erik Naggum
                            ISO 10744 HyTime
<erik@naggum.no>            ISO  9899 C                 Memento, terrigena
<enag@ifi.uio.no>           ISO 10646 UCS             Memento, vita brevis