Re: Objections to draft-ietf-html-spec-01.txt

Gavin Nicol (gtn@ebt.com)
Fri, 24 Mar 95 10:35:57 EST

>Since these instructions do not have to be transmitted over the net, I
>don't think they NEED to be spelled out for each and every charset
>before that charset is considered acceptable. The problem is trivial
>for ISO 8859-x charsets, which have ASCII as a proper subset. If I
>receive a properly labeled ISO-2022-JP HTML document, I will know how
>to parse it (hint: the markup is to be found in the ASCII portions,
>not in the JIS X 0208 portions).
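The hint about ASCII portions works only after decoding. In ISO-2022-JP, each JIS X 0208 character is encoded as two bytes in the 0x21-0x7E range, so a scanner that looks for '<' in the raw byte stream can match half of a Japanese character. The minimal sketch below assumes Python's standard "iso2022_jp" codec, and the sample document is made up.

```python
# Sketch: find markup after decoding, not in the raw byte stream.
# Assumes Python's built-in "iso2022_jp" codec; the document is hypothetical.

doc = "<P>\u305c\u3072</P>"         # two tags around two hiragana characters
raw = doc.encode("iso2022_jp")      # bytes as they would arrive over the net

# Naive byte-level scan: JIS X 0208 bytes share the 0x21-0x7E range with
# ASCII, so part of a two-byte character can look like a '<' delimiter.
naive_count = raw.count(b"<")

# Decode first (using the charset label), then look for markup characters.
decoded = raw.decode("iso2022_jp")
decoded_count = decoded.count("<")

print(naive_count, decoded_count)
```

With this sample, the byte-level count comes out higher than the character-level count, which finds only the two real tag openers.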

At the very least, a coherent framework needs to be described. While
many things work today, it is basically because people have hacked in
support, rather than designing it in.

>Your standard SGML tools are not up to the task? This is an
>implementation issue, not an Internet standards issue.

No. SGML requires a full description of the characters in order to be
able to correctly parse a document. That is an Internet standards
issue, as is the above.

>The proper question to ask is: does it work? The fact that there is
>an unambiguous way to parse a properly labeled document, together with
>the fact that it is here today (although the lack of proper labeling
>makes it awkward), tell me that it works. Then it should be in, and
>not in 2.x N months from now. The first W in WWW will not wait.

It works only because parsers have been hacked in ways that make them
non-conforming in SGML terms.

For the moment, the best way to document current practise is with
something like:

<BLOCKQUOTE>
An HTML parser must accept a stream of characters as input, and assign
them to character classes used for markup recognition
purposes. Currently HTML markup requires only the characters found in
the US-ASCII character set. All ASCII characters with no markup
role, and all non-ASCII characters, should be treated as data. Such
categorization should take place after decoding the data stream, or in
other words, not at the character set encoding level, but rather at the
character set level.

When the above conflicts with the SGML standard, the SGML standard is
to be ignored.
</BLOCKQUOTE>
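A minimal Python sketch of the processing model proposed above: decode the byte stream using its charset label, then assign each character a class. The class names and the delimiter set are hypothetical illustrations, not taken from the draft. The classification is also context-free, which is a simplification of real markup recognition.

```python
from typing import List, Tuple

# Hypothetical delimiter set: ASCII characters assumed to carry a markup role.
DELIMITERS = frozenset('<>&;/="\'!')
WHITESPACE = frozenset(" \t\r\n")

def char_class(ch: str) -> str:
    """Assign a single decoded character to a (hypothetical) class."""
    if ord(ch) > 0x7F:
        return "data"           # every non-ASCII character is data
    if ch in DELIMITERS:
        return "delimiter"
    if ch in WHITESPACE:
        return "space"
    if ch.isalnum():
        return "name"           # ASCII letters and digits, usable in tag names
    return "data"               # ASCII characters with no markup role

def classify(raw: bytes, charset: str) -> List[Tuple[str, str]]:
    """Decode first, then classify: work on characters, not encoded bytes."""
    text = raw.decode(charset)
    return [(ch, char_class(ch)) for ch in text]
```

For example, `classify("<P>\u305c</P>".encode("iso2022_jp"), "iso2022_jp")` labels the hiragana character as data even though its encoded bytes include 0x3C.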