Charsets in .01 spec

Terry Allen
Thu, 9 Feb 95 19:22:54 EST

Looking at the just-announced .01 version of the spec, which I too
would like to kick out the door, I find some problems with the
section on charsets. Dan, you were too optimistic about this
language. Considering that 2.0 is emphatically Latin 1, even
to the point that no other charset is approbated, let's just
say that 2.0 does Latin 1 and we defer expanding the realm
of allowable charsets until 2.1.

That way we could defenestrate this item and still keep talking.

| INTERNET DRAFT February 8, 1995
| Expires in six months
| HyperText Markup Language Specification - 2.0
| <draft-ietf-html-spec-01.txt>
| 2.4 HTML as an Internet Media Type
| Charset
| The charset parameter (as defined in section 7.1.1 of
| RFC 1521) may be used with the text/html to specify
| the encoding used to represent the HTML document as
| a sequence of bytes. Normally, text/* media types
| specify a default value of US-ASCII for the charset
| parameter. However, for text/html, if the byte stream
| contains data that is not in the 7-bit US-ASCII set, the
| HTML interpreting agent should assume a default charset of
| ISO-8859-1.
| When an HTML document is encoded using US-ASCII,
| the mechanisms of numeric character references (see
| Section 2.16.2) and character entity references (see
| Section 2.16.3) may be used to encode additional characters
| from ISO-8859-1.

This also works in any ISO-8859-n charset, and others. As the
SGML decl is fixed (in 5.1), I see no value in the preceding para.
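
For concreteness, here is a rough Python sketch of the two mechanisms the
quoted text describes: the default-charset rule and numeric character
references. It is an illustration only, not anything taken from the draft.
The function names effective_charset and expand_numeric_refs are
hypothetical, and the sketch assumes that numeric references always denote
code positions 0-255 of the fixed document character set (Latin 1),
whatever charset the bytes were sent in.

```python
import re
from typing import Optional


def effective_charset(body: bytes, declared: Optional[str] = None) -> str:
    """Choose a charset for a text/html body (hypothetical helper).

    An explicit charset parameter wins; otherwise any byte outside
    7-bit US-ASCII triggers the ISO-8859-1 default from the draft.
    """
    if declared:
        return declared.lower()
    if any(b > 0x7F for b in body):
        return "iso-8859-1"
    return "us-ascii"


def expand_numeric_refs(text: str) -> str:
    """Replace &#NNN; with the Latin 1 character at that code position.

    Assumption: references above 255 are outside the document character
    set and are left untouched.
    """
    def repl(match: "re.Match[str]") -> str:
        code = int(match.group(1))
        return chr(code) if code <= 0xFF else match.group(0)

    return re.sub(r"&#(\d+);", repl, text)


# Usage: a pure US-ASCII byte stream that still carries a Latin 1 character.
body = b"<p>caf&#233;</p>"
charset = effective_charset(body)
text = expand_numeric_refs(body.decode(charset))
```

Because the reference is resolved against the document character set and
not against the transfer encoding, the same expansion step would apply to
a body sent in some other ISO-8859-n charset.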

| Other values for the charset parameter are not defined
| in this specification, but may be specified in future
| levels or versions of HTML.
| It is envisioned that HTML will use the charset parameter
| to allow support for non-Latin characters such as
| Greek, Arabic, Hebrew, Japanese, rather than relying on
| any SGML mechanism for doing so.

This conflicts directly with what immediately follows:

| 2.5 Understanding HTML and SGML
| HTML is an application of ISO Standard 8879:1986 -
| Standard Generalized Markup Language (SGML). SGML is a
| system for defining structured document types, and
| markup languages to represent instances of those
| document types. The SGML declaration for HTML is given
| in Section 5.1. It is implicit among HTML user agents.
| If the HTML specification and SGML standard conflict,
| the SGML standard is definitive.

Terry Allen                    O'Reilly & Associates, Inc.
Editor, Digital Media Group    101 Morris St.
                               Sebastopol, Calif., 95472
monthly column at:

A Davenport Group sponsor. For information on the Davenport Group see or