Re: Character encoding and entities

Ka-Ping Yee
Tue, 18 Jul 95 13:31:00 EDT

On Tue, 18 Jul 1995, Daniel W. Connolly wrote:

> Please see:
> ..
> Charset
> The charset parameter (as defined in section 7.1.1 of RFC
> 1521[MIME]) may be given to specify the character encoding
> scheme used to represent the HTML document as a sequence of
> octets. The default value is outside the scope of this specification;
> but for example, the default is `US-ASCII' in the context of
> MIME mail, and `ISO-8859-1' in the context of HTTP.

> Please don't make up markup for character encoding schemes.
> Markup for languages, writing directions, etc. is appropriate.
> Markup for character encoding schemes is not.

Japanese text on the World-Wide Web (when served using ISO-2022-JP)
may contain special characters like <, >, and &. Commonly, it appears
that people leave these characters in their text, and then others have
to fix their browsers[1] to interpret markup characters only outside
of JIS text.
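
As a rough illustration of the browser-side fix described above, here is a
minimal Python sketch that scans an ISO-2022-JP byte stream and reports only
those <, >, and & bytes that fall outside two-byte JIS text. The function name
markup_positions and the sample bytes are invented for this example, and only
the four escape sequences that RFC 1468 defines for ISO-2022-JP are handled.
It is not taken from any actual browser.

```python
ESC = 0x1B
# Escape sequences (minus the leading ESC) defined for ISO-2022-JP in RFC 1468.
SINGLE_BYTE_SETS = {b"(B", b"(J"}   # ASCII, JIS X 0201 Roman
DOUBLE_BYTE_SETS = {b"$@", b"$B"}   # JIS X 0208-1978, JIS X 0208-1983
MARKUP_BYTES = b"<>&"


def markup_positions(data: bytes) -> list:
    """Return offsets of <, > and & bytes that occur in single-byte mode.

    Hypothetical helper: bytes inside two-byte JIS sequences are skipped,
    even when they happen to have the same value as a markup character.
    """
    positions = []
    double_byte = False
    i = 0
    while i < len(data):
        if data[i] == ESC:
            seq = data[i + 1:i + 3]
            if seq in SINGLE_BYTE_SETS:
                double_byte = False
                i += 3
                continue
            if seq in DOUBLE_BYTE_SETS:
                double_byte = True
                i += 3
                continue
        if double_byte:
            i += 2          # one JIS character occupies two bytes
        else:
            if data[i] in MARKUP_BYTES:
                positions.append(i)
            i += 1
    return positions


# Sample input: a <b> element wrapping one two-byte JIS character whose
# first byte is 0x3C, the same value as "<".
sample = b"<b>" + b"\x1b$B<!\x1b(B" + b"</b>"
print(markup_positions(sample))
```

A naive scan of the same sample would also flag the 0x3C byte inside the JIS
sequence, which is the kind of misparse the browser fix avoids.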

However, i've seen the opposite solution with Chinese text, which may
also include <, >, or &. For instance, at [2] these three characters,
when encountered in Hz-encoded text from the GB character set, are
escaped as the entities &lt;, &gt; and &amp; respectively.
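
The opposite approach can be sketched the same way. The Python below assumes
that the escaping is applied to the raw bytes of the HZ-encoded text, so that
every <, > and & byte becomes an entity, and that a reader must reverse the
substitution before decoding the GB characters. Whether the page at [2] works
exactly like this is an assumption; the function names are made up for the
example.

```python
def escape_markup_bytes(data: bytes) -> bytes:
    """Replace every &, < and > byte with its entity (hypothetical helper).

    The ampersand is replaced first so that the ampersands introduced by
    the other two replacements are not escaped a second time.
    """
    return (data.replace(b"&", b"&amp;")
                .replace(b"<", b"&lt;")
                .replace(b">", b"&gt;"))


def unescape_markup_bytes(data: bytes) -> bytes:
    """Reverse escape_markup_bytes; &amp; is restored last."""
    return (data.replace(b"&lt;", b"<")
                .replace(b"&gt;", b">")
                .replace(b"&amp;", b"&"))


# Sample input: one GB character in HZ encoding (between ~{ and ~}) whose
# first byte is 0x3C, the same value as "<".
hz_text = b"~{<!~}"
escaped = escape_markup_bytes(hz_text)
print(escaped)                                    # contains no raw < > &
print(unescape_markup_bytes(escaped) == hz_text)  # round trip
```

The escaped form is safe for a parser that knows nothing about HZ, at the cost
that the bytes are no longer valid HZ text until the entities are expanded
again.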

In my experience, the first approach (leaving these characters unescaped)
is more widespread than the second. But escaping them as entities ensures
that documents cannot break parsers, whereas unescaped text occasionally
does. Does this mean the second approach is more correct?


Ping (Ka-Ping Yee): 2B Computer Engineering, University of Waterloo, Canada | 62A Churchill St, Waterloo N2L 2X2, 519 886-3947
CWSF 89, 90, 92; LIYSF 90, 91; Shad Valley 92; DOE 93; IMO 91, 93; ACMIPC 94
:: Skuld :: Tendou Akane :: Belldandy :: Ayukawa Madoka :: Hayakawa Moemi ::
New! <> Yeah, i finally made a home page.