Re: non-ascii markup? [was: NAME as ID ]

Amanda Walker (amanda@intercon.com)
Thu, 3 Aug 95 12:04:35 EDT

> From: "Daniel W. Connolly" <connolly@beach.w3.org>
>
> Hang on... you're talking about non-western writing systems inside
> attribute values? I hope not. I don't expect the _syntax character
> set_, i.e. the coded character set used for markup (including
> attribute value literals) of HTML ever to be anything other than ISO
> 646 IRV (aka 7 bit ASCII).

Try convincing a Japanese HTML author of this :).

In doing Kanji support for our WWW browser, I had to add code to handle
two-byte characters in attribute value literals (mainly "ALT" for images,
but occasionally in other places) and comments as well as running text.
Whether or not quoted literals are officially supposed to be in the
syntax character set or the document character set, in pragmatic terms
browsers should expect to handle the document character set. The
evidence shows that people think of literals as text, not markup, which
is not entirely unreasonable.
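
To illustrate why attribute value literals need this handling, here is a
minimal sketch. It is not the actual browser code. It assumes the document
is encoded in ISO-2022-JP (the message above does not say which encoding
was involved), where the bytes of a two-byte character fall in the ASCII
range and can coincide with the double quote (0x22). The function name
read_quoted_literal is hypothetical.

```python
ESC = 0x1B    # introduces an ISO-2022-JP escape sequence
QUOTE = 0x22  # ASCII double quote


def read_quoted_literal(data: bytes, start: int) -> tuple[bytes, int]:
    """Scan a double-quoted literal in ISO-2022-JP encoded markup.

    data[start] must be the opening quote. Returns the raw bytes of the
    literal and the index just past the closing quote. Assumption: only
    the ESC $ @, ESC $ B, ESC ( B and ESC ( J sequences are recognized.
    """
    if data[start] != QUOTE:
        raise ValueError("expected opening quote")
    i = start + 1
    two_byte = False  # True while a two-byte character set is selected
    while i < len(data):
        b = data[i]
        if b == ESC and i + 2 < len(data):
            seq = data[i + 1:i + 3]
            if seq in (b"$@", b"$B"):    # switch to JIS X 0208 (two-byte)
                two_byte = True
                i += 3
                continue
            if seq in (b"(B", b"(J"):    # switch back to a one-byte set
                two_byte = False
                i += 3
                continue
        if two_byte:
            i += 2                       # skip both bytes; neither is markup
            continue
        if b == QUOTE:
            return data[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated literal")


# The ALT value holds one two-byte character whose second byte is 0x22.
sample = b'<IMG ALT="' + b'\x1b$B0"\x1b(B' + b'" SRC="a.gif">'
literal, end = read_quoted_literal(sample, sample.index(b'"'))
```

A scanner that simply looks for the next 0x22 byte would end the literal
in the middle of the two-byte character; tracking the escape sequences
avoids that.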

Amanda Walker
InterCon Systems Corporation