Re: non-ascii markup? [was: NAME as ID ]

James Clark (jjc@jclark.com)
Thu, 3 Aug 95 06:13:55 EDT

> Date: Wed, 2 Aug 95 18:51:17 EDT
> From: "Daniel W. Connolly" <connolly@beach.w3.org>
>
> In message <9508012128.AA22876@mailer.oclc.org>, Ian Graham writes:
> >
> >Yes, this would be a Good Thing in terms of current and likely
> >future multilingual use. It is often convenient to give IDs
> >a semantically meaningful name, in the appropriate language.
> >This means that a linkName should allow either entity names or
> >non-ascii position characters (or both), neither of which are
> >possible if linkNames are NAMEs.
>
> Hang on... you're talking about non-western writing systems inside
> attribute values? I hope not. I don't expect the _syntax character
> set_, i.e. the coded character set used for markup (including
> attribute value literals) of HTML ever to be anything other than ISO
> 646 IRV (aka 7 bit ASCII).

There's no such thing in SGML as a _syntax character set_. There's a
_syntax-reference character set_, but this is not the character set
used for markup. The syntax-reference character set is the character
set with respect to which the concrete syntax is described. The
character set used for markup is the _document character set_. The
only way in which the syntax-reference character set constrains the
concrete syntax is that every significant character (that is, every
markup character or minimum data character) of the concrete syntax
must be in the syntax-reference character set. However, attribute
value literals for attributes whose declared value is CDATA can
contain any SGML character, including those that are not significant
SGML characters and so not necessarily in the syntax-reference
character set.
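
To make the distinction concrete, here is a rough sketch in Python. It is
a toy model, not an SGML parser: the two character sets, the list of
significant characters, and all the function names are assumptions picked
for illustration, and it treats every character of the document character
set as an SGML character, which a real SGML declaration need not do.

```python
import string

# Assumed syntax-reference character set: ISO 646 IRV (7-bit ASCII).
SYNTAX_REFERENCE = {chr(c) for c in range(128)}

# Assumed document character set: ISO 8859-1, all of it taken as SGML characters.
DOCUMENT_CHARSET = {chr(c) for c in range(256)}

# Simplified name characters, loosely modelled on the reference concrete syntax.
NAME_START = set(string.ascii_letters)
NAME_CHARS = NAME_START | set(string.digits) | set(".-")

# A handful of significant characters: some delimiters plus the name characters.
SIGNIFICANT = set('<>/="&;') | NAME_CHARS


def syntax_is_describable(significant, syntax_reference):
    """Every significant character must be in the syntax-reference set."""
    return significant <= syntax_reference


def is_name(value):
    """A NAME token may use only (significant) name characters."""
    return (
        len(value) > 0
        and value[0] in NAME_START
        and all(ch in NAME_CHARS for ch in value)
    )


def cdata_literal_ok(value, document_charset=DOCUMENT_CHARSET):
    """A CDATA attribute value is limited only by the document character set."""
    return all(ch in document_charset for ch in value)


if __name__ == "__main__":
    print(syntax_is_describable(SIGNIFICANT, SYNTAX_REFERENCE))
    for value in ["intro", "caf\u00e9"]:
        print(repr(value), is_name(value), cdata_literal_ok(value))
```

In this model the value "caf\u00e9" is rejected as a NAME but accepted
as a CDATA literal, even though U+00E9 is outside the assumed
syntax-reference character set.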

James Clark
jjc@jclark.com