Re: case sensitivity in tags?

Tim Pierce (twpierce@midway.uchicago.edu)
Wed, 10 May 95 12:10:18 EDT

> > From: connolly@w3.org (Dan Connolly)
> >
> > Paul Grosso writes:
> > > > From: connolly@w3.org (Dan Connolly)
> > > >
> > > > > Is this an SGML problem?
> > > >
> > > > Yes.
> > >
> > > What is the (perceived) problem? That 8879 (the SGML standard) doesn't
> > > cover this (which isn't true) or that what it does say needs explaining?
> >
> > Er... I meant "Yes, this is a problem where the HTML spec inherits
> > from SGML," not "Yes, this shows some problem with the SGML spec."
> >
> > For example, you can experiment via the HTML validation service
> > to see which attribute values are case sensitive and which are not.
>
>
> > From: Tim Pierce <twpierce@midway.uchicago.edu>
> > To: paul@arbortext.com
> >
> > > What is the (perceived) problem? That 8879 (the SGML standard) doesn't
> > > cover this (which isn't true) or that what it does say needs explaining?
> >
> > Apparently the latter. I was the one who first raised the
> > question with Dan. Now, I admit right out front that I'm
> > not conversant with SGML, but since that's likely to be true
> > of a large proportion of the audience of this Internet
> > draft, it strikes me as important to clarify it elsewhere in
> > the document.
> >
> > I asked if it was an "SGML problem" because I didn't know
> > whether SGML would permit individual DTDs to specify such
> > things as the case of the attributes. As you seem to
> > observe, that isn't so.
> >
>
> I'll say a few more words assuming it may help some people. I'll try
> not to be too technical (though it'll be at the expense of precision).
> As far as whether any such discussion is needed in the HTML 2.0 spec,
> I leave that decision to others (my vote is not to bother, but I don't
> feel strongly if consensus is to add something--the problem is that,
> if you are going to add something to the spec, it's going to have to
> be more rigorously and carefully worded than what I've got below, and
> that could start another whole round of discussion).
>
> Things that are tokenized in SGML *may* be case insensitive while
> things that are treated as untokenized strings of data characters
> have their case preserved (i.e., are case sensitive). Names of
> elements and attributes and entities (among other things) are
> tokenized; so are attribute values [technically, "attribute value
> literals"] for attributes whose "type" (declared value) is not CDATA.
>
> Tokenized names are divided into two categories for the purpose of
> case sensitivity: general names (which include element and attribute
> names and most other name tokens including any in tokenized attribute
> values other than those that represent entity references) and names of
> entities. The SGML declaration allows the case sensitivity of each of
> these two classes to be indicated via the NAMECASE specification part
> of the concrete syntax. The very common (almost universal in practice)
> assignment is NAMECASE GENERAL YES ENTITY NO which means that general
> names are case insensitive (i.e., all lowercase letters in the name
> will be converted to uppercase by the parser) and entity names are
> case sensitive. This is the setting in the HTML 2.0 SGML declaration.
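
A rough sketch of the NAMECASE behaviour described above, for readers who find code easier to follow than declaration syntax. This is not a real SGML parser: parse_namecase and fold_name are hypothetical helper names, and the regular expression assumes the specification appears literally as "NAMECASE GENERAL ... ENTITY ..." in the declaration text.

```python
import re

# Assumption: the NAMECASE part of the declaration is written out literally,
# with arbitrary whitespace between the keywords.
_NAMECASE = re.compile(r"NAMECASE\s+GENERAL\s+(YES|NO)\s+ENTITY\s+(YES|NO)")


def parse_namecase(declaration_text):
    """Return (general, entity): True means that class of names is case insensitive."""
    match = _NAMECASE.search(declaration_text)
    if match is None:
        raise ValueError("no NAMECASE specification found")
    return match.group(1) == "YES", match.group(2) == "YES"


def fold_name(name, substitute):
    """Convert lowercase letters to uppercase when substitution is enabled (YES)."""
    return name.upper() if substitute else name


general, entity = parse_namecase("NAMECASE GENERAL YES ENTITY NO")

# Element names are general names, so these two spellings are the same name.
same_element = fold_name("blockquote", general) == fold_name("BlockQuote", general)

# Entity names are left alone, so these two spellings stay distinct.
same_entity = fold_name("eacute", entity) == fold_name("Eacute", entity)
```

With the setting quoted above, same_element is true and same_entity is false, which matches the "general names insensitive, entity names sensitive" reading.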
>
> With NAMECASE GENERAL YES ENTITY NO, all the following are case insensitive:
> element names, attribute names, attribute values for attributes whose type
> is ID, IDREF(S), NAME(S), NMTOKEN(S), NOTATION, NUMBER(S), NUTOKEN(S); whereas
> all of the following are case sensitive: entity names in entity declarations
> and references and attribute values for attributes whose type is ENTITY or
> ENTITIES. Data characters and attribute values for attributes whose type
> is CDATA are always case sensitive.
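
The list above can be turned into a small lookup, sketched here under some simplifying assumptions. The function name normalize_attribute_value is hypothetical, only case folding is modelled (no other tokenization rules), and only the declared value types named in the paragraph above are recognized.

```python
# Declared value types whose values are treated as general names or tokens,
# taken from the list in the paragraph above.
GENERAL_TYPES = {
    "ID", "IDREF", "IDREFS", "NAME", "NAMES", "NMTOKEN", "NMTOKENS",
    "NOTATION", "NUMBER", "NUMBERS", "NUTOKEN", "NUTOKENS",
}

# Declared value types whose values are entity names.
ENTITY_TYPES = {"ENTITY", "ENTITIES"}


def normalize_attribute_value(value, declared_type, general=True, entity=False):
    """Fold the case of an attribute value according to its declared type.

    The defaults model NAMECASE GENERAL YES ENTITY NO.
    """
    kind = declared_type.upper()
    if kind == "CDATA":
        # CDATA values are never folded, whatever NAMECASE says.
        return value
    if kind in ENTITY_TYPES:
        return value.upper() if entity else value
    if kind in GENERAL_TYPES:
        return value.upper() if general else value
    raise ValueError("declared value type not covered by this sketch: " + declared_type)
```

For example, normalize_attribute_value("intro", "ID") and normalize_attribute_value("Intro", "ID") come out equal, while the same two strings passed with "CDATA" or "ENTITY" remain different.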
>
> As a somewhat tangential note of interest, minimum literals such as
> those in Formal Public Identifiers (FPIs) such as "-//IETF//DTD HTML//EN"
> and "-//IETF//DTD HTML 2.0 Level 2//EN" and "ISO 8879-1986//ENTITIES Added
> Latin 1//EN//HTML" are *case sensitive* and FPIs that have the wrong case
> will not match their intended target and therefore will not resolve properly.
> Also note that these minimum literals are normalized by converting record
> ends to spaces, then condensing all space sequences to a single space and
> stripping leading and trailing spaces; however, inserting embedded spaces
> or record ends where no space belongs (such as around any of the //'s)
> will produce a different FPI that will no longer match its intended target.
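
The normalization just described might look roughly like this. It is a sketch, not a conforming implementation: the function names are hypothetical, and it assumes that any "\r\n", "\r" or "\n" in the input stands for a record end.

```python
import re


def normalize_minimum_literal(literal):
    """Normalize a minimum literal as described above; case is left untouched."""
    # Step 1: convert record ends to spaces (assumed newline conventions).
    text = literal.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")
    # Step 2: condense every run of spaces to a single space.
    text = re.sub(r" +", " ", text)
    # Step 3: strip leading and trailing spaces.
    return text.strip(" ")


def fpi_matches(candidate, target):
    """Compare two FPIs after normalization; the comparison is case sensitive."""
    return normalize_minimum_literal(candidate) == normalize_minimum_literal(target)
```

Under these assumptions, " -//IETF//DTD\nHTML   2.0//EN " matches "-//IETF//DTD HTML 2.0//EN", but "-//ietf//dtd html//en" does not match "-//IETF//DTD HTML//EN", and neither does "-//IETF // DTD HTML//EN", because the spaces added around the // survive normalization.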
>
> paul
>
> Paul Grosso
> VP Research, ArborText, Inc.
> and
> Chief Technical Officer, SGML Open
>
> Email: paul@arbortext.com