Re: Question on attribute values

Joe English (joe@trystero.art.com)
Tue, 25 Oct 94 22:13:20 EDT

kball@kballuw.SJF.Novell.COM (Keith Ball) wrote:

> Here are some statements that I believe are true. If they are not,
> please tell so.
>
> 1) attribute values that are defined as CDATA, such as ALT for
> the IMG element, MUST be quote delimited even if they do NOT contain
> spaces (0x20) or ">".

Yes and no: you don't need quotes if the attribute value contains
only *name characters* (letters, digits, "." and "-" in HTML's
concrete syntax). This is true regardless of the attribute's
declared value.

E.g., <img alt=foo> is legal, and means the same as <img alt="foo">.
<a href=http://www.foo.com/> is *not* legal since : and / aren't name
characters; in this case quotes are mandatory.
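
To make the rule concrete, here is a minimal Python sketch of the check.
It assumes the name characters are ASCII letters, digits, "." and "-",
and it ignores length limits; the name needs_quotes is made up for
this example.

```python
import re

# Assumption: name characters are ASCII letters, digits, "." and "-".
NAME_CHARS = re.compile(r"[A-Za-z0-9.-]+")


def needs_quotes(value):
    """Return True if the value must be written with quote delimiters."""
    # fullmatch() fails on the empty string and on any non-name character.
    return NAME_CHARS.fullmatch(value) is None


if __name__ == "__main__":
    for sample in ["foo", "http://www.foo.com/", ""]:
        print(repr(sample), needs_quotes(sample))
```

With this check, "foo" can be written bare, while the URL and the empty
string both have to be quoted.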

> 2) attribute values which are enumerated values (a fixed set of
> keywords that are accepted and understood) may have, but do not
> require, quote delimiters. Such as ALIGN="top" or ALIGN=top.

Correct, but because of (1), not because of their
declared value.

> 3) Within the CDATA defined attribute value strings ANY character
> in the character set is valid, except the starting quote delimiter.
> Therefore if the starting quote delimiter is: ', then the string
> may contain a ", but it may not contain a '.

Correct (but see (4)).

> 4) Character entities (such as &Otilde;) and numerical character
> references (such as &38) may occur in an attribute value that is
> quote delimited.

Correct, and they should be expanded.
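
A rough Python sketch of that expansion step, for references written in
the usual &name; and &#38; forms. The entity table here is a tiny,
hypothetical stand-in for the DTD's entity sets, numeric references are
assumed to map straight to code points (no range checking), and the
name expand_references is invented for illustration.

```python
import re

# Hypothetical, deliberately tiny entity table for illustration only.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "Otilde": "\u00d5"}

# "&name;" or "&#digits;"; the closing ";" is treated as optional here.
REFERENCE = re.compile(r"&(#[0-9]+|[A-Za-z][A-Za-z0-9.-]*);?")


def expand_references(value):
    """Expand entity and numeric character references in an attribute value."""

    def replace(match):
        body = match.group(1)
        if body.startswith("#"):
            # Assumption: the number is used directly as a code point.
            return chr(int(body[1:]))
        # Unknown entity names are left in the text untouched.
        return ENTITIES.get(body, match.group(0))

    return REFERENCE.sub(replace, value)


if __name__ == "__main__":
    print(expand_references("a &#38; b"))
```

Leaving unknown names untouched is just one possible policy; a stricter
parser might report them as errors instead.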

> 5) Attributes that have the attribute name in parenthesis but do not
> have a VALUE definition, have an implied value definition of the name
> of the attribute if the name is NOT specified with the attribute.

No...

> Therefore, ALL attributes have a value, but minimized syntax allows,
> but does not require, a value to not be provided if it is the same
> as the attribute's name.

This is backwards. It's legal to omit the attribute *name*
for attributes with a name token group as their declared value.

    <!ATTLIST foo
        att1 (val1|val2) #IMPLIED
        att2 (att2)      #IMPLIED
    >
    <!-- ... -->
    <foo val1 att2>
is shorthand for
    <foo att1=val1 att2=att2>
         ^^^^^     ^^^^^   (omitted parts)

As an aside, parsers should also recognize
<IMG LEFT> as shorthand for <IMG ALIGN=LEFT>,
by the same rules. (It's illegal to specify
the same name token as a declared value for
two distinct attributes, so there's no possibility
of ambiguity in this case.)
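
One way a parser might resolve omitted attribute names is sketched
below in Python. This is only an illustration under some assumptions:
the start-tag has already been split into tokens, names are folded to
lower case, and DECLARED, build_token_index and expand_minimized are
hypothetical names, not part of any real parser.

```python
# Declared name token groups, mirroring the example ATTLIST for "foo".
DECLARED = {"att1": ["val1", "val2"], "att2": ["att2"]}


def build_token_index(declared):
    """Map each name token to the single attribute that declares it."""
    index = {}
    for attr, tokens in declared.items():
        for token in tokens:
            key = token.lower()
            if key in index:
                # The same token in two groups would make minimization ambiguous.
                raise ValueError(f"token {token!r} declared for two attributes")
            index[key] = attr
    return index


def expand_minimized(tokens, declared):
    """Turn ['val1', 'att2'] into {'att1': 'val1', 'att2': 'att2'}."""
    index = build_token_index(declared)
    result = {}
    for item in tokens:
        if "=" in item:
            name, value = item.split("=", 1)
        else:
            # Omitted name: look up which attribute owns this token.
            name, value = index[item.lower()], item
        result[name.lower()] = value
    return result


if __name__ == "__main__":
    print(expand_minimized(["val1", "att2"], DECLARED))
```

A bare token that no attribute declares raises KeyError in this sketch;
a real parser would report a markup error.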

It is true (in a sense) that all attributes have a value,
but for those specified as #IMPLIED the default value is
determined solely by the application, not by the rules of SGML.

In HTML, the convention for attributes defined like:

    <!ATTLIST DL
        COMPACT (COMPACT) #IMPLIED
    >

seems to be that they are conceptually booleans,
with the default (#IMPLIED) value being "false" and
anything else meaning "true". The only legal value
for "anything else" happens to be the attribute's name
because of the way they're declared.
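
In application code that convention might look something like the
following Python sketch. It assumes the parser hands over a dict of the
attributes actually specified, keyed by lower-cased name; the function
name boolean_attribute is hypothetical.

```python
def boolean_attribute(attributes, name):
    """Interpret an attribute declared like COMPACT (COMPACT) #IMPLIED."""
    value = attributes.get(name.lower())
    if value is None:
        # Not specified: the application's default, "false" by convention.
        return False
    if value.lower() != name.lower():
        # The only declared value is the attribute's own name.
        raise ValueError(f"illegal value {value!r} for {name}")
    return True


if __name__ == "__main__":
    print(boolean_attribute({"compact": "COMPACT"}, "COMPACT"))
    print(boolean_attribute({}, "COMPACT"))
```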

> I am also having difficulty understanding the interpretation of
> the FORM element METHOD attribute definition in the DTD. It is
> defined as:
>
> METHOD (%HTTP-Method) GET
>
> where the HTTP-Method entity is "GET | POST". I am assuming that
> my limited SGML knowledge is the problem here. However, all the other
> attribute definitions appear to have #IMPLIED or #REQUIRED, except for
> the ENCTYPE following METHOD.
>
> Does this mean that GET or PUT are valid METHOD values, but only GET
> is supported?

Nope, it means that GET is the default value
(if none is specified on the FORM start-tag), and
that GET and POST are the only legal values.

Again, attribute minimization should allow

<form post>
and
<form get>

as shorthand for

<form method=post>
and
<form method=get>.
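
Putting the default and the minimized forms together, a small Python
sketch of how an application could work out the effective METHOD. It
assumes the FORM start-tag is already split into tokens and that values
are compared case-insensitively; form_method and the constants are
invented names for this example.

```python
METHOD_VALUES = ("get", "post")  # the only legal values per the DTD
METHOD_DEFAULT = "get"           # used when the start-tag gives none


def form_method(tokens):
    """Return the effective METHOD for a list of FORM start-tag tokens."""
    method = METHOD_DEFAULT
    for item in tokens:
        name, sep, value = item.partition("=")
        if not sep and name.lower() in METHOD_VALUES:
            # Omitted attribute name: the bare token is the value.
            name, value = "method", name
        if name.lower() == "method":
            value = value.strip("\"'").lower()
            if value not in METHOD_VALUES:
                raise ValueError(f"illegal METHOD value: {value!r}")
            method = value
    return method


if __name__ == "__main__":
    print(form_method([]))
    print(form_method(["post"]))
    print(form_method(["method=POST"]))
```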

Hope this helps,

--Joe English

joe@trystero.art.com