Re: Question on attribute values

Daniel W. Connolly (connolly@hal.com)
Tue, 25 Oct 94 21:48:40 EDT

In message <9410260115.AA19670@ka.SJF.Novell.COM>, Keith Ball writes:
>I need to verify my understanding of the attribute value requirements
>covered in section 3.4.3 of the Oct 13 version of the spec.
>
>Here are some statements that I believe are true. If they are not,
>please tell so.

I had a zillion questions like this when I first started reading
HTML and SGML standards too. The way I learned was to download sgmls,
build it, run it, and look at the output and the error messages.

You can do all this through a forms-based WWW interface at:

http://www.hal.com/%7Econnolly/html-test/service/validation-form.html

>1) attribute values that are defined as CDATA, such as ALT for
>the IMG element, MUST be quote delimited even if they do NOT contain
>spaces (0x20) or ">".

The quoting rules are the same for all attributes: if the value is a token
(i.e. [A-Za-z0-9\.-]+ ) then you don't need quotes. If it has
any chars besides A-Za-z0-9\.- then you need quotes.
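The rule is simple enough to check mechanically. Below is a minimal Python sketch. It assumes the name characters are exactly the ones in the regex above (letters, digits, period, hyphen). `needs_quotes` is a hypothetical helper, not part of sgmls:

```python
import re

# Name characters as given above: letters, digits, period and hyphen.
TOKEN = re.compile(r"[A-Za-z0-9.-]+")

def needs_quotes(value: str) -> bool:
    """True if an attribute value must be written as a quoted literal."""
    # fullmatch: every character must be a name character; the empty
    # string doesn't match, so it needs quotes too.
    return TOKEN.fullmatch(value) is None

for v in ["xxx", "2", "top", "http://xxx.yyy.z/", "a b"]:
    print(repr(v), "needs quotes" if needs_quotes(v) else "ok unquoted")
```

Note that the rule depends only on the characters in the value, not on whether the attribute is declared CDATA, NAME, or an enumerated list.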

Ending an unquoted value at a space or ">" is what the old CERN code,
and hence Mosaic, does, but that's not "correct."

The validation service says:

Input

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HEAD>
<TITLE><!-- your title here --></TITLE>
</HEAD>

<BODY>
<!-- your HTML test data -->
<a href=xxx>test 1</a>
<a href=http://xxx.yyy.z/>test 2</a>
<a href=2>test 3</a>
</BODY>

Errors

sgmls: SGML error at -, line 9 at ":":
Incorrect character in markup; markup terminated
sgmls: SGML error at -, line 9 at "p":
Length of name, number, or token exceeded NAMELEN or LITLEN limit
sgmls: SGML error at -, line 9 at ":":
Incorrect character in markup; markup terminated
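The difference between the two readings of test 2 can be sketched in Python. This is an illustration under stated assumptions, not code from CERN or sgmls. `cern_style_value` models the space-or-">" behaviour described above. `sgml_style_value` reads a run of name characters and reports the character that stopped it when that character can't legally end the value. Both function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Name characters as given above: letters, digits, period and hyphen.
NAME_CHARS = re.compile(r"[A-Za-z0-9.-]*")
ENDERS = " \t\n>"  # characters that may legally follow an unquoted value

def cern_style_value(text: str, start: int) -> str:
    """Unquoted value read the old CERN/Mosaic way, as described above:
    everything up to the next space or '>'."""
    end = start
    while end < len(text) and text[end] not in " >":
        end += 1
    return text[start:end]

def sgml_style_value(text: str, start: int) -> Tuple[str, Optional[str]]:
    """Unquoted value read the SGML way: a run of name characters.
    Also returns the character that stopped the run if it cannot
    legally end the value (None otherwise)."""
    m = NAME_CHARS.match(text, start)
    stop = text[m.end()] if m.end() < len(text) else None
    bad = stop if stop is not None and stop not in ENDERS else None
    return m.group(), bad

tag = "<a href=http://xxx.yyy.z/>"
start = tag.index("=") + 1
print(cern_style_value(tag, start))  # http://xxx.yyy.z/
print(sgml_style_value(tag, start))  # ('http', ':')
```

The SGML reading stops at ":", which matches the spot sgmls complains about on line 9.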

>2) attribute values which are enumerated values (a fixed set of
>keywords that are accepted and understood) may have, but do not
>require, quote delimiters. Such as ALIGN="top" or ALIGN=top.

As I said, the attribute value being a name is a sufficient, but
not necessary, condition for omitting the quotes. One real distinction
between NAME (and hence enumerated values) and CDATA is that NAME
attributes are not case sensitive, while CDATA attributes are,
despite the way Mosaic is implemented. (Perhaps the DTD should be
changed to make these CDATA attributes to match Mosaic, but that
seems like a bad idea.)
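A rough Python sketch of that case distinction follows. It assumes the HTML SGML declaration sets NAMECASE GENERAL YES, so name-token values are folded to upper case. It only handles CDATA, NAME, and enumerated groups. `normalize_value` is a hypothetical helper, not a real parser API.

```python
def normalize_value(value: str, declared: str) -> str:
    """Fold an attribute value roughly the way an SGML parser would,
    assuming NAMECASE GENERAL YES. Only CDATA, NAME and enumerated
    groups such as (top|middle|bottom) are covered by this sketch."""
    if declared == "CDATA":
        return value          # case-sensitive: kept exactly as written
    if declared == "NAME" or declared.startswith("("):
        return value.upper()  # name token: folded to upper case
    raise ValueError("declared value not covered by this sketch: " + declared)

print(normalize_value("top", "(top|middle|bottom)"))  # TOP
print(normalize_value("Top", "(top|middle|bottom)"))  # TOP
print(normalize_value("Eq1.GIF", "CDATA"))            # Eq1.GIF
```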

>3) Within the CDATA defined attribute value strings ANY character
>in the character set is valid, except the starting quote delimiter.
>Therefore if the starting quote delimiter is: ', then the string
>may contain a ", but it may not contain a '.

You can write <a href="abc&#34;def"> or <a href="abc&quot;def"> to
represent a " inside ""s.

Also, you're better off writing
<img src="eq1.gif" alt="a &gt; b">
than
<img src="eq1.gif" alt="a > b">
even though both are legal, because of Mosaic bugs.
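On the writing side, here is a minimal Python sketch that always double-quotes a CDATA value. It escapes "&" so that no accidental references are created, escapes " with &#34;, and escapes ">" for Mosaic's sake. `attr_literal` is a hypothetical name.

```python
def attr_literal(value: str) -> str:
    """Write a CDATA attribute value as a double-quoted literal."""
    # '&' goes first so the references added below aren't re-escaped.
    value = value.replace("&", "&amp;")
    value = value.replace('"', "&#34;")  # &quot; works too
    value = value.replace(">", "&gt;")   # legal raw, but trips up Mosaic
    return '"' + value + '"'

print('<img src=%s alt=%s>' % (attr_literal("eq1.gif"), attr_literal("a > b")))
# <img src="eq1.gif" alt="a &gt; b">
print(attr_literal('abc"def'))  # "abc&#34;def"
```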

>4) Character entities (such as &Otilde;) and numerical character
>references (such as &#38;) may occur in an attribute value that is
>quote delimited.

Yes, despite Mosaic's implementation (and the CERN code's, and...).
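The reading side is a client that expands those references inside a quoted literal. Below is a short Python sketch of that step. It leans on the standard library's `html.unescape`, which follows HTML5 rules rather than SGML's, so treat it as an approximation. `read_literal` is a hypothetical helper.

```python
import html

def read_literal(literal: str) -> str:
    """Strip the delimiters from a quoted attribute value literal and
    expand the entity and character references inside it."""
    if len(literal) < 2 or literal[0] not in "\"'" or literal[-1] != literal[0]:
        raise ValueError("not a quoted literal: " + literal)
    return html.unescape(literal[1:-1])

print(read_literal('"abc&#34;def"'))  # abc"def
print(read_literal("'&Otilde;'"))     # Õ
print(read_literal('"AT&#38;T"'))     # AT&T
```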

>5) Attributes that have the attribute name in parenthesis but do not
>have a VALUE definition, have an implied value definition of the name
>of the attribute if the name is NOT specified with the attribute.
>Therefore, ALL attributes have a value, but minimized syntax allows,
>but does not require, a value to not be provided if it is the same
>as the attribute's name.

I don't think I understand what you're saying here. Could you give
some examples?

>Of course a client parser needs to be as flexible as possible and
>do its best to handle improper HTML or SGML syntax. But I want to
>make sure my understanding is correct.

Interactive trial and error with some test cases always proved
enlightening for me.

>I am also having difficulty understanding the interpretation of
>the FORM element METHOD attribute definition in the DTD. It is
>defined as:
>
> METHOD (%HTTP-Method) GET
>
>where the HTTP-Method entity is "GET | POST". I am assuming that
>my limited SGML knowledge is the problem here. However, all the other
>attribute definitions appear to have #IMPLIED or #REQUIRED, except for
>the ENCTYPE following METHOD.

It goes
   attribute-name  declared-value  default-value
where the declared value is kinda like a type.

So GET is the default value for METHOD. #IMPLIED means there is
no default, but you may still leave the attribute out. #REQUIRED
means there is no default, and you MUST supply a value.
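Here is a minimal Python sketch of how the three kinds of default play out when a start tag is processed. Only the METHOD definition comes from the DTD quoted above. X-NOTE and X-KEY are made-up attributes, there to show #IMPLIED and #REQUIRED. `AttrDef` and `resolve` are hypothetical names.

```python
from typing import NamedTuple

class AttrDef(NamedTuple):
    name: str
    declared: str  # declared value, e.g. "(GET|POST)" or "CDATA"
    default: str   # a default value, "#IMPLIED", or "#REQUIRED"

ATTRS = [
    AttrDef("METHOD", "(GET|POST)", "GET"),   # from the DTD quoted above
    AttrDef("X-NOTE", "CDATA", "#IMPLIED"),   # hypothetical
    AttrDef("X-KEY", "CDATA", "#REQUIRED"),   # hypothetical
]

def resolve(defs, specified):
    """Combine the attributes given on a start tag with the DTD defaults."""
    result = {}
    for d in defs:
        if d.name in specified:
            result[d.name] = specified[d.name]
        elif d.default == "#REQUIRED":
            raise ValueError(d.name + " is #REQUIRED but was not given")
        elif d.default != "#IMPLIED":
            result[d.name] = d.default  # the DTD supplies the value
        # #IMPLIED and absent: no value at all; the application decides
    return result

print(resolve(ATTRS, {"X-KEY": "k"}))  # {'METHOD': 'GET', 'X-KEY': 'k'}
```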

Dan