Re: HTML 2.0 SGML declaration [was: ATTSPLEN?]

Paul Grosso (pbg@texcel.no)
Wed, 4 Jan 95 09:50:59 EST

> From: Paul Burchard <burchard@horizon.math.utah.edu>
>
> pbg@texcel.no (Paul Grosso) writes:
> > LITLEN [...] ATTSPLEN [...] TAGLEN [...]
>
> > Note that it doesn't make sense to expect to be able to
> > enter large values for attributes unless one increases
> > all three of the above quantities.
>
> Thanks for the explanation -- it looks like we have a definite
> problem in the current SGML declaration for HTML, then. It sets
> LITLEN to 1024 in order to provide reasonable room for URLs and FORM
> values, but then leaves ATTSPLEN and TAGLEN at their default values.

Looking at the HTML 2.0 SGML decl, I realize I don't remember any earlier
discussion of the quantity values, so I may have missed something. But
here are my comments on the SGML declaration.

I didn't consider the capacities. I think capacities are more annoying
than useful, and almost all products I have seen rightly ignore them for
all practical purposes (usually after giving a warning). Besides,
appropriate values for capacities can usually be determined only by
trial and error, and I figure Dan has already done that.

I have to admit to a lack of expertise in the details of the character
set portions of SGML declarations. Not that I haven't tried, but there
are just too many issues for me to know whether the ones given are the
best ones for HTML use. Character sets in HTML are an open issue, and I
have no reason to think that the ones Dan has included aren't the best
ones for now.

The features are quite reasonable and standard, and the syntax is the same
as the Reference Concrete Syntax (RCS) with the exception of the quantities.

The quantities currently in the HTML 2.0 spec are:

QUANTITY SGMLREF
    NAMELEN  72    -- somewhat arbitrary; taken from
                      internet line length conventions --
    TAGLVL   100
    LITLEN   1024
    GRPGTCNT 150
    GRPCNT   64

For reference, the RCS quantities are:

QUANTITY SGMLREF
    ATTCNT   40
    ATTSPLEN 960
    BSEQLEN  960
    DTAGLEN  16
    DTEMPLEN 16
    ENTLVL   16
    GRPCNT   32
    GRPGTCNT 96
    GRPLVL   16
    LITLEN   240
    NAMELEN  8
    NORMSEP  2
    PILEN    240
    TAGLEN   960
    TAGLVL   24
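
Because the HTML declaration says QUANTITY SGMLREF, every quantity it does
not list keeps its RCS value. A minimal Python sketch of that merge follows;
the dictionary and function names are made up for illustration. The
"smaller than LITLEN" test is a crude check of my own, assuming only that
one attribute value or PI as long as LITLEN ought to fit within ATTSPLEN,
TAGLEN, and PILEN.

```python
# Reference Concrete Syntax quantity set (values as listed above).
RCS_QUANTITIES = {
    "ATTCNT": 40, "ATTSPLEN": 960, "BSEQLEN": 960, "DTAGLEN": 16,
    "DTEMPLEN": 16, "ENTLVL": 16, "GRPCNT": 32, "GRPGTCNT": 96,
    "GRPLVL": 16, "LITLEN": 240, "NAMELEN": 8, "NORMSEP": 2,
    "PILEN": 240, "TAGLEN": 960, "TAGLVL": 24,
}

# Quantities explicitly set by the current HTML 2.0 SGML declaration.
HTML_OVERRIDES = {
    "NAMELEN": 72, "TAGLVL": 100, "LITLEN": 1024,
    "GRPGTCNT": 150, "GRPCNT": 64,
}

def effective_quantities(overrides):
    """QUANTITY SGMLREF: start from the RCS values, then apply overrides."""
    merged = dict(RCS_QUANTITIES)
    merged.update(overrides)
    return merged

def smaller_than_litlen(quantities, names=("ATTSPLEN", "TAGLEN", "PILEN")):
    """Return those of `names` whose value is below LITLEN (a rough check)."""
    litlen = quantities["LITLEN"]
    return {n: quantities[n] for n in names if quantities[n] < litlen}

q = effective_quantities(HTML_OVERRIDES)
print(smaller_than_litlen(q))  # {'ATTSPLEN': 960, 'TAGLEN': 960, 'PILEN': 240}
```

Under the pure RCS the same check comes back empty, since there LITLEN is
only 240; it is raising LITLEN alone that opens the gap.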

My comments:

1. I'm not sure why it was felt necessary for GRPCNT, GRPGTCNT, and
TAGLVL to be raised from their RCS values. In my experience, I
have rarely seen the need, and the HTML application is one of the
smaller ones I've seen. I don't see anything wrong with the larger
values; I was just a bit surprised to see them.

2. A value of 1024 for LITLEN makes sense. Most people increase PILEN
when LITLEN is increased. Basically, if you expect to have large
literals, you might well have large PIs. In particular, PIs may be
used to hold information related to what tags contain, so I usually
recommend a PILEN at least as large as TAGLEN. (Given the figures in
the next point, that would imply a value of 4230 if you follow the
argument strictly.) At a minimum, I would recommend making PILEN at
least as large as LITLEN, in this case 1024.

3. As the earlier exchange discusses, ATTSPLEN and TAGLEN should usually
be increased when LITLEN is increased. [This isn't necessarily the
case--one might want to allow for large literals in parameter literals
(e.g., for the replacement text of entities), but still not expect
such long literals for attribute value literals. I am assuming that
we wish to allow URLs, VALUEs, and such to have lengths up to
LITLEN in the rest of this paragraph.] A quick glance at the DTD shows
that the elements A and INPUT have four CDATA attributes plus a few
others, LINK has three CDATA atts plus others, and IMG and FORM have
two CDATA atts plus others. Unless someone has a good argument for
thinking it isn't necessary to allow for the case that all four of
A's and INPUT's CDATA attributes have values that approach LITLEN,
that would indicate a value of ATTSPLEN near 4150. With a NAMELEN
of 72 (even though no element names currently approach that), that
would suggest a TAGLEN near 4230 in round numbers. In practice, one
would rarely expect such extremes, so smaller numbers may be reasonable,
but I'm just laying out the appropriate logic. In particular, the
elements A (with HREF and NAME), IMG (with SRC and ALT), INPUT (with
SRC and VALUE), and LINK (with HREF and URN) all have at least two
CDATA attributes that, I would think, could both get long (either by
virtue of being a URL or URN or by having a long textual string for
a value), so a value of at least 2100 for ATTSPLEN and TAGLEN seems
necessary if we want to be consistent with LITLEN.
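
To make the arithmetic above concrete, here is a rough Python sketch. It
assumes each long CDATA attribute may approach LITLEN characters, and it
uses a hypothetical fixed allowance (`slack`, 50 characters) for attribute
names, separators, and the shorter attributes, plus rounding up to the next
multiple of ten. Those two choices are illustrative guesses, not values
derived from the DTD or from ISO 8879's exact normalized-length rules.

```python
import math

def round_up(value, step=10):
    """Round value up to the next multiple of step."""
    return math.ceil(value / step) * step

def suggest_attsplen_taglen(litlen, long_cdata_attrs, namelen, slack=50):
    """Rough ATTSPLEN/TAGLEN sizing.

    ATTSPLEN: room for `long_cdata_attrs` values near LITLEN, plus the
    hypothetical `slack` allowance for everything else in the list.
    TAGLEN: the rounded ATTSPLEN plus room for an element name of up
    to NAMELEN characters.
    """
    attsplen = round_up(long_cdata_attrs * litlen + slack)
    taglen = round_up(attsplen + namelen)
    return attsplen, taglen

# Worst case: all four CDATA attributes of A or INPUT near LITLEN.
print(suggest_attsplen_taglen(1024, 4, 72))  # (4150, 4230)
# Two long attributes, e.g. A with HREF and NAME.
print(suggest_attsplen_taglen(1024, 2, 72))  # (2100, 2180)
```

With these assumptions the four-attribute case lands on 4150 and 4230, and
the two-attribute case gives 2100 for ATTSPLEN with TAGLEN a little above
that, in line with "at least 2100".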

paul

Paul Grosso
VP Research, ArborText, Inc.
Chief Technical Officer, SGML Open

Email: paul@arbortext.com
or pbg@texcel.no