SGML revision: SHORTTAG separation

Daniel W. Connolly (connolly@beach.w3.org)
Thu, 20 Jul 95 17:42:04 EDT

I found this in comp.text.sgml, and I thought I'd forward it.

Note especially the following:

>the HTML WG should submit a request for separation of these overloaded uses
>of the SHORTTAG feature to the SGML review process. please mail me if you
>wish to take part in formulating such a request.

------- Forwarded Message

From: erik@naggum.no (Erik Naggum)
Newsgroups: comp.text.sgml,comp.infosystems.www.authoring.html
Subject: Re: HTML - A Subset of SGML ? NO!
Date: 13 Jul 1995 14:46:16 GMT
Organization: Naggum Software; +47 2295 0313
Message-ID: <19950713T144616Z@naggum.no>
References: <3tqd7a$i6n@harbinger.cc.monash.edu.au> <19950711T182930Z@naggum.no> <3u0trb$4pj@life.ai.mit.edu> <19950712T184900Z@naggum.no> <3u2cg0$rm8@crl3.crl.com>

thanks to Glenn Adams for also answering my request for information.

[Joe English]

| The SGML declaration for HTML would have specified SHORTTAG NO, were it
| not for markup in HTML 1.0 like <DL COMPACT> (which few browsers
| implement) and <IMG ISMAP> (which few people use), and certain
| constructs in FORMs like <INPUT CHECKED>, <SELECT MULTIPLE>, and
| <OPTION SELECTED>.
|
| Many browsers only get attribute minimization right for those few
| special cases that made SHORTTAG YES necessary to begin with, and don't
| understand the *un*minimized syntax. (This situation may be changing
| though.)

I've noted that we need SHORTTAG because of attribute literals and omitted
attribute names. to the extent that default attribute values are used, I
imagine that this, too, would be required for HTML.

there is thus a pronounced user requirement to separate the effects of
SHORTTAG YES on clauses 7.4.1 (start-tag minimization) and 7.5.1 (end-tag
minimization) from its effects on clauses 7.9.1 (attribute specification
minimization) and 7.9.3.1 (attribute value specification minimization):
use of the former causes problems in parsing the document, while lack of
the latter causes problems for authoring environments and for users.
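The attribute side of this split, the omitted attribute name that <INPUT CHECKED> and <DL COMPACT> rely on, can be sketched in Python. This is a toy under stated assumptions: the ATTLISTS table below is a hypothetical, heavily simplified stand-in for the HTML DTD, and expand_start_tag is an invented helper, not part of any SGML tool. A bare token is resolved to the one attribute whose declared name-token group contains it.

```python
# Toy model of omitted attribute names (ISO 8879 clause 7.9.1).
# ATTLISTS is a hypothetical, simplified stand-in for the HTML DTD:
# element -> {attribute name: tuple of allowed name tokens}.
ATTLISTS = {
    "DL": {"COMPACT": ("COMPACT",)},
    "IMG": {"ISMAP": ("ISMAP",)},
    "INPUT": {"CHECKED": ("CHECKED",), "TYPE": ("TEXT", "CHECKBOX", "RADIO")},
    "SELECT": {"MULTIPLE": ("MULTIPLE",)},
    "OPTION": {"SELECTED": ("SELECTED",)},
}


def expand_start_tag(tag: str) -> str:
    """Rewrite bare tokens such as CHECKED into full NAME="VALUE" form.

    Splits on whitespace, so quoted literals containing spaces are not handled.
    """
    gi, *tokens = tag.strip("<>").split()
    gi = gi.upper()
    attrs = ATTLISTS.get(gi, {})
    out = [gi]
    for tok in tokens:
        if "=" in tok:
            out.append(tok)  # name=value given explicitly; left as-is
            continue
        value = tok.upper()
        # The attribute name is implied by the token group the value belongs to.
        names = [name for name, group in attrs.items() if value in group]
        if len(names) != 1:
            raise ValueError(f"cannot resolve bare token {tok!r} in {gi}")
        out.append(f'{names[0]}="{value}"')
    return "<" + " ".join(out) + ">"


print(expand_start_tag("<INPUT CHECKED>"))           # <INPUT CHECKED="CHECKED">
print(expand_start_tag("<input checkbox checked>"))  # <INPUT TYPE="CHECKBOX" CHECKED="CHECKED">
```

Nothing here touches start-tag or end-tag minimization, which is the point of the proposed separation: this part can be supported without the parsing problems of the other.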

the HTML WG should submit a request for separation of these overloaded uses
of the SHORTTAG feature to the SGML review process. please mail me if you
wish to take part in formulating such a request.

| Null end-tag and empty end-tag minimization are very much <em/desired/
| for HTML, but since the authoring community at large doesn't know about
| these features (and those that do don't use them due to lack of
| support), there's been no great rush to implement them. Perhaps when
| enough users realize what they're missing, browser writers will be
| pressured into doing so; right now the attitude seems to be that
| SHORTTAG is just another obscure SGML feature that nobody really cares
| about.

this is a wish, not a requirement demonstrated through very extensive use
in the community. I reserve the right to argue strongly against allowing
minimized start-tags and end-tags. these are features that were included
only to allow other markup languages to be "ported" to SGML, although there
is no record of this actually having happened.

an end-tag may be minimized in four ways: (1) it may be omitted, (2) it may
be empty, (3) it may be tagc-challenged (unclosed, lacking its closing ">"),
or (4) it may be the null end-tag. case (1) is covered by OMITTAG, the
others by SHORTTAG. however, this is not all there is to it: short entity
references afford yet another way to reduce both the typing and the amount
of markup visible in the text of the SGML document.

examples:
(1) <foo>foo cannot contain bar, so foo ends now.<bar>
(2) <foo>foo is the innermost open element, so it and only it can end.</>
(3) <foo>bar's start-tag follows immediately</foo<bar>
(4) <foo/the slash is now special, and cannot be escaped easily./
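How a parser decides which element each of the four forms ends can be sketched as a toy Python model over its stack of open elements. The ToyParser class and its method names are invented for illustration, the `contains` table is a hypothetical stand-in for DTD content models, and the null end-tag rule is simplified to the case where the NET-enabling element is the innermost one.

```python
from dataclasses import dataclass, field


@dataclass
class ToyParser:
    """Stack of open elements, showing which element each end-tag form ends.

    `contains` is a hypothetical content-model table (element -> elements it
    may contain). A real parser derives this from the DTD and also checks
    that the end-tag is declared omissible.
    """
    contains: dict
    stack: list = field(default_factory=list)  # (gi, net_enabling) pairs

    def start(self, gi, net_enabling=False):
        """<gi>, or <gi/ if net_enabling. Returns elements ended by omission (1)."""
        ended = []
        while self.stack and gi not in self.contains.get(self.stack[-1][0], ()):
            ended.append(self.stack.pop()[0])
        self.stack.append((gi, net_enabling))
        return ended

    def empty_end(self):
        """</> (2): ends the innermost open element, whatever its name."""
        return self.stack.pop()[0]

    def end(self, gi):
        """</gi> or unclosed </gi followed by '<' (3): both end gi."""
        if not self.stack or self.stack[-1][0] != gi:
            raise ValueError(f"{gi} is not the innermost open element")
        return self.stack.pop()[0]

    def null_end(self):
        """'/' (4): simplified here to end the innermost element only if
        its start-tag was NET-enabling."""
        gi, net_enabling = self.stack[-1]
        if not net_enabling:
            raise ValueError(f"'/' is not a null end-tag inside {gi}")
        return self.stack.pop()[0]


p = ToyParser(contains={"doc": {"foo", "bar"}, "foo": set(), "bar": set()})
p.start("doc")
p.start("foo")
print(p.start("bar"))  # ['foo']: foo cannot contain bar, so foo ends
print(p.empty_end())   # bar
```

Only case (3) names the element it ends; the other three depend on parser state, which is part of why start-tag and end-tag minimization complicates parsing.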

the short entity reference feature is part of the syntax chosen and
declared in the SGML declaration, and is generally an aspect of the
application, not of the individual document. the application thus
controls both which character strings are used and what they map to in
each individual element.

e.g., a bibliographical reference may be written as [Goldfarb] and expand
to <bibref>Goldfarb</bibref> in elements where bibref is allowed, and map
to something else where it is not.
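The [Goldfarb] example can be modeled in a few lines of Python. This is a sketch under simplifying assumptions: the per-element maps in the hypothetical SHORTREF_MAPS table play the role of maps selected per element, short references are single characters here (real ones may be longer strings), and a character with no mapping in the current element is passed through as data, which is one way the brackets can "map to something else" where bibref is not allowed.

```python
import re

# Hypothetical short reference maps, keyed by the element they apply in.
# Each maps a short reference string to the replacement text of the entity
# it invokes; characters with no mapping in the current element are data.
SHORTREF_MAPS = {
    "p": {"[": "<bibref>"},
    "bibref": {"]": "</bibref>"},
    "pre": {},  # brackets are ordinary data here
}


def expand(text, element):
    """Expand short references in `text`, which appears inside `element`."""
    out, stack = [], [element]
    for ch in text:
        replacement = SHORTREF_MAPS.get(stack[-1], {}).get(ch)
        if replacement is None:
            out.append(ch)
            continue
        out.append(replacement)
        # Track the element the replacement opens or closes, so the next
        # character is interpreted under that element's map. "]" is only
        # mapped inside bibref, so a pop never empties the stack.
        m = re.fullmatch(r"<(/?)(\w+)>", replacement)
        if m and m.group(1):
            stack.pop()
        elif m:
            stack.append(m.group(2))
    return "".join(out)


print(expand("see [Goldfarb] for details", "p"))  # see <bibref>Goldfarb</bibref> for details
print(expand("a[i] = 0", "pre"))                  # a[i] = 0
```

Because the map switches with the current element, "]" means end-of-bibref only inside a bibref, which is the application-level control described above.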

I won't belabor the point, but this is a far more general mechanism than
lexically challenged tags. Naggum says: check it out.

--
NETSCAPISM /net-'sca-,pi-z*m/ n (1995): habitual diversion of the mind to
    purely imaginative activity or entertainment as an escape from the
    realization that the Internet was built by and for someone else.

------- End of Forwarded Message