3in1 HTML dtd

Terry Allen (terry@ora.com)
Fri, 9 Sep 94 14:56:48 EDT

Here are Dan's 3 DTDs rolled into one file, with marked sections,
switched by parameter entities, for the parts that pertain to each
of the 3 levels of conformance.
I append a list of unresolved issues (mostly Proposed stuff that
I deleted). Yuri has pointed out that the same mechanism can be
spread over 3 files if people prefer the 3-DTD approach.

I would be happy with either approach, but I commend to your
attention the fact that either way, this 3in1.dtd parses without
warnings. (That would be a big help if you were making local
mods to test out new stuff.)
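For anyone making such local mods, the switching logic in the DTD below can be modelled as a small lookup table. There are three useLevelN switches, and exactly one of them must be INCLUDE. Each level block then sets the four keywords Level0only, Level1only, Level1and2 and Level2only. This is a minimal Python sketch that uses only the entity names from the DTD. The helpers `resolve_keywords` and `section_included` are hypothetical and not part of any tool.

```python
# Hypothetical model of the level switch in 3in1.dtd (not part of the DTD):
# exactly one useLevelN entity is INCLUDE, and that choice fixes the four
# keywords used by the marked sections further down.

LEVEL_KEYWORDS = {
    0: {"Level0only": "INCLUDE", "Level1only": "IGNORE",
        "Level1and2": "IGNORE", "Level2only": "IGNORE"},
    1: {"Level0only": "IGNORE", "Level1only": "INCLUDE",
        "Level1and2": "INCLUDE", "Level2only": "IGNORE"},
    2: {"Level0only": "IGNORE", "Level1only": "IGNORE",
        "Level1and2": "INCLUDE", "Level2only": "INCLUDE"},
}


def resolve_keywords(switches):
    """Map useLevel0..useLevel2 settings to the four marked-section keywords.

    Compared case-insensitively, since the DTD writes the switches in
    lowercase ("include") but the keywords in uppercase ("INCLUDE").
    """
    chosen = [n for n in (0, 1, 2)
              if switches.get(f"useLevel{n}", "ignore").upper() == "INCLUDE"]
    if len(chosen) != 1:
        raise ValueError(f"exactly one level must be INCLUDE, got levels {chosen}")
    return LEVEL_KEYWORDS[chosen[0]]


def section_included(keyword, switches):
    """True if a marked section keyed on `keyword` survives parsing."""
    return resolve_keywords(switches)[keyword] == "INCLUDE"


# The settings shipped in the DTD: Level 2 switched on.
shipped = {"useLevel0": "ignore", "useLevel1": "ignore", "useLevel2": "include"}
print(section_included("Level1and2", shipped))  # True
print(section_included("Level0only", shipped))  # False
```

The sketch rejects more than one INCLUDE. In the DTD itself, the first declaration of an entity is the one that counts. Two INCLUDEs would therefore quietly give the lower level's keywords.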

<!-- 3in1.dtd, 12.00 Fri 9 Sept version

Prescriptive and Proposed items mostly eliminated, Obsolete saved.
%A.content expanded (used only once)
%literal; expanded.
-->

<!-- One of the next three entities must be INCLUDE, the other
two IGNORE -->

<!ENTITY % useLevel0 "ignore">
<!ENTITY % useLevel1 "ignore" >
<!ENTITY % useLevel2 "include" >

<!-- Don't edit includes/ignores below here -->
<![ %useLevel0; [
<!ENTITY % Level0only "INCLUDE" >
<!ENTITY % Level1only "IGNORE" >
<!ENTITY % Level1and2 "IGNORE" >
<!ENTITY % Level2only "IGNORE" >
<!ENTITY % HTML.Version
"+//foo//DTD HTML Level 0//EN//2.0">
]]>

<![ %useLevel1; [
<!ENTITY % Level0only "IGNORE" >
<!ENTITY % Level1only "INCLUDE" >
<!ENTITY % Level1and2 "INCLUDE" >
<!ENTITY % Level2only "IGNORE" >
<!ENTITY % HTML.Version
"+//foo//DTD HTML Level 1//EN//2.0">
]]>

<![ %useLevel2; [
<!ENTITY % Level0only "IGNORE" >
<!ENTITY % Level1only "IGNORE" >
<!ENTITY % Level1and2 "INCLUDE" >
<!ENTITY % Level2only "INCLUDE" >
<!ENTITY % HTML.Version
"+//foo//DTD HTML Level 2//EN//2.0">
]]>

<!-- Feature Test Entities -->

<!-- Link Markup -->

<![ %Level0only; [
<!ENTITY % linkattributes
"NAME CDATA #IMPLIED
">
]]>

<![ %Level1and2; [
<!ENTITY % linkType "NAME"
-- a list of these will be specified at a later date -->
<!ENTITY % linkattributes
"NAME CDATA #IMPLIED
REL %linkType #IMPLIED -- forward relationship type --
REV %linkType #IMPLIED -- reversed relationship type
to referent data: --
URN CDATA #IMPLIED -- uniform resource name --
TITLE CDATA #IMPLIED -- advisory only --
METHODS NAMES #IMPLIED -- supported public methods of the object:
TEXTSEARCH, GET, HEAD, ... --
">
]]>

<!-- Text Markup -->

<![ %Level0only; [
<!ENTITY % text "#PCDATA | IMG | BR ">
]]>
<![ %Level1and2; [
<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE ">
<!ENTITY % font " TT | B | I ">
<!ENTITY % text "#PCDATA | IMG | BR | %phrase; | %font;">
]]>

<!ELEMENT BR - O EMPTY>

<!ELEMENT IMG - O EMPTY -- Embedded image -->

<!-- TA removed the following; this should be a Level 2+ not Level 0
requirement
ENTITY % img.alt.default "#REQUIRED" must have ALT in level 0 images -->

<!ENTITY % URI "CDATA"
-- The term URI means a CDATA attribute
whose value is a Uniform Resource Identifier,
as defined by
"Universal Resource Identifiers" by Tim Berners-Lee
aka http://info.cern.ch/hypertext/WWW/Addressing/URL/URI_Overview.html

Note that CDATA attributes are limited by the LITLEN
capacity (1024 in the current version of html.decl),
so that URIs in HTML have a bounded length.
-->

<!ATTLIST IMG
SRC %URI; #REQUIRED -- URI of document to embed --
ALT CDATA #IMPLIED -- text to use in place of the image --
ALIGN (top|middle|bottom) #IMPLIED
ISMAP (ISMAP) #IMPLIED
>

<!-- Mnemonic character entities. -->

<!ENTITY % ISOlat1
public "ISO 8879:1986//ENTITIES Added Latin 1//EN"
-- system "iso-lat1.gml" --
>
%ISOlat1;

<!ENTITY #DEFAULT SDATA "&#38;unknown;" -- display the markup -->
<!ENTITY amp CDATA "&#38;" -- ampersand -->
<!ENTITY gt CDATA "&#62;" -- greater than -->
<!ENTITY lt CDATA "&#60;" -- less than -->
<!ENTITY quot CDATA "&#34;" -- double quote -->

<!-- DTD "macros" -->

<!ENTITY % heading "H1|H2|H3|H4|H5|H6">

<!ENTITY % list " UL | OL | DIR | MENU " >

<![ %Level0only; [
<!ENTITY % block "P | %list | DL | PRE | XMP | LISTING
| BLOCKQUOTE ">
]]>

<![ %Level1only; [
<!ENTITY % block "P | %list | DL | XMP | LISTING
| PRE | BLOCKQUOTE | ISINDEX">
]]>

<![ %Level2only; [
<!ENTITY % block "P | %list | DL | XMP | LISTING
| PRE | BLOCKQUOTE | FORM | ISINDEX">
]]>

<!-- HyperText -->

<!ENTITY % htext "A | %text" -- Plus links, no structure -->

<!ELEMENT A - - (%heading|%block|%text)+ -(A)>
<!ATTLIST A
HREF %URI; #IMPLIED
%linkattributes;
>

<!-- Paragraphs -->

<!ELEMENT P - O (%htext)+>

<!-- Headings, Titles, Sections -->

<!ELEMENT HR - O EMPTY -- horizontal rule -->

<!ELEMENT ( %heading ) - - (%htext;)+>

<!ELEMENT TITLE - - (#PCDATA)
-- The TITLE element is not considered part of the flow of text.
It should be displayed, for example, as the page header or
window title.
-->

<!-- Text Flows -->

<!ENTITY % flow "(%htext|%block)*">

<!-- Lists -->

<!ELEMENT DL - - (DT*, DD?)+>
<!ATTLIST DL
COMPACT (COMPACT) #IMPLIED>

<!ELEMENT DT - O (%htext)+>
<!ELEMENT DD - O %flow>

<!ELEMENT (%list) - - (LI)+>
<!ATTLIST (%list)
COMPACT (COMPACT) #IMPLIED>

<!ELEMENT LI - O %flow>

<!-- Preformatted Text -->

<![ %Level0only; [
<!ENTITY % pre.content "#PCDATA | A ">
]]>

<![ %Level1and2; [
<!ENTITY % pre.content "#PCDATA | A | %font; | %phrase;">
]]>

<!ELEMENT PRE - - (%pre.content)+>

<!ELEMENT XMP - - CDATA>
<!ELEMENT LISTING - - CDATA>
<!ELEMENT PLAINTEXT - O CDATA>

<!-- Document Body -->

<!ENTITY % body.content "(%heading | %htext | %block | HR | ADDRESS)*">

<!ELEMENT BODY O O %body.content>

<!-- Misc. Body Elements -->

<!ELEMENT BLOCKQUOTE - - %body.content>

<!ENTITY % address.content "(%htext|P)*">
<!ELEMENT ADDRESS - - %address.content>

<!-- Document Head -->

<![ %Level0only; [
<!ENTITY % head.content "TITLE & ISINDEX? & BASE? & NEXTID?">
]]>

<![ %Level1and2; [
<!ENTITY % head.content "TITLE & ISINDEX? & BASE? & NEXTID? & LINK*">
]]>

<!ELEMENT HEAD O O (%head.content)>

<!ELEMENT ISINDEX - O EMPTY
-- WWW clients should offer the option to perform a search on
documents containing ISINDEX.
-->

<!ELEMENT BASE - O EMPTY -- Reference context for URIs -->
<!ATTLIST BASE
HREF %URI; #REQUIRED
>

<!ELEMENT NEXTID - O EMPTY>
<!ATTLIST NEXTID N CDATA #REQUIRED
-- The value should be a name suitable for use
as the ID of a new element. When it is used, its
numeric part is incremented; e.g., Z67 becomes Z68.
-->

<!-- Document Structure -->

<!ENTITY % html.content "HEAD, BODY, PLAINTEXT?">
<!-- nonobsolete version is: "HEAD, BODY" only -->

<!ELEMENT HTML O O (%html.content)>

<![ %Level1and2; [

<!ELEMENT LINK - O EMPTY>
<!ATTLIST LINK
HREF %URI; #REQUIRED
%linkattributes; >

<!ELEMENT (%phrase; | %font;) - - (%htext;)+>

<!ATTLIST PRE
WIDTH NUMBER #IMPLIED
>
]]>

<![ %Level2only; [

<!ENTITY % Content-Type "CDATA"
-- meaning a MIME content type, as per RFC1521
-->

<!ENTITY % HTTP-Method "GET | POST">

<!-- Forms -->

<!ELEMENT FORM - - %body.content -(FORM) +(INPUT|SELECT|TEXTAREA)>
<!ATTLIST FORM
ACTION CDATA #REQUIRED
METHOD (%HTTP-Method) GET
ENCTYPE %Content-Type; "application/x-www-form-urlencoded"
>

<!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
RADIO | SUBMIT | RESET |
IMAGE | HIDDEN )">
<!ELEMENT INPUT - O EMPTY>
<!ATTLIST INPUT
TYPE %InputType TEXT
NAME CDATA #IMPLIED -- required for all but submit and reset --
VALUE CDATA #IMPLIED
SRC %URI; #IMPLIED -- for image inputs --
CHECKED (CHECKED) #IMPLIED
SIZE CDATA #IMPLIED -- like NUMBERS,
but delimited with comma, not space --
MAXLENGTH NUMBER #IMPLIED
ALIGN (top|middle|bottom) #IMPLIED
>

<!ELEMENT SELECT - - (OPTION+)>
<!ATTLIST SELECT
NAME CDATA #REQUIRED
SIZE NUMBER #IMPLIED
MULTIPLE (MULTIPLE) #IMPLIED
>

<!ELEMENT OPTION - O (#PCDATA)>
<!ATTLIST OPTION
SELECTED (SELECTED) #IMPLIED
VALUE CDATA #IMPLIED
>

<!ELEMENT TEXTAREA - - (#PCDATA)>
<!ATTLIST TEXTAREA
NAME CDATA #REQUIRED
ROWS NUMBER #REQUIRED
COLS NUMBER #REQUIRED
>

]]>
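The NEXTID comment in the DTD above says that the value's numeric part is incremented when it is used (Z67 becomes Z68). This is a minimal Python sketch of that rule. It assumes the value ends in a run of decimal digits. It also assumes zero padding is preserved, which the DTD does not specify. `next_id` is a hypothetical helper name.

```python
import re


def next_id(value):
    """Return a NEXTID value with its trailing number incremented (Z67 -> Z68).

    Hypothetical helper. Assumes the value ends in decimal digits and keeps
    any zero padding (A007 -> A008); the DTD comment does not say either way.
    """
    match = re.fullmatch(r"(.*?)(\d+)", value)
    if match is None:
        raise ValueError(f"no numeric part in {value!r}")
    prefix, digits = match.groups()
    return prefix + str(int(digits) + 1).zfill(len(digits))


print(next_id("Z67"))  # Z68
print(next_id("Z99"))  # Z100
```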

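In the Level 2 forms section, FORM defaults to METHOD GET and ENCTYPE application/x-www-form-urlencoded. This is a minimal Python sketch of how a client might build the request URI from the NAME/VALUE pairs of a form's INPUT, SELECT and TEXTAREA elements. It assumes the usual client behaviour of appending the encoded pairs to ACTION after a "?", because the DTD itself only names the content type. `form_get_uri` and the example.com ACTION are hypothetical.

```python
from urllib.parse import urlencode


def form_get_uri(action, fields):
    """Build a GET request URI for a FORM submission (hypothetical helper).

    `fields` is a list of (NAME, VALUE) pairs in document order; a list
    rather than a dict, so repeated names (checkboxes, SELECT MULTIPLE)
    are kept.
    """
    # urlencode produces application/x-www-form-urlencoded text:
    # spaces become "+", other reserved characters are %-escaped.
    return action + "?" + urlencode(fields)


uri = form_get_uri("http://example.com/search",
                   [("q", "html dtd"), ("level", "2"), ("opt", "a&b")])
print(uri)  # http://example.com/search?q=html+dtd&level=2&opt=a%26b
```

Using a list of pairs rather than a dict matters for CHECKBOX inputs and SELECT MULTIPLE, where the same NAME can be submitted more than once.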
===================================================================

(as originally addressed to Dan:)

I went through the three DTDs and looked at all the marked sections.
Aside from Prescriptive A.content, which I've already opposed and
have new and better arguments against, here's a review of the list.
If any of these can be aligned with one of the Levels, that would
be sweet; if not, can they be left for a further revision "2.1"?

body.content. We could take your Prescriptive content model and use
it for Level 2. The difference is that htext would be excluded from
body.content at Level 2. That is out of step with the other Level 2
differences, which only add to what the lower levels allow, so maybe
it belongs in HTML 2.1. (I agree most strongly with the intention, though.)

XMP, LISTING. Same reasoning. You have these down as Obsolete for
good reason, but they exist in current docs. Should we discuss them and
maybe decide they can be discarded at the 2.1 stage?

What shall we do with

<![ %HTML.Proposed [
<!ENTITY nbsp CDATA "&#160;" -- non-breaking space -->
<!ENTITY shy CDATA "&#173;" -- soft hyphen -->
]]> ?

Are these now supported by anyone?
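For reference, the two proposed entities expand to Latin-1 code points 160 and 173, which are U+00A0 NO-BREAK SPACE and U+00AD SOFT HYPHEN. This quick Python check uses the standard `html` and `unicodedata` modules to show what those character references denote. It relies on modern Python's entity table rather than anything in the proposal, so it only illustrates the code points.

```python
import html
import unicodedata

# The proposed nbsp and shy entities are defined by numeric references.
for ref in ("&#160;", "&#173;"):
    ch = html.unescape(ref)
    print(ref, f"U+{ord(ch):04X}", unicodedata.name(ch))
# &#160; U+00A0 NO-BREAK SPACE
# &#173; U+00AD SOFT HYPHEN

# Python's own HTML entity table uses the same names for these characters.
print(html.unescape("&nbsp;") == "\u00a0", html.unescape("&shy;") == "\u00ad")
# True True
```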

address.content. Your Prescriptive model would allow only htext,
not htext and/or P. You must have encountered an example
where someone used P inside ADDRESS. Can you recall why
that was done? (Solve at the 2.1 level?)

content model of HTML. Your nonobsolete version is just
HEAD, BODY; the obsolete one is HEAD, BODY, PLAINTEXT?.
Does this map to any of the Levels? The difference would
appear to be that the tighter model forbids content after
BODY; the looser one allows PLAINTEXT, but its omission
spec is - O, so the start tag is required. I must have
missed discussion on this. Is there a <HEAD><BODY><PLAINTEXT>
style, or (as HEAD and BODY are O O) is this meant to
accommodate an instance that is just <PLAINTEXT>...</>?
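To make the difference between the two HTML content models concrete, this minimal Python sketch treats each model as a regular expression over the sequence of child element names. The sequence is taken after any omitted HEAD/BODY tags have been inferred. This is a simplification, since a real SGML parser also handles tag omission and exclusions itself. `MODELS` and `conforms` are hypothetical names.

```python
import re

# Hypothetical encoding: each content model of HTML as a regex over
# space-separated child element names (after tag inference).
MODELS = {
    "nonobsolete": r"HEAD BODY",
    "obsolete": r"HEAD BODY( PLAINTEXT)?",
}


def conforms(model, children):
    """True if the child element sequence matches the named content model."""
    return re.fullmatch(MODELS[model], " ".join(children)) is not None


seq = ["HEAD", "BODY", "PLAINTEXT"]
print(conforms("nonobsolete", seq))  # False: nothing may follow BODY
print(conforms("obsolete", seq))     # True
```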

For Proposed head.content you have:

<![ %HTML.Proposed [
<!ENTITY % head.content-1 "& LINK* & META*">
<!ELEMENT META - O EMPTY -- Generic Metainformation -->
<!ATTLIST META
HTTP-EQUIV NAME #IMPLIED -- HTTP response header name --
NAME NAME #IMPLIED -- metainformation name --
CONTENT CDATA #REQUIRED -- associated information --
>
]]>

<![ %HTML.Proposed [
<!ENTITY % font " TT | B | I | U | S ">
<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE
| STRIKE | DFN | KEY">
]]>

Same question again: are these current practice, or can they be
put off till 2.1?
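In the proposed META element, HTTP-EQUIV names an HTTP response header. This minimal sketch shows how a server might collect such headers from a document's META elements using Python's `html.parser`, which reports tag and attribute names in lowercase. The class `MetaHeaders` is hypothetical. The proposal does not say how a server should merge these headers with its own.

```python
from html.parser import HTMLParser


class MetaHeaders(HTMLParser):
    """Hypothetical collector of (header, value) pairs from
    <META HTTP-EQUIV=... CONTENT=...> elements."""

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values are kept as-is.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if "http-equiv" in attrs and "content" in attrs:
            self.headers.append((attrs["http-equiv"], attrs["content"]))


parser = MetaHeaders()
parser.feed('<HEAD>'
            '<META HTTP-EQUIV="Expires" CONTENT="Fri, 09 Sep 1994 18:56:48 GMT">'
            '<META NAME="keywords" CONTENT="SGML, HTML">'
            '</HEAD>')
parser.close()
print(parser.headers)  # [('Expires', 'Fri, 09 Sep 1994 18:56:48 GMT')]
```

A META with only NAME is ignored here, because it carries metainformation about the document rather than a header.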

-- 
Terry Allen  (terry@ora.com)   Editor, Digital Media Group
O'Reilly & Associates, Inc.    Sebastopol, Calif., 95472