3in1 HTML dtd

Terry Allen <terry@ora.com>
Date: Fri, 9 Sep 94 14:56:48 EDT
Message-id: <199409091855.LAA03656@rock>
Reply-To: terry@ora.com
From: Terry Allen <terry@ora.com>
To: Multiple recipients of list <html-wg@oclc.org>
Subject: 3in1 HTML dtd
X-Comment: HTML Working Group (Private)

Here are Dan's 3 DTDs rolled into one file, with marked sections for
the parameter entities that pertain to the 3 levels of conformance.
I append a list of unresolved issues (mostly Proposed stuff that
I deleted).  Yuri has pointed out that the same mechanism can be
spread over 3 files if people prefer the 3-DTD approach.

I would be happy with either approach, but I commend to your 
attention the fact that either way, this 3in1.dtd parses without
warnings.  (That would be a big help if you were making local
mods to test out new stuff.)  
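
For local testing, a document can probably select a level without
editing the file at all.  SGML binds the first declaration of an
entity, and the internal declaration subset is read before the
external DTD, so the switches can be pre-declared there.  A minimal
sketch; the system identifier "3in1.dtd" and the sample content are
assumptions, and the same SGML declaration (html.decl) is assumed to
be in effect:

```sgml
<!DOCTYPE HTML SYSTEM "3in1.dtd" [
<!-- Declared first, so these take precedence over the defaults
     set at the top of 3in1.dtd -->
<!ENTITY % useLevel0 "IGNORE">
<!ENTITY % useLevel1 "INCLUDE">
<!ENTITY % useLevel2 "IGNORE">
]>
<HTML>
<HEAD><TITLE>Level 1 test</TITLE></HEAD>
<BODY>
<P>Some <EM>Level 1</EM> text.</P>
</BODY>
</HTML>
```

With these settings a FORM in the body should be reported as an
error, since FORM is declared only in the Level2only section.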

<!--	3in1.dtd, 12.00 Fri 9 Sept version

Prescriptive and Proposed items mostly eliminated, Obsolete saved.
%A.content expanded (used only once)
%literal; expanded.
	-->

<!-- One of the next three entities must be INCLUDE, the other
two IGNORE -->

<!ENTITY % useLevel0 "IGNORE" >
<!ENTITY % useLevel1 "IGNORE" >
<!ENTITY % useLevel2 "INCLUDE" >

<!-- Don't edit includes/ignores below here -->
<![ %useLevel0; [
<!ENTITY % Level0only "INCLUDE" >
<!ENTITY % Level1only "IGNORE" >
<!ENTITY % Level1and2 "IGNORE" >
<!ENTITY % Level2only "IGNORE" >
<!ENTITY % HTML.Version
        "+//foo//DTD HTML Level 0//EN//2.0">
]]>

<![ %useLevel1; [
<!ENTITY % Level0only "IGNORE" >
<!ENTITY % Level1only "INCLUDE" >
<!ENTITY % Level1and2 "INCLUDE" >
<!ENTITY % Level2only "IGNORE" >
<!ENTITY % HTML.Version
        "+//foo//DTD HTML Level 1//EN//2.0">
]]>

<![ %useLevel2; [
<!ENTITY % Level0only "IGNORE" >
<!ENTITY % Level1only "IGNORE" >
<!ENTITY % Level1and2 "INCLUDE" >
<!ENTITY % Level2only "INCLUDE" >
<!ENTITY % HTML.Version
        "+//foo//DTD HTML Level 2//EN//2.0">
]]>


<!-- Feature Test Entities -->

<!-- Link Markup -->

<![ %Level0only; [
<!ENTITY % linkattributes
        "NAME CDATA #IMPLIED
        ">
]]>

<![ %Level1and2; [
<!ENTITY % linkType "NAME"
	-- a list of these will be specified at a later date -->
<!ENTITY % linkattributes
        "NAME CDATA #IMPLIED
        REL %linkType #IMPLIED -- forward relationship type --
        REV %linkType #IMPLIED -- reversed relationship type
                              to referent data: --
        URN CDATA #IMPLIED -- universal resource number --
        TITLE CDATA #IMPLIED -- advisory only --
        METHODS NAMES #IMPLIED -- supported public methods of the object:
                                        TEXTSEARCH, GET, HEAD, ... --
        ">
]]>

<!-- Text Markup -->

<![ %Level0only; [ 
<!ENTITY % text "#PCDATA | IMG | BR ">
]]>
<![ %Level1and2; [ 
<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE ">
<!ENTITY % font " TT | B | I ">
<!ENTITY % text "#PCDATA | IMG | BR | %phrase; | %font;">
]]>

<!ELEMENT BR    - O EMPTY>

<!ELEMENT IMG    - O EMPTY --  Embedded image -->

<!-- TA removed the following declaration; requiring ALT should be a
Level 2+ requirement, not a Level 0 one:
 ENTITY % img.alt.default "#REQUIRED" (must have ALT in level 0 images) -->

<!ENTITY % URI "CDATA" 
        -- The term URI means a CDATA attribute
           whose value is a Uniform Resource Identifier,
           as defined by
        "Universal Resource Identifiers" by Tim Berners-Lee
        aka http://info.cern.ch/hypertext/WWW/Addressing/URL/URI_Overview.html

        Note that CDATA attributes are limited by the LITLEN
        capacity (1024 in the current version of html.decl),
        so that URIs in HTML have a bounded length.
        -->

<!ATTLIST IMG
        SRC %URI; #REQUIRED     -- URI of document to embed --
        ALT CDATA #IMPLIED      -- text alternative, not a URI --
	ALIGN (top|middle|bottom) #IMPLIED
	ISMAP (ISMAP) #IMPLIED
        >

<!--   Mnemonic character entities. -->

<!ENTITY % ISOlat1 
 public "ISO 8879:1986//ENTITIES Added Latin 1//EN"
--  system "iso-lat1.gml" --
>
%ISOlat1;

<!ENTITY #DEFAULT SDATA "&#38;unknown;" --display the markup-->
<!ENTITY amp CDATA "&#38;"     -- ampersand          -->
<!ENTITY gt CDATA "&#62;"      -- greater than       -->
<!ENTITY lt CDATA "&#60;"      -- less than          -->
<!ENTITY quot CDATA "&#34;"    -- double quote       -->

<!-- DTD "macros" -->

<!ENTITY % heading "H1|H2|H3|H4|H5|H6">

<!ENTITY % list " UL | OL | DIR | MENU " >

<![ %Level0only; [
<!ENTITY % block "P | %list | DL | PRE | XMP | LISTING
		| BLOCKQUOTE ">
]]>

<![ %Level1only; [
<!ENTITY % block "P | %list | DL | XMP | LISTING
		| PRE | BLOCKQUOTE | ISINDEX">
]]>

<![ %Level2only; [
<!ENTITY % block "P | %list | DL | XMP | LISTING
		| PRE | BLOCKQUOTE | FORM | ISINDEX">
]]>

<!-- HyperText -->

<!ENTITY % htext "A | %text"    -- Plus links, no structure -->

<!ELEMENT A     - - (%heading|%block|%text)+ -(A)>
<!ATTLIST A
        HREF %URI; #IMPLIED
        %linkattributes;
        >

<!-- Paragraphs -->

<!ELEMENT P     - O (%htext)+>


<!-- Headings, Titles, Sections -->

<!ELEMENT HR    - O EMPTY -- horizontal rule -->

<!ELEMENT ( %heading )  - -  (%htext;)+>

<!ELEMENT TITLE - -  (#PCDATA)
          -- The TITLE element is not considered part of the flow of text.
             It should be displayed, for example as the page header or
             window title.
          -->


<!-- Text Flows -->

<!ENTITY % flow "(%htext|%block)*">

<!-- Lists -->


<!ELEMENT DL    - -  (DT*, DD?)+>
<!ATTLIST DL
	COMPACT (COMPACT) #IMPLIED>

<!ELEMENT DT    - O (%htext)+>
<!ELEMENT DD    - O %flow>

<!ELEMENT (%list) - -  (LI)+>
<!ATTLIST (%list)
	COMPACT (COMPACT) #IMPLIED>

<!ELEMENT LI    - O %flow>

<!-- Preformatted Text -->

<![ %Level0only; [
<!ENTITY % pre.content "#PCDATA | A ">
]]>

<![ %Level1and2; [
<!ENTITY % pre.content "#PCDATA | A | %font; | %phrase;">
]]>

<!ELEMENT PRE - - (%pre.content)+>

<!ELEMENT XMP - -  CDATA>
<!ELEMENT LISTING - -  CDATA>
<!ELEMENT PLAINTEXT - O   CDATA>


<!-- Document Body -->

<!ENTITY % body.content "(%heading | %htext | %block | HR | ADDRESS)*">

<!ELEMENT BODY O O  %body.content>

<!-- Misc. Body Elements -->

<!ELEMENT BLOCKQUOTE - - %body.content>

<!ENTITY % address.content "(%htext|P)*">
<!ELEMENT ADDRESS - - %address.content>


<!-- Document Head -->

<![ %Level0only; [
<!ENTITY % head.content "TITLE & ISINDEX? & BASE? & NEXTID?">
]]>

<![ %Level1and2; [
<!ENTITY % head.content "TITLE & ISINDEX? & BASE? & NEXTID? & LINK*">
]]>

<!ELEMENT HEAD O O  (%head.content)>

<!ELEMENT ISINDEX - O EMPTY
          -- WWW clients should offer the option to perform a search on
             documents containing ISINDEX.
          -->

<!ELEMENT BASE - O EMPTY    -- Reference context for URIs -->
<!ATTLIST BASE
        HREF %URI; #REQUIRED
        >

<!ELEMENT NEXTID - O EMPTY>
<!ATTLIST NEXTID N CDATA #REQUIRED
          -- The number should be a name suitable for use
             for the ID of a new element. When used, the value
             has its numeric part incremented. EG Z67 becomes Z68
          -->


<!-- Document Structure -->

<!ENTITY % html.content "HEAD, BODY, PLAINTEXT?">
<!-- nonobsolete version is: "HEAD, BODY" only -->

<!ELEMENT HTML O O  (%html.content)>

<![ %Level1and2; [

<!ELEMENT LINK - O EMPTY>
<!ATTLIST LINK
	HREF %URI; #REQUIRED
        %linkattributes; >

<!ELEMENT (%phrase; | %font;) - - (%htext;)+>

<!ATTLIST PRE
        WIDTH NUMBER #IMPLIED
        >
]]>

<![ %Level2only; [

<!ENTITY % Content-Type "CDATA"
	-- meaning a MIME content type, as per RFC1521
	-->

<!ENTITY % HTTP-Method "GET | POST">


<!-- Forms  -->

<!ELEMENT FORM - - %body.content -(FORM) +(INPUT|SELECT|TEXTAREA)>
<!ATTLIST FORM
	ACTION CDATA #REQUIRED
	METHOD (%HTTP-Method) GET
	ENCTYPE %Content-Type; "application/x-www-form-urlencoded"
	>

<!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
			RADIO | SUBMIT | RESET |
			IMAGE | HIDDEN )">
<!ELEMENT INPUT - O EMPTY>
<!ATTLIST INPUT
	TYPE %InputType TEXT
	NAME CDATA #IMPLIED -- required for all but submit and reset --
	VALUE CDATA #IMPLIED
	SRC %URI; #IMPLIED -- for image inputs -- 
	CHECKED (CHECKED) #IMPLIED
	SIZE CDATA #IMPLIED -- like NUMBERS,
				 but delimited with comma, not space --
	MAXLENGTH NUMBER #IMPLIED
	ALIGN (top|middle|bottom) #IMPLIED
	>

<!ELEMENT SELECT - - (OPTION+)>
<!ATTLIST SELECT
	NAME CDATA #REQUIRED
	SIZE NUMBER #IMPLIED
	MULTIPLE (MULTIPLE) #IMPLIED
	>

<!ELEMENT OPTION - O (#PCDATA)>
<!ATTLIST OPTION
	SELECTED (SELECTED) #IMPLIED
	VALUE CDATA #IMPLIED
	>

<!ELEMENT TEXTAREA - - (#PCDATA)>
<!ATTLIST TEXTAREA
	NAME CDATA #REQUIRED
	ROWS NUMBER #REQUIRED
	COLS NUMBER #REQUIRED
	>

]]>


===================================================================

(as originally addressed to Dan:)

I went through the three DTDs and looked at all the marked sections.
Aside from Prescriptive A.content, which I've already opposed and
have new and better arguments against, here's a review of the list.
If any of these can be aligned with one of the Levels, that would
be sweet; if not, can they be left for a further revision "2.1"?

body.content.  We could take your Prescriptive content model and use
	it for Level 2.  The difference is that htext would be excluded
	at Level 2.  This is not isomorphic with the other contents of
	Level 2, so maybe it should be for HTML 2.1.  (I agree most
	strongly with the intention, though.)
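
To make the difference concrete, here is a sketch of how a Level 2
body.content without htext could be slotted into the same file.  It
is only a reading of "htext would be excluded"; Dan's actual
Prescriptive model may differ in other details.

```sgml
<!-- Must precede the general declaration: the first declaration
     of an entity is the one that binds -->
<![ %Level2only; [
<!ENTITY % body.content "(%heading; | %block; | HR | ADDRESS)*">
]]>

<!-- Levels 0 and 1 fall through to the model as posted -->
<!ENTITY % body.content "(%heading; | %htext; | %block; | HR | ADDRESS)*">
```

Because BLOCKQUOTE and FORM are also declared with %body.content,
they would tighten along with BODY, and text placed directly in any
of the three would need to be wrapped in P or another block.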

XMP, LISTING.  Same reasoning.  You have these down as Obsolete for
	good reason, but they exist in current docs.  Discuss now, and
	maybe decide they can be discarded at the 2.1 stage?


What shall we do with 

<![ %HTML.Proposed [
<!ENTITY nbsp CDATA "&#160;"   -- non-breaking space -->
<!ENTITY shy  CDATA "&#173;"   -- soft hyphen        -->
]]>  ?

Are these now supported by anyone?

address.content.  Your Prescriptive would allow only htext,
not htext and/or P.  You must have encountered an example
where someone used P inside ADDRESS.  Can you recall why
that was done?  (solve at 2.1 level?)

content model of HTML.  Your nonobsolete version is just
HEAD, BODY; the obsolete one is HEAD, BODY, PLAINTEXT?.
Does this map to any of the Levels?  The difference would
appear to be that the tighter model forbids content after
BODY; the looser one allows PLAINTEXT, but its omission 
spec is - O, so the start tag is required.  I must have
missed discussion on this.  Is there a <HEAD><BODY><PLAINTEXT>
style, or (as HEAD and BODY are O O) is this meant to 
accommodate an instance that is just <PLAINTEXT>...</>?
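
For reference, a sketch of the first style: an ordinary document
with a PLAINTEXT tail.  The content is invented for illustration.

```sgml
<HTML>
<HEAD><TITLE>Example</TITLE></HEAD>
<BODY><P>Formatted text.</P></BODY>
<PLAINTEXT>
From here on everything is character data: <B> is not a tag
and &amp; is not expanded.
```

The other reading is an instance that consists of nothing but
<PLAINTEXT> and text.  Since head.content requires TITLE, and TITLE
is declared - - so its start tag cannot be omitted, that form would
presumably not validate against the model as posted, even with HEAD
and BODY declared O O.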

For Proposed head.content you have:

<![ %HTML.Proposed [
<!ENTITY % head.content-1 "& LINK* & META*">
<!ELEMENT META - O EMPTY    -- Generic Metainformation -->
<!ATTLIST META
        HTTP-EQUIV  NAME    #IMPLIED  -- HTTP response header name  --
        NAME        NAME    #IMPLIED  -- metainformation name       --
        CONTENT     CDATA   #REQUIRED -- associated information     --
        >
]]>

<![ %HTML.Proposed [
        <!ENTITY % font " TT | B | I | U | S ">
        <!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE
                                | STRIKE | DFN | KEY">
]]>

Same question again.  Are these current practice or can they be
put off till 2.1?
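
If the Proposed items are deferred rather than dropped, one option
is to park them behind a switch that defaults to IGNORE, in the same
style as the level switches.  A sketch; the default value and the
placement are assumptions, not something taken from Dan's DTDs:

```sgml
<!-- Assumed default: Proposed items stay off unless a local copy
     changes this to INCLUDE -->
<!ENTITY % HTML.Proposed "IGNORE">

<!-- Must come before the Level1and2 declarations of font and
     phrase, because the first declaration of an entity binds -->
<![ %HTML.Proposed; [
<!ENTITY % font " TT | B | I | U | S ">
<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE
                   | STRIKE | DFN | KEY">
]]>
```

With the switch set to INCLUDE at Level 1 or 2, the existing
<!ELEMENT (%phrase; | %font;) ...> declaration would pick up the
extra names without further edits.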



-- 
Terry Allen  (terry@ora.com)   Editor, Digital Media Group
O'Reilly & Associates, Inc.    Sebastopol, Calif., 95472