HTML 3 DTD suggestion (was: Re: Input areas in <select> & <option>)

Bert Bos (bert@let.rug.nl)
Tue, 13 Dec 94 15:11:45 EST

Dave Raggett wrote:

|Guess what, the same thing applies to the HTML 3.0 DTD. I have made the
|following changes:
|
|- <!ELEMENT SELECT - - (OPTION+)>
|+ <!ELEMENT SELECT - - (OPTION+) -(INPUT|TEXTAREA|SELECT)>
|
|- <!ELEMENT TEXTAREA - - (#PCDATA)>
|+ <!ELEMENT TEXTAREA - - (#PCDATA) -(INPUT|TEXTAREA|SELECT)>

A few weeks ago I noticed that the latest HTML 3 drafts allowed
multiple BODYs. At first I wondered why, but then the following
occurred to me: if we view a FORM as a special kind of BODY, then a
document may consist of an alternation of BODYs and FORMs. No need for
inclusion exceptions anymore!

<!ELEMENT HTML O O (HEAD, (BODY | FORM)+)>
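
As a rough illustration of what this content model accepts, here is a
small Python sketch (not part of the DTD) that checks a list of
top-level element names against HEAD, (BODY | FORM)+ by way of a
regular expression. The function name matches_html_model is made up
for this example, and the sketch assumes that all tags are written
out, so SGML tag omission is ignored.

```python
import re

# Regex rendering of the content model (HEAD, (BODY | FORM)+).
# Element names are joined with single spaces before matching.
_MODEL = re.compile(r"HEAD( (BODY|FORM))+")


def matches_html_model(children):
    """Return True if the child element names fit HEAD, (BODY | FORM)+.

    `children` is a list of upper-case element names, for example
    ["HEAD", "BODY", "FORM"].  Tag omission is not modelled.
    """
    return _MODEL.fullmatch(" ".join(children)) is not None
```

With this, ["HEAD", "BODY", "FORM", "BODY"] is accepted, while a bare
["HEAD"] or a list that starts with BODY is rejected.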

I've been trying out many more things in the DTD. Below is the state
of my thinking as of 21 Nov. It lacks MATH, but is otherwise fairly
complete.

--------------------------------------------------------------------
<!doctype HTML [
<!--

Another candidate to succeed HTML2. This DTD is based
on Dave Raggett's HTML+, with some ideas from the W3O's
HTML3sketch DTD as well. It is not a strict superset of HTML2,
but it is close enough that conversion from HTML2 to my-HTML
should be easy.

Design goals:

- Easy migration from 2.0 to 3.0

- Emphasize SGML conformance

- Keep HTML simple

- Keep it practical to edit HTML `by hand'

- Make errors easy to find (elements occur either in or between
paragraphs but not both; use `one or more' instead of `zero or
more' where possible)

- No inclusion exceptions (exclusion exceptions are much less
problematic)

Some notes:

- Instead of only <A>, we could give all elements an HREF and remove A

- I couldn't find a good way to include the HR element; it is omitted.

- An unnumbered, nestable DIV element might be nicer, but would
require that H1..6 all be replaced by H.

- Maybe we need an attribute on TABLE to give default alignments

- Should we require/allow FIGs to be rescaled?

Author: Bert Bos
E-mail: bert@let.rug.nl
Date: 21 Nov '94

-->

<!-- Character entities ======================================================
ISOlat1: Latin-1 character set (Western European languages)
ISOnum: numeric and special graphic symbols
ISOdia: diacritical marks
ISOpub: publishing symbols
Maybe we should add the rest as well?

WWWicn: useful symbols from the Web
===========================================================================-->

<!ENTITY % ISOlat1 public "ISO 8879-1986//ENTITIES Added Latin 1//EN">
<!ENTITY % ISOnum
public "ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN">
<!ENTITY % ISOdia public "ISO 8879-1986//ENTITIES Diacritical Marks//EN">
<!ENTITY % ISOpub public "ISO 8879-1986//ENTITIES Publishing//EN">
<!ENTITY % WWWicn public "-//W3O//ENTITIES WWW standard icons//EN">

%ISOlat1;
%ISOnum;
%ISOdia;
%ISOpub;
%WWWicn;

<!-- Macros ==================================================================

blocks: all paragraph-like elements; these usually cause a line break

common.attr: attributes found on most elements: id = target of a
hyperlink or style specification; lang = language for
hyphenation, etc; charset = character set for choosing fonts.

needs: hints to the formatter about the space needed for an
element: clean = no floating figures are allowed at the left,
the right, or both sides; ems = suggested width in em; pixels
= suggested width in pixels.

emph: all character-like elements; they usually don't cause a
paragraph break.

inline: the same as emph, plus all other elements that occur in
paragraphs and that don't cause the paragraph to end, including
footnotes and marginal notes.

URI: the type of a URL (just CDATA, since SGML doesn't provide
too many data types)

HTTP-Method: GET and POST, the two methods defined for following
hyperlinks.

Content-Type: the type of an encoding method (reduces to CDATA).

InputType: the possible types of an INPUT element.

linkType: type of link, such as prev, next, parent, author

formula: elements that can occur in MATH

float: floating point number (CDATA)

shape: must be either "CIRCLE x y r", "RECT x y w h", or "POLY x
y x y x y...", where x, y, r, w, and h are numbers (expressed in
pixels), x=x-coordinate, y=y-coordinate, w=width, h=height,
r=radius. (0,0) is at top left.
======================================================================-->

<!ENTITY % blocks "P | NOTE | ABSTRACT | ADDRESS | UL | OL | DL | PRE
| QUOTE | TABLE | DISPLAY | FIG">
<!ENTITY % common.attr "ID ID #IMPLIED
LANG CDATA 'en-us'
CHARSET CDATA #IMPLIED">
<!ENTITY % needs "CLEAN (CLEFT|CRIGHT|ALL|NO) NO
EMS NUMBER 0
PIXELS NUMBER 0">
<!ENTITY % form.elts "INPUT | SELECT | TEXTAREA">
<!ENTITY % emph "EM | STRONG | B | I | TT | CODE | VAR | DFN | SUB
| SUP | Q | SAMP | KBD | CITE | KEY | PERSON
| ACRONYM | ABBREV">
<!ENTITY % inline "#PCDATA | A | %emph | FOOTNOTE | MARGIN | BR
| INPUT | SELECT | TEXTAREA | CHANGED | TAB
| MATH | IMG">
<!ENTITY % URI "CDATA">
<!ENTITY % HTTP-Method "(GET | POST)">
<!ENTITY % Content-Type "CDATA">
<!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX | RADIO | SUBMIT
| RESET | RANGE | AUDIO | FILE | SCRIBBLE | HIDDEN)">
<!ENTITY % linkType "(PREV | NEXT | PARENT | HOME | AUTHOR | PATH
| SUBDOCUMENT)">
<!ENTITY % formula "#PCDATA | BOX | ABOVE | BELOW | ROOT | ARRAY
| SUB | SUP">
<!ENTITY % shape "CDATA">

<!-- Document ================================================================
A document consists of a head plus a sequence of bodies and/or
forms. Forms are just like bodies, except that they allow the use of
INPUT, SELECT and TEXTAREA elements.
======================================================================-->

<!ELEMENT HTML O O (HEAD, (BODY | FORM)+)>
<!ATTLIST HTML
VERSION CDATA #FIXED "-//IETF//DTD HTML//EN//3.0">

<!-- Document head ===========================================================
HEAD: container for meta-information
TITLE: for use in window title, hotlist, history, etc.
ISINDEX: server supports searches on this document
BASE: base URL for resolving relative URLs
META: other header info, usually passed as MIME-headers
LINK: relations of this document to other documents
======================================================================-->

<!ELEMENT HEAD O O (TITLE? & ISINDEX? & BASE? & META* & LINK*)>
<!ELEMENT TITLE - - RCDATA>
<!ELEMENT ISINDEX - O EMPTY>
<!ELEMENT BASE - O EMPTY>
<!ATTLIST BASE
HREF %URI #REQUIRED>
<!ELEMENT META - O EMPTY>
<!ATTLIST META
HTTP-EQUIV NAME #IMPLIED
NAME NAME #REQUIRED
VALUE CDATA #REQUIRED>
<!ELEMENT LINK - O EMPTY>
<!ATTLIST LINK
HREF %URI #IMPLIED -- link to explicit document --
REL %linkType #REQUIRED -- kind of link --
TITLE CDATA #IMPLIED -- optional title for link --
METHODS NAMES #IMPLIED -- textsearch, etc -->

<!-- Document body ===========================================================
A body is a sequence of block-like elements and it can also contain a
hierarchy of DIV1..6 elements. Unlike a FORM, a BODY may not contain
INPUT, SELECT or TEXTAREA elements.

DIV1..6: text divisions, each starting with a title
H1..6: titles for divisions, 1 is most important
P: normal paragraph
NOTE: note, set off from the text, usually with an icon
CAPTION: captions for various things: notes, figures, tables, etc.
ABSTRACT: summary of text, usually indented
QUOTE: long quotation, usually indented
DISPLAY: displayed material, such as math or examples
PRE: pre-formatted, obey spaces and newlines, set in fixed width font
======================================================================-->

<!ELEMENT BODY O O ((%blocks)*, ((DIV1+ | DIV2+ | DIV3+ | DIV4+ |
DIV5+ | DIV6+), (%blocks)*)?) -(%form.elts)>
<!ATTLIST BODY
%common.attr>
<!ELEMENT DIV6 O O (H6, (%blocks)*)>
<!ELEMENT DIV5 O O (H5, (%blocks)*, DIV6*)>
<!ELEMENT DIV4 O O (H4, (%blocks)*, DIV5*)>
<!ELEMENT DIV3 O O (H3, (%blocks)*, DIV4*)>
<!ELEMENT DIV2 O O (H2, (%blocks)*, DIV3*)>
<!ELEMENT DIV1 O O (H1, (%blocks)*, DIV2*)>
<!ATTLIST (DIV6 | DIV5 | DIV4 | DIV3 | DIV2 | DIV1)
%common.attr>
<!ELEMENT (H1 | H2 | H3 | H4 | H5 | H6) O O (%inline)+>
<!ATTLIST (H1 | H2 | H3 | H4 | H5 | H6)
%common.attr
%needs
ALIGN (FLUSHLEFT | FLUSHRIGHT | CENTER | JUSTIFY) #IMPLIED
NOWRAP (NOWRAP) #IMPLIED
NOFOLD (NOFOLD) #IMPLIED>
<!ELEMENT P - O (%inline)+>
<!ATTLIST P
%common.attr
%needs
ALIGN (FLUSHLEFT | FLUSHRIGHT | CENTER | JUSTIFY) #IMPLIED
NOWRAP (NOWRAP) #IMPLIED
NOFOLD (NOFOLD) #IMPLIED>
<!ELEMENT NOTE - - (CAPTION?, (%blocks)+)>
<!ATTLIST NOTE
%common.attr
%needs
ROLE (SIMPLE | NOTE | WARNING | CAUTION) SIMPLE>
<!ELEMENT CAPTION - - (%inline)+>
<!ATTLIST CAPTION
%common.attr
ALIGN (TOP | BOTTOM) #IMPLIED>
<!ELEMENT ABSTRACT - - (%blocks)+>
<!ATTLIST ABSTRACT
%common.attr>
<!ELEMENT QUOTE - - (%blocks)+>
<!ATTLIST QUOTE
%common.attr>
<!ELEMENT DISPLAY - - (%blocks)+>
<!ATTLIST DISPLAY
%common.attr
ALIGN (LEFT | RIGHT | CENTER | INDENT) #IMPLIED
EQNO (LEFTEQNO |RIGHTEQNO |NOEQNO) NOEQNO>
<!ELEMENT ADDRESS - - (%inline)+>
<!ATTLIST ADDRESS
%common.attr>
<!ELEMENT PRE - - (%inline)+>
<!ATTLIST PRE
%common.attr
%needs>

<!-- Forms ===================================================================
A FORM is a special type of BODY. It is a sequence of block-like
elements and it can also contain a hierarchy of DIV1..6
elements. Unlike a BODY, a FORM may contain INPUT, SELECT or
TEXTAREA elements.
======================================================================-->

<!ELEMENT FORM - O ((%blocks)*, ((DIV1+ | DIV2+ | DIV3+ | DIV4+ |
DIV5+ | DIV6+), (%blocks)*)?)>
<!ATTLIST FORM
%common.attr
ACTION %URI #REQUIRED
METHOD %HTTP-Method GET
ENCTYPE %Content-Type "application/x-www-form-urlencoded">
<!ELEMENT INPUT - O EMPTY>
<!ATTLIST INPUT
%common.attr
TYPE %InputType TEXT
NAME NAME #IMPLIED -- required except for submit/reset --
VALUE CDATA #IMPLIED
SRC %URI #IMPLIED -- for image input --
CHECKED (CHECKED) #IMPLIED
SIZE NUMBERS #IMPLIED -- width [height [depth ...]] --
MAXLENGTH NUMBER #IMPLIED -- for text/password type --
ALIGN (TOP | MIDDLE | BOTTOM) TOP>
<!ELEMENT SELECT - - (OPTION+)>
<!ATTLIST SELECT
%common.attr
NAME NAME #REQUIRED
SIZE NUMBER #IMPLIED
SRC %URI #IMPLIED -- for graphical selection menus --
MULTIPLE (MULTIPLE) #IMPLIED>
<!ELEMENT OPTION - O RCDATA>
<!ATTLIST OPTION
%common.attr
SELECTED (SELECTED) #IMPLIED
VALUE CDATA #IMPLIED
SHAPE %shape #IMPLIED -- for graphical selection -->
<!ELEMENT TEXTAREA - - RCDATA>
<!ATTLIST TEXTAREA
%common.attr
NAME NAME #REQUIRED
ROWS NUMBER #IMPLIED
COLS NUMBER #IMPLIED>

<!-- Lists ===================================================================
UL: unnumbered list, items normally have bullets
OL: numbered list, item numbers are auto-incremented
DL: definition list, consisting of terms and definitions

Note that DT may no longer be empty. Empty DTs used to be quite
common in HTML2 as a way to make indented paragraphs. Use QUOTE or
DISPLAY instead, or rely on the style sheet.

Also note that a DL is now a sequence of DIs (definition items),
each containing terms and definitions for that item.
======================================================================-->

<!ELEMENT UL - - (LI+)>
<!ATTLIST UL
%common.attr
%needs
PLAIN (PLAIN) #IMPLIED
WRAP (VERT | HORIZ | NONE) NONE
COMPACT (COMPACT) #IMPLIED
SRC %URI #IMPLIED -- default bullet --
LABEL CDATA #IMPLIED -- default bullet if no src -->
<!ELEMENT OL - - (LI+)>
<!ATTLIST OL
%common.attr
%needs
START NUMBER 1
COMPACT (COMPACT) #IMPLIED>
<!ELEMENT LI - O (%blocks)+>
<!ATTLIST LI
%common.attr
%needs
SRC %URI #IMPLIED -- special bullet/label --
LABEL CDATA #IMPLIED -- special bullet if no src -->
<!ELEMENT DL - - (DI+)>
<!ATTLIST DL
%common.attr
%needs
COMPACT (COMPACT) #IMPLIED>
<!ELEMENT DI O O (DT+, DD+)>
<!ATTLIST DI
%common.attr>
<!ELEMENT DT O O (%inline)+>
<!ATTLIST DT
%common.attr>
<!ELEMENT DD - O (%blocks)+>
<!ATTLIST DD
%common.attr>

<!-- Tables ==================================================================
A TABLE is a sequence of TRs (table rows), each of which is a
sequence of THs and TDs (table header, table data).
======================================================================-->

<!ELEMENT TABLE - - (CAPTION?, TR+)>
<!ATTLIST TABLE
%common.attr
%needs>

<!ELEMENT TR - O (TH | TD)+>
<!ATTLIST TR
%common.attr>
<!ELEMENT (TH | TD) - O ((%inline)+ | (%blocks)+)>
<!ATTLIST (TH | TD)
%common.attr
COLSPAN NUMBER 1 -- columns spanned --
ROWSPAN NUMBER 1 -- rows spanned --
ALIGN (LEFT | RIGHT | CENTER | JUSTIFY) #IMPLIED
VALIGN (TOP | BOTTOM | MIDDLE) TOP
NOWRAP (NOWRAP) #IMPLIED
WIDTH NUMBER #IMPLIED -- suggested width (in em) -->

<!-- Figures =================================================================
FIG elements can occur between paragraphs, not inside them. The
image may be a floating image (floating to the next free
slot at the left or right margin) or it may be rendered in place.

The WIDTH and HEIGHT attributes give suggested sizes; the browser
should try to scale the image (or other document) to this
size. If both are given and KEEPASPECT is not present, then the
image (if it is an image) is stretched so that it fits both
dimensions.

The contents of the element are only displayed when the browser is
skipping images. The contents must be parsed even if the image *is*
displayed, since it can contain `hotspots'. An A inside a FIG is a
`hotspot' if it has a SHAPE attribute.

IMG is for small, in-line graphics. An IMG is treated as a
letter, which is why it has a DEPTH attribute.
======================================================================-->

<!ELEMENT FIG - - (CAPTION?, OVERLAY*, ((%inline)+|(%blocks)*))>
<!ATTLIST FIG
%common.attr
%needs
SRC %URI #REQUIRED -- document to embed --
ALIGN (LEFT | CENTER | RIGHT | JUSTIFY) #IMPLIED
WIDTH NUMBER #IMPLIED -- suggested width in units --
HEIGHT NUMBER #IMPLIED -- suggested height in units --
KEEPASPECT (KEEPASPECT) #IMPLIED -- don't deform image --
UNITS (EM | PIXELS) PIXELS -- for WIDTH & HEIGHT --
ISMAP (ISMAP) #IMPLIED -- pass clicks to server -->

<!ELEMENT OVERLAY - O EMPTY>
<!ATTLIST OVERLAY
SRC %URI #REQUIRED -- image to overlay --
X NUMBER 0 -- offset from left in units --
Y NUMBER 0 -- offset from top in units --
WIDTH NUMBER #IMPLIED -- suggested width in units --
HEIGHT NUMBER #IMPLIED -- suggested height in units --
KEEPASPECT (KEEPASPECT) #IMPLIED -- don't deform image --
UNITS (EM | PIXELS) PIXELS -- for WIDTH & HEIGHT -->

<!ELEMENT IMG - O EMPTY>
<!ATTLIST IMG
%common.attr
SRC %URI #REQUIRED -- image to embed --
DEPTH NUMBER 0 -- extend below the baseline --
ALT CDATA #IMPLIED -- text instead of img -->

<!-- Character level =========================================================
EM: emphasized text, typically italics
STRONG: stronger emphasis, typically bold
DFN: defining instance of a term, typically bold
ACRONYM: acronym, typically in small caps
KBD: keyboard input, typically monospaced
PERSON: name of a person
CODE: program source code, typically monospaced
VAR: name of a variable, typically italics
ABBREV: abbreviation
SAMP: example, typically monospaced
B: bold (use strong instead if possible)
I: italics (use em instead if possible)
TT: typewriter type (use code/kbd/samp if possible)
CITE: reference to literature, typically in brackets []
Q: short quote, typically in quotes `' or ""
MARGIN: annotation, put in the margin if possible
FOOTNOTE: annotation, put in a footnote or pop-up if possible
CHANGED: begin or end of a `change bar' (note: empty element!)
BR: force a line break
TAB: set or jump to tab stop; attribute BEFORE is valid only in combination with ID
======================================================================-->

<!ELEMENT (%emph) - - (%inline)+>
<!ATTLIST (%emph)
%common.attr>

<!ELEMENT FOOTNOTE - - (%inline)+>
<!ATTLIST FOOTNOTE
%common.attr>

<!ELEMENT MARGIN - - (%inline)+>
<!ATTLIST MARGIN
%common.attr>

<!ELEMENT BR - O EMPTY>

<!ELEMENT CHANGED - O EMPTY>
<!ATTLIST CHANGED
BEGIN ID #IMPLIED -- signals beginning of changes --
END IDREF #IMPLIED -- signals end of changes -->

<!ELEMENT TAB - O EMPTY>
<!ATTLIST TAB
ID ID #IMPLIED -- defines named tab stop --
BEFORE NUMBER 0 -- extra space before (em) --
TO IDREF #IMPLIED -- jump to named tab stop -->

<!-- Links ===================================================================
Anchors have the following attributes:
HREF: the URL of the linked document
SHAPE: (only inside a FIG) shape of hotspot
REL: type of relation: next, path, subdocument, etc.
TITLE: title for linked document, advisory only
METHODS: access methods supported by linked document
======================================================================-->

<!ELEMENT A - - (%inline)+ -(A)>
<!ATTLIST A
%common.attr
HREF %URI #REQUIRED
SHAPE %shape #IMPLIED -- for hotspots in FIG --
REL %linkType #IMPLIED
TITLE CDATA #IMPLIED -- suggested title --
METHODS NAMES #IMPLIED -- TEXTSEARCH, GET,... -->

<!-- Math ====================================================================
TBD

======================================================================-->

]>

<TITLE>Test</TITLE>

<BODY>
<DIV1>
<H1>First chapter</H1>
<P>This is a normal body, no forms allowed.
<DL>
<DI>term 1
<DD><P>One term and one definition.
<DI>term 2a<DT>term 2b<DT>term 2c
<DD><P>Three terms with one definition.
<DI>term 3
<DD><P>One term with two definitions.
<DD><P>The second has two paragraphs.
<P>This is the 2nd par of the 2nd def of term 3.
</DL>
</DIV1>
</BODY>

<FORM action="/htbin/test">
<DIV1>
<H1>Second chapter</H1>
<P>This is a <EM>form body,</EM> input tags are allowed.
<P><INPUT TYPE="RESET"> Press here to reset.
</DIV1>
</FORM>

<!--
Local Variables:
sgml-empty-tags: ("DI" "DT" "DD" "HR" "P" "LI" "BR" "SP" "TR" "TB" "INPUT" "FIGA" "IMG" "CHANGED" "LINK" "BASE" "ISINDEX" "NEXTID")
sgml-validate-command: "sgmls -s HTML+.sgml"
end:
-->
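
Since SGML can only type SHAPE as CDATA, the syntax described in the
macros comment ("CIRCLE x y r", "RECT x y w h", "POLY x y x y ...")
has to be checked by the application. The following Python sketch is
only an illustration and not part of the DTD. The function name
parse_shape and the returned (kind, numbers) layout are invented for
this example. It also makes three assumptions that the DTD does not
state: the numbers are whole numbers, the keyword is matched without
regard to case, and a POLY needs at least three points.

```python
def parse_shape(value):
    """Parse a SHAPE attribute value into (kind, numbers).

    Accepts "CIRCLE x y r", "RECT x y w h" and "POLY x y x y x y ...".
    Raises ValueError for anything else.
    """
    parts = value.split()
    if not parts:
        raise ValueError("empty SHAPE value")
    # Assumption: the keyword is compared case-insensitively.
    kind = parts[0].upper()
    # Assumption: whole pixel values; int() raises ValueError otherwise.
    numbers = [int(token) for token in parts[1:]]
    if kind == "CIRCLE":
        ok = len(numbers) == 3
    elif kind == "RECT":
        ok = len(numbers) == 4
    elif kind == "POLY":
        # Assumption: at least three (x, y) pairs, always in pairs.
        ok = len(numbers) >= 6 and len(numbers) % 2 == 0
    else:
        ok = False
    if not ok:
        raise ValueError("malformed SHAPE value: %r" % value)
    return kind, numbers
```

A hotspot such as SHAPE="RECT 10 10 80 40" then parses to the keyword
RECT with four numbers, while "RECT 1 2 3" is rejected with a
ValueError.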

-- 
                          Bert Bos                      Alfa-informatica
                 <bert@let.rug.nl>           Rijksuniversiteit Groningen
    <http://www.let.rug.nl/~bert/>     Postbus 716, NL-9700 AS GRONINGEN