DTD extensions and modifications

Lou Burnard <lou@vax.ox.ac.uk>
Date: Mon, 16 Aug 1993 11:04:47 +0200
From: Lou Burnard <lou@vax.ox.ac.uk>
Message-id: <009711A5.3D1DE732.18282@vax.ox.ac.uk>
To: WWW-TALK@nxoc01.cern.ch
Cc: lou@vax.ox.ac.uk
Subject: DTD extensions and modifications
Should there be one DTD only for HTML? What if my browser wants/knows
about/must have some elements that yours doesn't? What if I want to tag
my documents in some language other than English?  

Just for you to ponder, here are some of the ways in which the TEI has
handled these and related problems which have apparently been agitating
this list lately.

1. Renaming elements

All elements in the TEI dtd are declared indirectly. There is no element
declaration for an element called 'x'. Instead, there is an entity
declaration of the form <!ENTITY % n.x 'x'> and 'x' is declared thus:
<!ELEMENT %n.x; ....>. This means that if you, or your browser, want to
call this element 'z' instead, all you have to do is bung a
re-declaration for the n.x entity into the doctype subset of your
document: <!ENTITY % n.x 'z'>. The subset is read before the dtd
itself, and the first declaration of an entity is the one that counts,
so yours overrides the default.
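
A minimal sketch of the renaming trick, in case it helps. The doctype
name 'doc', the file name 'example.dtd' and the content model are
assumptions made up for illustration; only the n.x pattern comes from
the description above.

```sgml
<!-- In the dtd (illustrative): the name is taken from an entity -->
<!ENTITY % n.x 'x'>
<!ELEMENT %n.x; - - (#PCDATA)>

<!-- In a document: the subset is read first, so this declaration
     of n.x wins and the element is called 'z' in this document -->
<!DOCTYPE doc SYSTEM "example.dtd" [
  <!ENTITY % n.x 'z'>
]>
```

Any content model that mentions the element has to go through %n.x; as
well, otherwise it would still refer to the old name.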

2. Adding new elements

All(most) elements in the TEI dtd are assigned to a class, depending
on whereabouts in a document they can appear. For example, there is a
class of 'phrase' level elements, which can only appear within
paragraphs or other chunks of text, never independently of them.
Corresponding to each class there is a parameter entity %x.class;,
initially null, which the user can redefine to add new members
to the class.
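
Roughly, and only as a sketch: the element names 'p', 'emph', 'term' and
'blort', and the entity name m.phrase, are assumptions chosen for
illustration, not quotations from the TEI dtd. The point is just that
the class entity leaves a hole which the user's entity fills.

```sgml
<!-- In the dtd (illustrative): the extension hook is empty by default -->
<!ENTITY % x.phrase ''>
<!ENTITY % m.phrase '%x.phrase; emph | term'>
<!ELEMENT p - - (#PCDATA | %m.phrase;)*>

<!-- In the doctype subset: add a new member to the phrase class.
     The trailing bar is needed because the hook sits in front of
     the existing members. -->
<!ENTITY % x.phrase 'blort |'>
<!ELEMENT blort - - (#PCDATA)>
```

With the subset in place the content model for 'p' expands to
(#PCDATA | blort | emph | term)*.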

3. Removing existing elements

Every element declaration in the TEI dtd is bracketed by a marked
section controlled by a parameter entity named for the element itself,
the value of which is by default 'INCLUDE' but which can be changed to
'IGNORE' in the dtd subset. So, to remove the current definition for
'blort' (either because you don't want to allow blorts or because you
want to substitute your own definition for them -- not a good idea, but
it happens --) you just bung a <!ENTITY % blort 'IGNORE'> into your dtd
subset and kiss those blorts goodbye (sorry Sebastian)
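
Sketched out, with a made-up content model for 'blort' (an assumption
for illustration only), the mechanism looks something like this:

```sgml
<!-- In the dtd (illustrative): the declaration sits in a marked
     section whose keyword comes from a parameter entity -->
<!ENTITY % blort 'INCLUDE'>
<![ %blort; [
<!ELEMENT blort - - (#PCDATA)>
]]>

<!-- In the doctype subset: this is seen first, so the marked
     section above is skipped and blort is never declared -->
<!ENTITY % blort 'IGNORE'>
```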

4. Combining/removing tagsets

Every element in the TEI dtd is defined in one 'tagset' or dtd
fragment. There are two core tagsets which contain definitions for
elements likely to be needed in almost every kind of document and for
the TEI header; plus about a dozen others, for the most part
characteristic of particular types of text (e.g. verse) or applications
(e.g. hypertext). There are some rules about how they can be mixed and
matched which I won't detail here: the basic principle is simple enough
though -- it works in the same way as (3) above. If you want to
combine elements from tagsets foo and blah, you put two appropriate
entity declarations into your dtd subset, and magically all and only
the elements of those tagsets are enabled.
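
As a sketch only: the entity names tagset.foo and tagset.blah and the
file names are hypothetical, invented here to match the foo and blah
above, and are not the names the TEI dtd actually uses.

```sgml
<!-- In the driver dtd (illustrative): every optional tagset is
     switched off by default -->
<!ENTITY % tagset.foo  'IGNORE'>
<!ENTITY % tagset.blah 'IGNORE'>

<![ %tagset.foo; [
<!ENTITY % foo.decls SYSTEM 'foo.dtd'>
%foo.decls;
]]>

<![ %tagset.blah; [
<!ENTITY % blah.decls SYSTEM 'blah.dtd'>
%blah.decls;
]]>

<!-- In the doctype subset: switch on the two tagsets you want -->
<!ENTITY % tagset.foo  'INCLUDE'>
<!ENTITY % tagset.blah 'INCLUDE'>
```

Tagsets you do not mention stay at 'IGNORE', so their declarations are
never read.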

For more detail, probably much more than anyone needs, read chapter 3
(ST) of the TEI Guidelines. 

I hate seeing wheels re-invented, don't you? Especially if the
re-invention is a bit wonky...

Lou