HTML DTD and related problems (rather long)

fkappe@fiicmds04.tu-graz.ac.at (Frank Kappe)
Date: Thu, 25 Jun 92 14:25:18 +0200
From: fkappe@fiicmds04.tu-graz.ac.at (Frank Kappe)
Message-id: <9206251225.AA25321@iicm.tu-graz.ac.at>
To: www-talk@nxoc01.cern.ch
Subject: HTML DTD and related problems (rather long)
>>>>> On Wed, 24 Jun 92 12:15:52 CDT, Dan Connolly <connolly@pixel.convex.com> said:

>>But then this raises another issue: does WWW allow anchors within
>>anchors?  I think not - in which case I could not use WWW anchors to
>>both label a paragraph (e.g. for attaching an annotation) and a word
>>within it (e.g. for definition).  This worries me quite a bit.  Nor
>>can I attach multiple links to the same point (e.g. definitions of a
>>word in multiple languages).
>
> This and other related questions (can I have lists within lists?)
> are precisely the reason for using a well-defined structural markup
> language governed by SGML processing rules.

> Right now we have no DTD for HTML, and the only answers lie in
> the browser source code. The documentation "in the web" is too
> vague. But I hardly think we want the browser source code to
> be the definition of HTML.

The situation with anchors in SGML is even worse. In addition to the cases where
you want anchors within lists, list items, headlines etc., and lists, list
items, headlines etc. within anchors, you also want anchors within anchors.
Properly nested anchors look like this:

<BEGIN ANCHOR A>
!
!
!   <BEGIN ANCHOR B>
!   !
!   !
!   !
!   <END ANCHOR B>
!
!
<END ANCHOR A>

This case is trivial (you just allow anchors within anchors in the DTD).
However, consider the case where you want to have a destination anchor marking,
say, paragraphs 1 and 2, and another one marking 2 and 3:

<BEGIN ANCHOR A>
!
! para 1
!
!   <BEGIN ANCHOR B>
!                  !
! para 2           !
!                  !
<END ANCHOR A>     !
                   !
  para 3           !
                   !
    <END ANCHOR B>-+

This situation cannot be implemented with SGML tags like <A ....>text</A>, as
proposed in HTML. Also, I doubt that it is possible to construct an anchor
spanning, e.g., a few items of list A and a few items of list B, because the
SGML parser would implicitly close any open anchor tags when reaching </list>:

<list A>
  <item>...
<A .....>
  <item>...
  <item>...
</list>

<list B>
  <item>...
</A>
  <item>...
  <item>...
</list>
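
The constraint behind both examples can be sketched as a check on spans. This
is only an illustration in Python, not part of HTML or any SGML parser; the
function name and the representation of an anchor as a half-open (start, end)
range of paragraph numbers are assumptions made for the sketch. Paired tags can
express a set of anchors only if every two of them are either disjoint or one
lies entirely inside the other.

```python
def properly_nested(spans):
    """Return True if every pair of half-open (start, end) spans is either
    disjoint or fully nested, i.e. expressible with paired open/close tags."""
    for i, (s1, e1) in enumerate(spans):
        for s2, e2 in spans[i + 1:]:
            disjoint = e1 <= s2 or e2 <= s1
            nested = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
            if not (disjoint or nested):
                return False  # the spans overlap without nesting
    return True


# Anchor A marks paragraphs 1-2, anchor B marks paragraphs 2-3.
print(properly_nested([(1, 3), (2, 4)]))  # False
# Anchor B lies entirely inside anchor A.
print(properly_nested([(1, 5), (2, 4)]))  # True
```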

The reason why it is possible to construct such things using the NeXT-based WWW
viewer/editor is simply that HTML is not SGML. Therefore it is impossible to
specify a DTD for HTML (as Dan has already pointed out).

In our Hyper-G system, which uses HTF, an SGML-based format similar to HTML, we
overcome the anchor-nesting problem by specifying TWO tags for anchors: an
anchor-start (<AS>) tag and an anchor-end (<AE>) tag, each with an additional
ID attribute. So, the examples are coded like this:

<AS ID="A">
para 1
<AS ID="B">
para 2
<AE ID="A">
para 3
<AE ID="B">

and

<list A>
  <item>...
<AS ID="C">
  <item>...
  <item>...
</list>

<list B>
  <item>...
<AE ID="C">
  <item>...
  <item>...
</list>

which is perfectly legal in our DTD. I don't want to waste more internet
bandwidth sending the DTD, but you may get it by anonymous ftp from
iicm.tu-graz.ac.at in file pub/Hyper-G/sgml/hyper-g.dtd. There is also a
corresponding style sheet as well as styles used to convert HTF to HTML and
LaTeX with a stand-alone SGML parser.
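
To show why the start/end markers sidestep the nesting problem, here is a
minimal sketch in Python of how a reader of such text might recover the anchor
spans. It is a toy regular-expression scanner, not an SGML parser and not the
Hyper-G implementation; the function name anchor_spans and the returned
structure are assumptions for illustration. Only the <AS ID="..."> and
<AE ID="..."> forms shown above are recognized.

```python
import re

# Matches the two marker forms used in the examples above.
MARKER = re.compile(r'<(AS|AE) ID="([^"]+)">')


def anchor_spans(text):
    """Strip AS/AE markers and return (plain_text, {id: (start, end)}),
    where the offsets index into plain_text."""
    pieces = []    # text chunks between markers
    length = 0     # length of the plain text collected so far
    open_at = {}   # anchor id -> offset where it started
    spans = {}     # anchor id -> (start, end)
    pos = 0
    for m in MARKER.finditer(text):
        chunk = text[pos:m.start()]
        pieces.append(chunk)
        length += len(chunk)
        kind, anchor_id = m.group(1), m.group(2)
        if kind == "AS":
            open_at[anchor_id] = length
        else:
            # Raises KeyError if an <AE> has no matching <AS>.
            spans[anchor_id] = (open_at.pop(anchor_id), length)
        pos = m.end()
    pieces.append(text[pos:])
    return "".join(pieces), spans


doc = '<AS ID="A">para 1\n<AS ID="B">para 2\n<AE ID="A">para 3\n<AE ID="B">'
plain, spans = anchor_spans(doc)
print(spans)  # {'A': (0, 14), 'B': (7, 21)}
```

Because each end marker names the anchor it closes, the order in which anchors
end does not have to mirror the order in which they began.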

Let me say one final word about anchors: in my (and others') opinion, it is
generally not a good idea to store anchors (or even links) in documents. This
requires a modification of the document whenever an anchor is
inserted/modified/deleted and is problematic in multi-user environments with
private links, etc. Rather, the links should be stored and manipulated in a
separate link database (as in Intermedia and also in Hyper-G). This also
allows for backwards-tracing of links, which is essential for maintaining the
integrity of the hypertext and providing a graphical overview of the hypertext
to the users.
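
As a rough sketch of the idea (in Python, and not a description of how
Intermedia or Hyper-G actually store their links), a link database kept outside
the documents can index every link in both directions. The class name LinkDB,
its methods, and the document names are hypothetical. Adding or deleting a link
touches only the database, and the backward index answers "who points here?"
without scanning any document.

```python
from collections import defaultdict


class LinkDB:
    """Toy in-memory link store, indexed forwards and backwards."""

    def __init__(self):
        self._out = defaultdict(set)  # source -> destinations
        self._in = defaultdict(set)   # destination -> sources

    def add(self, src, dst):
        self._out[src].add(dst)
        self._in[dst].add(src)

    def remove(self, src, dst):
        self._out[src].discard(dst)
        self._in[dst].discard(src)

    def links_from(self, doc):
        return sorted(self._out.get(doc, ()))

    def links_to(self, doc):
        # Backward tracing: every document that links to doc.
        return sorted(self._in.get(doc, ()))


db = LinkDB()
db.add("intro", "glossary")
db.add("chapter1", "glossary")
print(db.links_to("glossary"))  # ['chapter1', 'intro']
db.remove("intro", "glossary")
print(db.links_to("glossary"))  # ['chapter1']
```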

However, in certain circumstances (like document modification) it is convenient
to supply the anchor information with the document. That is the reason why it's
in the DTD.

-----------------------------------------------------------------------------
Frank M. Kappe                                      fkappe@iicm.tu-graz.ac.at
Institute for Information Processing                     Fax: ++43 316 824394
Technical University Graz, Austria           "Sorry, no kangaroos in Austria"
-----------------------------------------------------------------------------