Re: Hierarchy support in HTML [Was: Tables: what can go in a cell]

Daniel W. Connolly (connolly@hal.com)
Fri, 17 Feb 95 18:42:24 EST

In message <01HN5X607H2U9371GW@oax-2mr.mr.ornl.gov>, James D Mason writes:

A very nice treatment on hierarchy in document management, which
contains many points that I agree with, but very few which relate
to the business of this working group.

The business of this working group is to hammer out HTML standards,
and not to solve (or discuss) the problem of document management
on the web.

On the other hand, it's nice to have this discussion of the issues
in the archive. We can all point to it and say "no, we're not
going to solve all those issues, but we're aware of them."

Going back to where this all started, I'd like to reiterate the
goals of the HTML specification as I see them:

---------------------------------------------
From: "Toward Closure on HTML"
$Id: html-direction.html,v 1.3 1994/04/07 00:56:59 connolly Exp $
http://www.hal.com/%7Econnolly/drafts/html-direction.html

The Purpose of HTML

If we take a step back and look at the purpose and requirements and such
for HTML, I'd say the purpose of HTML is: to promote computer
mediated communication between parties on the internet by
representing information in terms of available hypermedia
technology.

The idea is that I use the tools available on my box to capture my ideas at a
fairly high level, so that you can use the tools on your box to
filter/navigate/display the ideas. And even though your tools and my tools
are not exactly the same, there's a high degree of confidence that the ideas
get through intact.

So to me, the idea of deploying specialized HTML editors on all the various
platforms makes HTML no better than RTF or PostScript -- the data is
tied to the supporting code. This is not to discourage the development of
specialized HTML tools, but to encourage interoperability between
existing tools (MS Word, FrameMaker, emacs...) and HTML applications,
and to discourage "creeping featurism" in HTML.

The Goals of an HTML Specification

The goal of any HTML specification should be to promote that confidence
in the fidelity of communications using HTML. This means:

1. making it clear to authors what idioms are available to express their
ideas
2. making it clear to implementors how to interpret the HTML format
so that authors' ideas will be represented faithfully
3. keeping HTML simple enough that it can be implemented using
readily available technology and processed interactively
4. making HTML expressive enough that it can represent a useful
majority of the contemporary communications idioms in this
community
5. making some allowance for expressing idioms not captured by the
specification
6. addressing relevant interoperability issues with other applications
and technologies
---------------------------------------------

At the time, I was very much against the idea of "design by committee"
where we would invent techniques that are not in common practice. We
would, rather, standardize the common practices for the purpose of
interoperability. I wanted HTML to be sort of "stuff you're doing
anyway."

By now, HTML and the Web represent a force of their own. On the whole,
whatever the web community demands, the vendor community will scurry
to support. Hence it's important that the web community not only
demand the equivalent of People Magazine and Home Shopping Channel on
the web, but that we strive for access for the disabled, longevity of
information, etc.

For example, with character sets, I figured the web would use whatever
technology (encodings and libraries to support them) that the other
applications use, because no vendor would spend much resource to
support bleeding-edge technology for such a relatively unpopular
feature. I now believe that the web has enough momentum to actually
drive the vendors to get together and deploy new levels of support
for multi-lingual interoperability.

Let's see where we are relative to these goals:

1. making it clear to authors what idioms are available to express their
ideas

Basically done. But the authors are all saying "you mean I can't
do tables?" and I'm tired of saying "No, not for 2.0."

2. making it clear to implementors how to interpret the HTML format
so that authors' ideas will be represented faithfully

Getting there. I think the problem has changed from educating the
development community to upgrading all the old, broken installations.

3. keeping HTML simple enough that it can be implemented using
readily available technology and processed interactively

Done, at least for the 2.0 idioms. Witness Mosaic, Netscape, Lynx, Chimera...

4. making HTML expressive enough that it can represent a useful
majority of the contemporary communications idioms in this
community

About 80% done. There are some critical idioms not captured by HTML
2.0: tables, for one. My observations show that there are idioms in
widespread use that don't have natural expression in the language:
<li src="..."> and <hr src="..."> are evidently needed. Perhaps
<cite href="...">...</cite> as well.

Note the word "contemporary," which means this set of idioms changes
over time. That leads us to...

5. making some allowance for expressing idioms not captured by the
specification

Big problem here. The spec says "ignore what you don't recognize"
but only informally. We need a formal description of this technique,
and an explicit extension mechanism.

6. addressing relevant interoperability issues with other applications
and technologies

Getting there.
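To make the informal "ignore what you don't recognize" rule from item 5 concrete, here is a minimal sketch in Python of one reading of it: drop tags you don't know, but keep their content. The KNOWN element set and the filter_unknown name are hypothetical, chosen only for illustration; a real extension mechanism would need the formal definition called for above.

```python
import html
from html.parser import HTMLParser

# Hypothetical set of element names a 2.0-level browser understands.
KNOWN = {"html", "head", "title", "body", "h1", "h2", "h3",
         "p", "a", "ul", "ol", "li", "em", "strong"}


class IgnoreUnknown(HTMLParser):
    """Re-emits known tags; drops unknown tags but keeps their text content."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: data arrives decoded
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN:
            # Valueless (boolean) attributes are skipped in this sketch.
            attr_text = "".join(f' {k}="{html.escape(v)}"'
                                for k, v in attrs if v is not None)
            self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in KNOWN:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape decoded text so "&" and "<" stay safe in the output.
        self.out.append(html.escape(data, quote=False))


def filter_unknown(markup):
    parser = IgnoreUnknown()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(filter_unknown("<p>See <blink>this</blink> table.</p>"))
# -> <p>See this table.</p>
```

Keeping the content of an unknown element is what lets newer documents degrade gracefully on older browsers; it is also why the rule needs to be stated formally, since some elements (a script body, say) should arguably not be shown at all.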

>Dan asked for a proposed solution.
[...]
> So rather than suggest that we scrap the current nonhierarchical
>application and shove something like my complex pet DTD in its place, I
>suggest that we need to move in the direction of supporting multiple DTDs.

Yeah verily. See:

"More complete SGML support on the Web"
http://www.hal.com/%7Econnolly/drafts/web-research.html#sgml

for my notes on this issue.

In my WWW/MIF/SGML/Davenport/HTML/MIME travels, I have had occasion to
write lots of DTDs, trying to come up with some sort of integrated
architecture for sharing information. My experiments include:

connolly@austin5 {** NONE **}../develop[536] find . -name '*.dtd' -print
./web/html-spec/html3.dtd
./web/html-spec/html-0.dtd
./web/html-spec/html.dtd
./web/html-spec/html-0s.dtd
./web/html-spec/html-netscape.dtd
./web/html-spec/html-1s.dtd
./web/html-spec/html-s.dtd
./web/html-spec/html-1.dtd
./web/html-spec/html-mcom.dtd
./web/url_test/bibtex/bibtex.dtd
./web/libHTML/ideas/html2.dtd
./web/pywww/plaintext.dtd
./web/html-test/rock/work/dmg/dtds/html/orahtml.dtd
./web/html-test/forms/htmlplus.dtd
./web/html-test/orahtml.dtd
./web/html-test/mcom/html-mcom.dtd
./web/html-test/htmlplus/html3.dtd
./web/technologies/lincks/lincks.dtd
./web/technologies/lincks/ref-struct.dtd
./doc/authoring/HaL/OLIAS/Misc/rfc.dtd
./doc/authoring/HaL/OLIAS/Misc/latex.dtd
./python/Extensions/wwwlib/WWW/dtdtools/MIF.dtd
./python/Extensions/wwwlib/WWW/dtdtools/ideas/webnode.dtd
./python/Extensions/wwwlib/WWW/dtdtools/rfc822.dtd
./python/Extensions/wwwlib/WWW/dtdtools/setext.dtd
./python/Extensions/wwwlib/WWW/dtdtools/mime.dtd
./python/Extensions/wwwlib/WWW/dtdtools/qwertz.dtd

> Furthermore, I think that we need to consider two
>parallel approaches to multiple DTDs. One path is for there to be shared
>DTDs, developed and agreed to in public like the current DTD. The other
>path is for support for user-defined DTDs (DTDs specific to a particular
>application, to a particular user community, etc.)
> On the first path, we should try to do "most things for most people".
>We should maintain a simple, nonhierarchical DTD, perhaps a slightly
>expanded version of the present one, as a frozen beginner's tool.
> Rather than keep packing more features into the simple DTD until it
>becomes unusable and unmaintainable, we should also provide at least one
>hierarchical DTD with a set of features analogous to the kinds of objects
>typically packaged in the sample applications that come with word
>processors (sectioning, lists, paragraphs, special functions like address
>blocks and preformatted blocks). We should make the elements as structural
>as possible (make what we need to provide in the direction of formatting
>through attributes). Hierarchy should be loose (e.g., jumping from a
>section at level n to one at n+2 should be prohibited, but there should not
>be a requirement that there be at least two level-n+1 sections in a level-n
>section). Perhaps we should add to it simple tables and sub scripts and
>superscripts (but not built-up equations). I believe that this extension
>should be hierarchical because of the benefits the increased level of
>structure can bring. Thus we can still have the ease of use of HTML as we
>know it plus an extension for the many people who need something more, but
>not a full-blown publishing system.
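The "loose hierarchy" rule proposed above is easy to check mechanically. A minimal sketch, assuming section depths are given as a list of integers in document order (hierarchy_violations is a hypothetical name): going down more than one level at a time is an error, while going back up any number of levels is allowed, and so is a section with only one subsection.

```python
def hierarchy_violations(levels):
    """Return (index, previous, current) for each heading that skips a level.

    levels: section depths in document order, e.g. [1, 2, 2, 3].
    Descending by more than one level (n to n+2) is a violation.
    Ascending by any amount is fine, and there is no minimum-of-two
    rule for subsections.
    """
    violations = []
    prev = 0  # assume the document starts just above level 1
    for i, level in enumerate(levels):
        if level > prev + 1:
            violations.append((i, prev, level))
        prev = level
    return violations


print(hierarchy_violations([1, 2, 3, 2, 1, 2]))  # -> []
print(hierarchy_violations([1, 3]))              # -> [(1, 1, 3)]
```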

I think this is a good idea. In fact I'd characterize the DTDs
differently: one is a specification of what browsers must support.
It's got almost no structure (lots of ANY content models and such.)

The other is for automated authoring systems. Much more structured.

The idea is that if your document conforms to the first one, it'll
work on pretty much any browser, but you might not be able to import
it into a structured editor.

The structured DTD should be modular, too: a common set of
"ur-elements" like paragraphs and lists and such, on top of which DTDs
for "message," "report," "article," "home page," etc. can be built.
The purpose of these structured DTDs would be to increase the
reliability of automation in indexing and resource discovery, and
to generally promote "knowledge domain interoperability,"[1] as
Engelbart put it.
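One way to picture the modular idea: a shared table of ur-element content models, with each document type adding only its own top-level elements. This is a minimal Python sketch with hypothetical element and type names ("message", "report", "subject", "section"). It deliberately simplifies SGML content models to "which children are allowed" and does not check order or occurrence.

```python
# Hypothetical shared "ur-elements": element name -> names of allowed children.
# "#text" stands for character data.
UR_ELEMENTS = {
    "p": {"#text", "em"},
    "em": {"#text"},
    "ul": {"li"},
    "li": {"#text", "p", "ul"},
}


def make_doctype(root, root_children, extra=None):
    """Build a document type: shared ur-elements plus its own top-level elements."""
    models = dict(UR_ELEMENTS)
    models.update(extra or {})
    models[root] = set(root_children)
    return root, models


# Two hypothetical document types layered on the same ur-elements.
MESSAGE = make_doctype("message", {"subject", "p", "ul"},
                       extra={"subject": {"#text"}})
REPORT = make_doctype("report", {"title", "section"},
                      extra={"title": {"#text"},
                             "section": {"title", "p", "ul", "section"}})


def valid(node, models):
    """Check each child against its parent's allowed set (no order/count checks)."""
    if isinstance(node, str):
        return True  # character data
    name, children = node
    allowed = models.get(name)
    if allowed is None:
        return False  # element not declared in this document type
    for child in children:
        child_name = "#text" if isinstance(child, str) else child[0]
        if child_name not in allowed or not valid(child, models):
            return False
    return True


def conforms(doc, doctype):
    root, models = doctype
    return not isinstance(doc, str) and doc[0] == root and valid(doc, models)


msg = ("message", [("subject", ["Tables"]),
                   ("p", ["See ", ("em", ["this"]), "."])])
rpt = ("report", [("title", ["Q1"]),
                  ("section", [("title", ["Intro"]), ("p", ["Text"])])])
print(conforms(msg, MESSAGE), conforms(msg, REPORT), conforms(rpt, REPORT))
# -> True False True
```

Because paragraphs and lists mean the same thing in every document type built this way, an indexer can rely on them across types while still using the type-specific elements (a message's subject, a report's sections) where it knows them.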

Dan

[1] excerpt from
"Knowledge-Domain Interoperability and an Open Hyperdocument System"
by Douglas Engelbart.
http://www.hal.com/%7Econnolly/technologies/ioh.html