Re: ICADD elements in HTML

Yuri Rubinsky (yuri@sq.com)
Fri, 9 Sep 94 02:03:35 EDT

[Still Long]

I think it may make sense to respond to some of the latest round
of Dan's and Terry's comments out of order.

Terry responding to Dan responding to Terry:

T>
T> | > so I say we might as well put them into 2.0 as wait for
T> | >2.1.
T> | I STRONGLY disagree. Show me three major browsers that support ICADD
T> | documents, and I'll support a change to HTML 2.0.
T>
T> Good argument.
T>

Bill Perry writes:
BP> I have been thinking of good ways to implement generic, user-specified
BP> content-type -> content-type filters in the emacs-w3 browser. I can add
BP> new tags for simple highlighting by changing one variable, so that's no
BP> problem, and adding the other tags wouldn't be too much of a problem
BP> either. I will try to do it as soon as it is in a written document I can
BP> sit down with (other than email).
BP>

I think we'll see at least two other browsers that support ICADD soon,
but I still think it's the stuff of either HTML 2.1 or 3.0. I'd
prefer the former just because I think it's important to get the
capability out there. But I agree that it shouldn't be in 2.0.

[taken out of order...]
D>
D> But is this the forward-looking way to approach this problem? Every
D> time we see a new body of legacy documents, we change HTML to
D> incorporate those idioms?
D>
D> Or do we promote the multi-format architecture of WWW? Do we encourage
D> folks that one day there will be native support for many DTDs --
D> perhaps DocBook soon, and perhaps some commercial web clients will
D> support arbitrary SGML documents?
D>

I agree with Dan 100% that there can and should and will be support
for arbitrary SGML on the Web. I think it will be a great boon to
serious business use of the Web.

So yes, we encourage multi-format architecture, use of DocBook, ATA,
automotive, Pinnacles, any or all of these.

But HTML is different from those and, I would argue, so is ICADD.
I think it happens that these two are *not* rich in the semantics
of an industry, or an idiom, or any karass divided along subject
lines.

Rather they use SGML really as a way to formalize the specification
of display capabilities. Admittedly they do so by specifying a basic
set of logical structures, but in a sense that's only because it's
convenient to do so.

HTML will never include <TASK> or <SUBTASK> for airline maintenance
manuals; it will never include <PARTNO> or <ACCESSIBILITY-CODE> or
other content-driven or semantic-driven encoding. Neither will
ICADD. They're both concerned about the basic representation of
either simple or complex documents using simple constructs. The
fact that they overlap by >90% at least means that the constructs are
somewhat universal, which is the way it should be.

D>
D> Furthermore, I suspect that ICADD and HTML are not so similar in
D> practice as they are in theory. Each has its own domains of
D> application, and it's not clear to me that they are "drop-in"
D> replacements for each other.
D>

I disagree with this. After 3 years of ICADD-interest, I'm still
happy to report that head-levels, paras, lists, etc. remain the
basic building blocks of on-line, braille, spoken and print text.

There is *no domain* for HTML or ICADD. They each deal with
geography, computer science, household science, literature, etc.
They are the opposite of the Pinnacles or aerospace or automotive
DTDs in that sense. They are a simple foundation, and, I would
argue, universally enough recognized that all of the complex DTDs
most likely can be *down-translated* into them. Into either of
them.

My position is that these two, so far, are unique, in a sense, in
this domain-free-ness...and, further, that we only need one. The
overlaps are so great, and the underlaps so insignificant, that
it's foolish to propagate both.

And it would be a spectacular bonus if people down-translating
into ICADD would have the added value of being able to post all
that material directly to the Web...the same raw files that they
can send to their Braille printer, or synthesized voice software.
No further transformation, no fuss. As simple as HTML is today.

D>
D> Proposal: Integrate into WWW support for documents marked up with the ICADD
D> DTD by specifying a new content type: text/icadd, and explaining
D> to browser implementors that it can be supported using nearly
D> the same code they use to support HTML. For example, we could
D> add a text/icadd->text/html conversion module to libwww as a
D> reference implementation.
D>

So that's why this seems too much complexity to me. Instead:

Proposal: a) Add support in HTML 2.1 or 3.0 for 3-5 new elements
in the base set, that is, for AUTHOR (AU), LIST HEADING (LH)
and SIDEBAR (BOX); for completeness, PAGE REFERENCE (PP) and INK
PRINT PAGE NUMBER (IPP) also.
b) Encourage browser-makers to allow aliases for those
elements whose ICADD names and HTML names are isomorphic.

D> Proposal: Integrate support for documents marked up with the HTML
D> DTD into ICADD applications by adding fixed attributes to
D> the HTML DTD.

End this proposal there. Dan goes on to say:

D> If there is sufficient experience with this technique
D> at the time the HTML 2.0 document gets published, these fixed
D> attributes can be part of the 2.0 HTML DTD. If these fixed attributes
D> are not completely stable by that time, it should wait for 2.1.
D>

The fixed attributes are stable. There is experience with the
technique. I think this part is a small job for which I'll need
some help from ICADD tech committee members.

D>
D> Perhaps in the near future, more ICADD applications will be
D> independent of the ICADD DTD, and merely dependent on the ICADD
D> architectural forms. Then information providers can mark-up their
D> stuff in HTML and have it consumed by both the Mosaic/lynx market and
D> the ICADD market.

While I mostly agree with this, it will always be the case that a
large number of textbooks (for example) will be created in
QuarkXPress and translated into ICADD. There will be no richer
tagset from which to work.

I would add to the last sentence "... can mark-up their stuff in
HTML or ICADD and have it consumed ..." Other considerations will
determine which of these it is -- *unless* ICADD turns out to simply
be a subset of HTML. If I were preparing a book for Braille production
from a non-SGML source, I'd probably go to ICADD since it could be
used in both environments without transformation, whereas if I went
for HTML, I'd have to re-translate to get to ICADD.

D> Meanwhile, apparently (I have no first-hand experience here) there are
D> LOTS of documents marked up with the ICADD DTD, and some government
D> contracts require this markup, and so there will be more documents
D> marked up this way in the future. Further, the ICADD DTD and the HTML
D> 2.0 DTD are nearly isomorphic. If you've got code to support HTML 2.0,
D> it's an hour's work to support ICADD DTD documents. So we want to
D> encourage browser implementors to support ICADD DTD documents.

Agreed. Although I still think they should think of themselves as
supporting HTML 2.1 [or whatever] plus a handful of ICADD aliases.
Nothing quite as formal as having to support an ICADD DTD. Nothing
to suggest they're having to support two DTDs.

D> One way to do this is to say "As of Jan 1, 1995, HTML includes all
D> the ICADD tags." Browser implementors scurry around between now and
D> then, and HTML includes ICADD.
D>
D> But is this the forward-looking way to approach this problem? Every
D> time we see a new body of legacy documents, we change HTML to
D> incorporate those idioms?

I don't think the issue is legacy documents. The issue is support for
constructs that are needed in textbooks (primarily) and which make
sense in Braille, synthesized voice and large print. I think having
a model for LIST -- (LISTHEADING?, LISTITEM*) simply makes sense,
irrespective of ICADD. AAP, ISO 12083, TEI, CALS, IBMIDDOC and some
of the DocBook list types *all* have optional ListHeadings. Every
one of those has some sort of construct for AUTHORNAME, irrespective
of their domain. ADDRESS doesn't have the useful granularity of
AUTHOR. SIDEBAR remains a thorny issue although I still believe
that it represents a semantic construct which is usefully added to
HTML (but I can live with it being left out and aliased to NOTE).
INK PRINT PAGE is valuable although the name sounds like it comes from
Braille -- but the name could be changed to whatever implies that
what you're currently viewing is actually an online representation
of a printed page and here is the SourcePageNumber. And finally
PAGE-REFERENCE is valuable even if only in citations or references
to printed books. I can live without that, and simply have browsers
ignore that markup.

Let's weave Terry into the narrative:

T> Here's my view: HTML is basically a set of online presentation
T> semantics. ICADD could have been a set of abstract, nonvisual
T> presentation semantics, but it has followed Braille (am I right,
T> Yuri?), which, most interestingly (if I'm right) followed a print
T> model or metaphor or whatever you call it.

Given that these two differ primarily in that ICADD has a sidebar
and two elements which refer to page numbers, and HTML has a lot more
stuff in its HEADer info, I'd say that both followed a print
metaphor pretty closely. ICADD has ANCHOR which pretty much stands
for a print -- or non-print -- representation of <A>. ICADD was also
heavily influenced by IBM Book Manager, an online delivery system
which happens to work real well with synth voice software/hardware.

T> It would probably be a Good Thing for HTML to rise to a somewhat
T> higher level of abstraction, in the process incorporating some
T> of the more abstract semantics that ICADD could have had. It
T> would then be a generalized set of presentation semantics,
T> and there would be a set of recommendations for VT100
T> and GUI rendering; other sets of recommendations would deal
T> with Braille, etc., rendering---all from the same HTML instance.

Yes.

T> Now if we just slot in the five ICADD elements, we are adopting the
T> print model without change, and we're not rising to a higher level
T> of abstraction, just folding in the print-specificity of Braille.

I think there is a high level in which we recognize that sometimes,
on-line, you'd like to be able to refer to the printed page in
which something occurred off-line. If eliminating PP and IPP would
make this high-level, and somehow therefore healthier, I can take
it up with the ICADD committee. The fact is, that even if the
markup associated with those two elements were ignored by the
current crop of browsers, ICADD documents (given the other three
and the aliasing) would work fine in WWW browsers.

In this latter case, we're basically talking about having something like:
<IPP>32</IPP>
<PARA>If you turn to the other textbook, <IT>Math Made E-Z</IT>,
to page <PP>125</PP>, you'll find a related discussion.

be treated in a browser *as if* it were marked up as (for example):
p32
<P>If you turn to the other textbook, <I>Math Made E-Z</I>,
to page 125, you'll find a related discussion.

My proposal is to have it treated as if it were:
<IPP>32</IPP>
<P>If you turn to the other textbook, <I>Math Made E-Z</I>,
to page <PP>125</PP>, you'll find a related discussion.

Since if PP and IPP were available, then, for example, a browser
might offer a user a "Go To Page" command (perhaps iff IPP appears
anywhere in the document). (Or not. People won't use it if they
don't have page numbers to deal with. They would if they do.)

If people are concerned that browser-makers would put such a
command into a non-print environment, then I would say that's an
argument in favour of Dan's suggestion for text/icadd, which
actually configures one item on a menu to say "Go to Page".

T> We would do better to ask what the nonprint semantics of these ICADD
T> elements are, and fold *those* in if need be. If we could cover
T> all the cases Yuri raises, on that abstract level, then ICADD
T> docs could run on appropriately modified WWW browsers, and even
T> get satisfactory visual rendering.

I've tried to do that briefly above.

T> But if HTML gains the abstract semantics needed to incorporate
T> the ICADD elements, then ICADD docs can be filtered to this
T> version of HTML trivially, and we need not bother with the
T> aliasing method.

Aliasing PARA for P etc seems much simpler to me than having to
run a filter on every file you open. (Or even -- after checking
the extension -- only doing so on the text/html documents.)
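
As a rough sketch of the aliasing approach (an illustration only, not
anything a browser actually ships): the parser looks each tag name up
in a small alias table before dispatching it, so no separate filtering
pass over the file is needed. Only the PARA/P and IT/I pairs come from
the example earlier in this message; the names ALIASES, AliasingParser
and canonical_events are made up here, and Python's standard
html.parser stands in for a browser's own parser.

```python
from html.parser import HTMLParser

# Assumption: only these two pairs are taken from the example in this
# message. A real table would be derived from the ICADD and HTML DTDs.
ALIASES = {"para": "p", "it": "i"}


class AliasingParser(HTMLParser):
    """Records start/end tag events after resolving ICADD aliases."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser hands us the tag name already lowercased.
        self.events.append(("start", ALIASES.get(tag, tag)))

    def handle_endtag(self, tag):
        self.events.append(("end", ALIASES.get(tag, tag)))


def canonical_events(markup):
    """Return the tag events for markup, with aliases mapped to HTML names."""
    parser = AliasingParser()
    parser.feed(markup)
    parser.close()
    return parser.events
```

With this table, "<PARA>...<IT>...</IT>...</PARA>" and
"<P>...<I>...</I>...</P>" produce the same sequence of events, which
is all the rest of the browser would ever see.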

T> So Sidebar is perhaps a kind of block-linked-to-document rather
T> than block-linked-to-point, perhaps something else. The page reference
T> elements are tougher, but if the translation is from ICADD to HTML,
T> then they can be managed as A's with some generated text if need be.

I could live with that too. All I'm really hoping for is that blind
kids with on-line texts can be told to read pages 88 to 94 and have
a fighting chance of having free software that lets them find those
pages without scrolling through 87 other pages first. Or even knowing
what to click on in a table of contents that encompasses those pages.
The page numbers *will be* in the source files; why not use them?
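
To show how little machinery a "Go To Page" command would need, here
is a minimal sketch. It assumes that each <IPP> element holds a plain
decimal page number and marks the point where that printed page
begins; the function names and the use of character offsets into the
source are illustrative choices, not part of any ICADD or HTML
specification.

```python
import re

# Assumption: page markers look like <IPP>32</IPP> with a decimal number.
IPP_PATTERN = re.compile(r"<IPP>\s*(\d+)\s*</IPP>", re.IGNORECASE)


def build_page_index(source):
    """Map each printed page number to the offset of its first IPP marker."""
    index = {}
    for match in IPP_PATTERN.finditer(source):
        # Keep the first occurrence if a page number is repeated.
        index.setdefault(int(match.group(1)), match.start())
    return index


def page_offset(source, page):
    """Return the offset where the given printed page starts, or None."""
    return build_page_index(source).get(page)
```

A browser could offer the command only when build_page_index returns
a non-empty table, which matches the "iff IPP appears anywhere in the
document" idea above.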

D> HTML is a good thing. It solves some problems. It's a great foundation.
D> It will always be the common ground for the web.
D>
D> But let us not submit it to creeping featurism. HTML should always be
D> the _simplest_ markup language that will let folks around the globe
D> communicate using hypertext and to some extent, hypermedia.

I simply do not see that adding five elements which allow an additional
percentage of the population to embrace the Web is creeping
featurism. If I were asking for elements which only made sense to
geographers, or architects, or airline maintenance people, fine,
but I'm talking about "horizontal" elements which I believe will be
useful to *anyone*. I think academics and lawyers, just for two examples,
are always interested in what page things were on in print editions.
I think you'll soon find that page-turner viewers are up and running
on the Web and we'll want to link our HTML files to those.

D> Let's add text/icadd (or application/icadd, whatever) support to some
D> browsers, gain some experience, see if there are any user interface
D> issues, and then see if HTML needs any changes.

We've seen a vote from Bill Perry saying he'll implement the proposals.
I've mentioned the two other browsers that will. As far as I can tell,
the two I know about, at least, are simply doing it the easy, merged way.
[Any software-makers that would care to comment on how they're thinking
of doing this, please do so.]

I think we'll be able to sit back and see what happens.

If that doesn't work, I suggest we go to the text/icadd approach.

D> I'm all for integrating ICADD into the web and making/keeping the
D> web accessible to the greatest number of people possible.
D>
D> But I don't support "merging" ICADD into HTML.

I guess this is the crux of my disagreement. It just doesn't seem
like that big a deal to me, but I'll bow to Dan's much greater
involvement with HTML than mine. And I'm concerned that asking
that there be a text/icadd means that some browsers will support
it, some won't, all the negotiation stuff has to happen, and so forth.
If these new elements are simply part of the DTD, and people do the
one hour of aliasing work, that goes away. The only negotiation is
the negotiation that would go on regarding which version/level of
HTML one supports.

D> If there are folks converting ICADD documents to HTML and they find
D> that they are making some terrible kludge that can be eliminated
D> by changing HTML just a little, then I'm interested to see specific
D> examples of this, and I'm open to suggestions on how to change HTML
D> to accommodate this application. That's what I meant by "making changes
D> to HTML to make it isomorphic to ICADD."

D> I forgot the most salient technical point:
D>
D> In message <9409081911.AA00246@ulua.hal.com>, "Daniel W. Connolly" writes:
D> >
D> >One way to do this is to say "As of Jan 1, 1995, HTML includes all
D> >the ICADD tags." Browser implementors scurry around between now and
D> >then, and HTML includes ICADD.
D> >
D>
D> Given this scenario, imagine January 2:
D>
D> 10,000 ICADD documents hit the web, under the guise of HTML 2.1 or
D> some such.
D>
D> A zillion old browsers still exist.
D>
D> A zillion users see garbage because the servers are sending these
D> new ICADD HTML documents to old browsers.
D>
D> So in fact, there will have to be some format negotiation distinction
D> between HTML-with-ICADD and HTML-without-ICADD.
D>
D> So we see that browsers must announce support for the ICADD tags.
D> The only question is whether they say:
D>
D> Accept: text/html; level=2.5
D>
D> or
D>
D> Accept: text/html, text/icadd
D>
D> I suggest the latter.

I'd vote for the former, but would be happy with either. I don't think
we'll add the ICADD elements all by themselves, but likely make them
at the same time we do something else. All the negotiation would
exist anyway; ICADD doesn't make that process any more difficult.
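
For comparison, a server-side check covering both of Dan's forms
might look roughly like the sketch below. The level threshold of 2.5
is just the placeholder number from Dan's example, text/icadd is the
proposed (not registered) type, and the function names are invented
for illustration. Quality values and wildcards in Accept headers are
deliberately ignored to keep it short.

```python
ICADD_LEVEL = 2.5  # Assumption: placeholder taken from Dan's example header.


def parse_accept(header):
    """Split an Accept header into (media_type, params) pairs."""
    entries = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if not parts[0]:
            continue
        params = {}
        for part in parts[1:]:
            if "=" in part:
                key, value = part.split("=", 1)
                params[key.strip().lower()] = value.strip()
        entries.append((parts[0].lower(), params))
    return entries


def accepts_icadd_elements(header):
    """True if the client announces ICADD support under either convention."""
    for media_type, params in parse_accept(header):
        if media_type == "text/icadd":
            return True
        if media_type == "text/html":
            try:
                if float(params.get("level", "0")) >= ICADD_LEVEL:
                    return True
            except ValueError:
                pass  # Unparseable level: treat as not announcing support.
    return False
```

Either way the server does the same amount of work: one look at the
Accept header before deciding which variant to send.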

D> In message <199409081944.MAA05255@rock>, Terry Allen writes:
D> >There seems to be consensus that adding the elements is the way
D> >to go;
D>
D> I disagree. So much for consensus!
D>
D> > so I say we might as well put them into 2.0 as wait for
D> >2.1.
D>
D> I STRONGLY disagree. Show me three major browsers that support ICADD
D> documents, and I'll support a change to HTML 2.0.

My guess/hope is that you will see this soon...in time for 2.1 anyway.

Dan quoting Terry:
D> > The documentation should make clear what their intended
D> >use is, and warn ordinary HTML users that their rendering may
D> >differ considerably from one WWW browser to another.

No more nor less than any other semantics that we document and
prescribe.

T> Either way (running two DTDs or only one), it would be interesting
T> to get more abstraction by encompassing the abstract semantics
T> of the ICADD elements; it would not be interesting to just slot
T> them in as is.

Fine. I've tried to suggest above what the slightly-more-abstract
notions are. I can live with any variation of those that will
adequately represent the five types of source elements when
presented. And by "adequately", I can live with losing IPP and PP if
absolutely necessary, aliasing SIDEBAR to NOTE, allowing AUTHOR
to be aliased to ADDRESS if need be!, and getting guidance from
the rest of you as to how we display a LISTHEADING when we're only
allowing ITEMs. (Not that I'm advocating these compromise positions!)

T> But if that's a reasonable way forward, despite my agreement
T> with Dan's arguments I could go with it, on the assumption that
T> we're going to clean things up later on.

I don't think we'll have to clean up later. Through the course of
this discussion I've become more and more convinced that the changes
I'm asking for are minimal. And the additional work for browser-makers
negligible.

And the value immeasurable.

Yuri