Re: New Topic: HTML and the Visually Impaired [long]

Terry Allen (terry@ora.com)
Thu, 1 Sep 94 18:36:10 EDT

| T> Yuri, you had a little scripting language or something of the sort
| T> for use with ICADD attributes, I believe in order to indicate
| T> treatment of elements-in-context. Have you decided you don't
| T> need it for this purpose?
|
| The "little language" for transformation still exists.

But you didn't answer the question.

| | The ICADD tagset is described and "formalized" in an annex to ISO 12083.
| T> Is this new?
|
| No. This is all part of ISO 12083 and has been since it was published
| last year. The ISO DTDs include all the SDA attributes as a model.

That is, the AAP DTDs. I'm now confused over why we can't go ahead
and dress up Docbook in the same manner.

| They are isomorphic except for some minor details. I think if we
| can simply add the new handful of elements, and *NOT* add the
| isomorphic names to HTML but rather simply instruct browser creators
| to build the aliases if they want to support ICADD, it'll make HTML
| a little simpler, but accomplish the same purposes.

Yes, but in the longer run (2.1, 3.0) we want the fixed atts in the DTD.
I see by what follows that I have misunderstood your proposal. You
seem to be suggesting that instances marked up to conform to
-//EC-USA-CDA/ICADD//DTD ICADD22//EN
can be made renderable in, e.g., Braille, while still being viewable
in Mosaic, by aliasing the 22 ICADD elements to HTML elements.

I thought you were proposing that the fixed atts be added to the HTML
DTD (one of them, eventually) to make *HTML* renderable in Braille.
This is surely something we want to do anyway.

In other words, you are trying to get WWW browsers to run 2 DTDs,
taking advantage of the similar structures they describe. Is that
right?
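
A minimal sketch of the aliasing idea as read above, in Python. The
alias table is hypothetical: only the BOOK/HTMLPLUS pairing appears in
this thread, and the other pairs (PARA, IT, LHEAD) are illustrative
guesses, not the actual ICADD22 mapping. The function name alias_tags
is likewise an assumption.

```python
import re

# Hypothetical alias table: ICADD element name -> HTML element name.
# Only BOOK -> HTMLPLUS comes from this thread; the rest are guesses.
ICADD_TO_HTML = {
    "BOOK": "HTMLPLUS",
    "PARA": "P",
    "IT": "I",
    "LHEAD": "H2",
}

# Matches a start or end tag: optional slash, element name, any attributes.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")

def alias_tags(text, table=ICADD_TO_HTML):
    """Rename aliased tags; leave all other tags and content untouched."""
    def rename(match):
        slash, name, rest = match.groups()
        return "<%s%s%s>" % (slash, table.get(name.upper(), name), rest)
    return TAG.sub(rename, text)
```

A browser built this way keeps a single rendering path: the ICADD
instance is renamed into HTML element names first and rendered as HTML.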

| D> From an architectural point of view, it makes more sense to me to just
| D> define a new content type: text/icadd, and enhance browsers to support
| D> that type. One handy way to implement this is to parse the ICADD
| D> document, translate it to HTML internally, and render the HTML.
| I don't see the difference really. ICADD files would, presumably
| have a different suffix, but for browser creators it'd be much
| the same. I'm happy either way. Since there are so few additional
| elements, it seemed to me more "harmonized" to make the minor
| enhancements to HTML. Whichever.

If the whole point is that the ICADD atts allow the same instance
to be interpreted for visual rendering AND for Braille, etc, then
you can't determine when you send the instance out what the browser
on the other end will do with it. So I don't understand why one
wants a content type text/icadd; text/html will do, or, if
instances conforming to the ICADD DTD are to be handled,
text/sgml. There are far too many DTDs to have a content type
for each one.

| | 2) When HTML2.0 is finalized, I'll add the ICADD attributes to it
| | in a version that we distribute to content providers who work with
| | the blind (Braille translation houses, electronic book creators, etc).
| | This will establish the mappings from HTML elements to ICADD
| | elements and will mean that all HTML text is *instantly* and
| | automatically translatable into both print and on-line Braille.
| T> I think these attributes should be part of the regular DTD, not
| T> just in some other version. I'm beginning to wonder about how
| T> many sets of fixed atts a DTD can handle, but this is about the
| T> most important set I can think of.
|
| D> Great. This makes perfect sense. Let's do take care that HTML can be
| D> consumed this way.
|
| My sense is that because of the isomorphism and because HTML
| doesn't really expect there to be text generated on the fly, this
| should be a simple case of SDAFORM mappings. DOCBOOK, for example,
| because it supposes a great deal of sophisticated processing,
| including elements in context handling, needs all the complexity
| of all the attributes.

So we need the little language for Docbook but not for HTML. Does
that mean that AAP is between HTML and Docbook in complexity?

| >HTMLPLUS BOOK May be valuable to keep this distinction
| > so browsers will know it's the ICADD DTD
| D> Ah... apparently so.
| T> could you explain that?
|
| Of course, if they are handled the same way, this won't matter.

If you want browsers to grasp that an instance is marked up
according to the ICADD DTD and not HTML, then you'd better
change the base ICADD element to a more distinctive name
or require a DOCTYPE declaration. "BOOK" is just too
common. I'll also reiterate my call
for requiring the HTML or HTMLPLUS start tag (in 3.0, of course).

| The Mosaic people are considering some special handling for
| files they know are ICADD -- fancy stuff with IPP and PP -- so

If they can handle ICADD and have a copy of the HTML DTD that
includes the ICADD fixed atts, then *ANY* HTML instance is
an "ICADD" instance, right (though not an ICADD DTD instance)?

| it occurred to me it might be useful for browsers to know
| which tagset.
| If we follow Dan's suggestion of a text/icadd type, this goes
| away, but then matters less anyway.

| | Software could ignore the following elements or do special processing:
| | IPP Would be best if browser makers turned
| | this into generated text such as
| | "Print Page: ". These are used both
| | to alert blind person to matching
| | page in printed book and also as
| | targets of <PP>. Content could be
| | turned from <IPP>154</IPP> into
| | <IPP NAME="154"> or equivalent.
| | PP Reference to <IPP>. Could be treated
| | as <A HREF="154"> or equivalent.
|
| T> Would you give an example of usage?
|
| For example:
| <P>As you will see in figure 12 on page <PP>154</PP>, the quick brown
| fox has .....
|
| Elsewhere, a formatter -- or a human transcriber -- has ensured that
| the braille pages include a <IPP>154</IPP> wherever the printed
| copy of the book breaks that page. The braille formatter will place
| the IPP page numbers flush right (I believe) so that even though
| a typical print page might take up three braille pages, a blind
| reader can still find the page number that the rest of the class
| is looking at.
|
| This is the standard usage of PP from the AAP doctypes.

Isn't the printed page a "concept not in HTML", so we can ignore
this for HTML? If ICADD DTD elements need to be mapped in this
fashion to take care of ICADD DTD instances, maybe HTML fixed atts
should go in the ICADD DTD.
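
For concreteness, the IPP/PP treatment quoted above could be sketched as
follows in Python. The quoted proposal says only "<IPP NAME="154"> or
equivalent" and "<A HREF="154"> or equivalent"; the use of A NAME, the
"#" fragment, and the function name rewrite_page_markers are
assumptions made for this sketch.

```python
import re

# <IPP>154</IPP> marks where print page 154 begins.
IPP = re.compile(r"<IPP>\s*([^<]+?)\s*</IPP>")
# <PP>154</PP> is a reference to that print page.
PP = re.compile(r"<PP>\s*([^<]+?)\s*</PP>")

def rewrite_page_markers(text):
    """Turn IPP into a named anchor with generated text, PP into a link."""
    # Assumed equivalent of <IPP NAME="154">, with "Print Page: " generated.
    text = IPP.sub(r'<A NAME="\1">Print Page: \1</A>', text)
    # Assumed equivalent of <A HREF="154">, pointing at the anchor above.
    text = PP.sub(r'<A HREF="#\1">\1</A>', text)
    return text
```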

| | Concepts not in HTML which would be added for ICADD support:
| | BOX Could simply be <HR> at <BOX>
| | and another <HR> at </BOX>; browser
| | developers could draw vertical lines.
|
| T> what is this supposed to be? a sidebar? I'd rather leave it out
| T> than suggest a new structure for specifically online
| T> presentation that would be problematic to render.
|
| This is a sidebar. Remember that most ICADD usage is for textbooks
| which, in the modern style, are sidebar-rich. I don't think we
| can actually leave it out if we want to support ICADD files. That
| is, we have to do something with a file that has a SIDEBAR in it,
| rather than just format it as a paragraph. HTML is pretty specific
| about online presentation already. I'm not convinced that this
| <HR> approach is so out of keeping.

This is where I balk. If BOX is to be aliased to something, what's
the corresponding fixed att to go in the HTML DTD? There wouldn't
be one. Then we have different aliasing and SDAFORM rules,
have to maintain two sets of doc for browser developers, and so on.
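
The <HR> treatment of BOX quoted above amounts to something like the
following Python sketch. The function name render_box is hypothetical,
and the sketch handles only bare <BOX> and </BOX> tags without
attributes. It shows the browser-side rule only; it does nothing about
the missing fixed attribute on the HTML side.

```python
import re

# Bare start or end BOX tag, in either case.
BOX_TAG = re.compile(r"</?BOX>", re.IGNORECASE)

def render_box(text):
    """Replace each <BOX> and </BOX> with a horizontal rule."""
    return BOX_TAG.sub("<HR>", text)
```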

| | I would further propose that ICADD lose the LANG element and replace it
| | with the CHARSET attribute as is done with the HTMLplus proposal.
| T> Is this sufficient? In our Docbook discussion we decided that
| T> charset, lang, and locale were all needed. We haven't implemented
| T> anything yet, hoping that SGML Open will suggest a standard
| T> approach.
| D> This is an extremely hairy ball of wax. I'd love to see a complete
| D> proposal for multi-lingual documents.
| I agree with both. Getting the very incomplete lang attribute
| out of ICADD will simply be a way of admitting the problem is big.

Well, no, it will be an incomplete solution that will invite
incomplete and incompatible patches, which will then be claimed
as "existing practice." Let's get this right. Has SGML Open made
any progress on this matter? (Of course if this is just a mod to
the ICADD DTD, maybe I don't care if the ICADD doesn't.)

| D> By the way... is there a large body of legacy ICADD documents? I have
. . .
| D> If there is no legacy of ICADD documents, I suggest the ICADD folks
| D> just adopt HTML wholesale, and lobby for the few changes they need
| D> (while supporting them in their own applications, of course). There's

| Unfortunately, there is a legacy both of documents and of software
| that processes them. Also the fact that ICADD is part of the ISO
| standard has meant that it's part of other specifications now, mostly
| for US State Education authorities who have been requiring textbook
| publishers to supply ICADD-encoded files.

Yes, but the fixed attribute approach can make an instance conforming
to any DTD "ICADD-encoded." I agree with Erik Naggum that a DTD
should not be an ISO standard; are you suggesting that on this bogus
ground publishers are required to produce AAP DTD instances simply
because those DTDs have the fixed attributes? Or are they
required to produce instances conforming to the ICADD DTD?
If so, that would be an argument for putting HTML fixed attributes in the
AAP DTD, wouldn't it?

| In a sense, my proposal is exactly what you're suggesting: I'm
| lobbying for the specific changes they need. If this group is really
| opposed to the aliasing idea, I can go back to the ICADD committee
| and ask them to change the names to match HTML, and we can go back to
| ISO with a request for an amendment to 12083.

I suppose I don't know how to respond to this until I understand
better what DTDs you want to support: HTML and ICADD? or AAP too?

Regards,

-- 
Terry Allen  (terry@ora.com)   Editor, Digital Media Group
O'Reilly & Associates, Inc.    Sebastopol, Calif., 95472