Re: New Topic: HTML and the Visually Impaired [long]

Yuri Rubinsky (yuri@sq.com)
Mon, 5 Sep 94 00:49:36 EDT

In reply to Terry's responses to my replies to his and Dan's comments
and questions:

> | T> Yuri, you had a little scripting language or something of the sort
> | T> for use with ICADD attributes, I believe in order to indicate
> | T> treatment of elements-in-context. Have you decided you don't
> | T> need it for this purpose?
> |
> | The "little language" for transformation still exists.

> But you didn't answer the question.

Right you are. Good point. Yes, I think it's not needed for this purpose.
The mapping is so straightforward that it can almost all be done with the
simple "SDAFORM" attribute which is basic one-to-one transformation.
I may discover otherwise when I get to it, but for now I expect that
only ID and IDREF attributes will need something fancy.
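
As a rough illustration of what "basic one-to-one transformation" could
look like in practice, here is a minimal Python sketch. It assumes the
SDAFORM values have already been read out of a DTD into a table. The
table contents, the name SDAFORM_TABLE, and the function to_icadd are
all hypothetical and chosen only for illustration; a real implementation
would use an SGML parser rather than a regular expression.

```python
import re

# Hypothetical table: HTML element name -> ICADD element name.
# These pairings are illustrative assumptions, not values taken from a DTD.
SDAFORM_TABLE = {
    "html": "book",
    "p": "para",
    "h1": "h1",
    "b": "b",
    "i": "it",
}

# Matches a start or end tag: optional slash, element name, rest of the tag.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")


def to_icadd(text, table=SDAFORM_TABLE):
    """Rename each tag via a one-to-one table; unknown tags pass through."""
    def rename(match):
        slash, name, rest = match.groups()
        target = table.get(name.lower(), name)
        return "<" + slash + target + rest + ">"
    return TAG.sub(rename, text)


print(to_icadd("<P>Hello <I>world</I></P>"))
```

Anything that depends on context (such as ID and IDREF handling) is
deliberately left out, since a flat table cannot express it.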

> | | The ICADD tagset is described and "formalized" in an annex to ISO 12083.
> | T> Is this new?
> |
> | No. This is all part of ISO 12083 and has been since it was published
> | last year. The ISO DTDs include all the SDA attributes as a model.

> That is, the AAP DTDs. I'm now confused over why we can't go ahead
> and dress up Docbook in the same manner.

We can, absolutely. But Docbook is considerably more complex than
HTML. It would need all the SDA capabilities, and needs, probably, to
be done with involvement of experts both in Docbook and in Braille.
The basic question that needs to be asked over and over is: How is
element XXX best represented in Braille, and therefore, to what does
it map (in what contexts, with what attributes, with which of the
possible transformations) in the ICADD tagset. When we met at
SGML '93 (or was it '92?) with a group of such people in a room
together, I had hoped we'd be able to do that work, and then
realised how much harder that work is than I'd thought. (The 12083
DTDs, to which I added the SDA attribs with assistance from people
in the Braille community, turn out to be a lot more straightforward
than Docbook!)

[Just for the record: These are much improved over the original
AAP DTDs and shouldn't be thought of as the same. The AAP work
was ground-breaking etc, but, naturally, suffered from being the
first public, committee-built, industry-based set of DTDs.]

> | They are isomorphic except for some minor details. I think if we
> | can simply add the new handful of elements, and *NOT* add the
> | isomorphic names to HTML but rather simply instruct browser creators
> | to build the aliases if they want to support ICADD, it'll make HTML
> | a little simpler, but accomplish the same purposes.

> Yes, but in the longer run (2.1, 3.0) we want the fixed atts in the DTD.
> I see by what follows that I have misunderstood your proposal. You
> seem to be suggesting that instances marked up to conform to
> -//EC-USA-CDA/ICADD//DTD ICADD22//EN
> can be made renderable in, e.g., Braille, while still being viewable
> in Mosaic, by aliasing the 22 ICADD elements to HTML elements.

> I thought you were proposing that the fixed atts be added to the HTML
> DTD (one of them, eventually) to make *HTML* renderable in Braille.
> This is surely something we want to do anyway.

> In other words, you are trying to get WWW browsers to run 2 DTDs,
> taking advantage of the similar structures they describe. Is that
> right?

Yes. Sorry I didn't make that clearer.

There are two sides to this proposal.

1) The easy side (since I can do that by myself): Adding the FIXED
SGML Disabled Access attributes to HTML at the right moment. This
allows *all* HTML to go to Braille readers and printers. That's one
direction, and a very important one, and a small price to pay
for the great opening-up of availability that it creates.

2) The other side is taking the ICADD documents and making them
accessible through free viewers. In effect, since ICADD-tagged files
can be created from *any DTD with the fixed attributes* this would
allow any documents conforming to such DTDs to be rendered using
WWW browsers without having to convert them *both* into HTML and
ICADD. (The latter is what UCLA now does with its campus-wide
information service.) Many books, particularly textbooks,
need to be transformed into the ICADD tagset in order to easily
be printed in Braille or fed into synthesized voice readers (such
as IBM's Book Manager which does a great job for visually impaired
people). Accordingly, since that text exists in that form, it seems
to me to make sense to be able to distribute those files in
electronic form for use with free browsers. (One serious problem
for the disabled is the outrageous cost of acquiring technologies
that help.)

This second side of the coin is the one that nearly the whole
posting was about. Those of you who came at this proposal with
background in the ICADD approach were probably at a disadvantage
since this seemed to be close but just a little off. I hope this
clears that up. We're talking here about work browser creators
would do to support a handful of ICADD elements, a few added to
HTML, and the larger number aliased. (A handful are identical as
well.)

> | D> From an architectural point of view, it makes more sense to me to just
> | D> define a new content type: text/icadd, and enhance browsers to support
> | D> that type. One handy way to implement this is to parse the ICADD
> | D> document, translate it to HTML internally, and render the HTML.
> | I don't see the difference really. ICADD files would, presumably
> | have a different suffix, but for browser creators it'd be much
> | the same. I'm happy either way. Since there are so few additional
> | elements, it seemed to me more "harmonized" to make the minor
> | enhancements to HTML. Whichever.

> If the whole point is that the ICADD atts allow the same instance
> to be interpreted for visual rendering AND for Braille, etc, then
> you can't determine when you send the instance out what the browser
> on the other end will do with it. So I don't understand why one
> wants a content type text/icadd; text/html will do, or, if
> instances conforming to the ICADD DTD are to be handled,
> text/sgml. There are far too many DTDs to have a content type
> for each one.

I agree. That's really why I'm pushing for the simple changes to
HTML that will allow it to encompass any ICADD file as if it were
an HTML file. The browser -- and the user -- really won't know the
difference.

[...]

> | My sense is that because of the isomorphism and because HTML
> | doesn't really expect there to be text generated on the fly, this
> | should be a simple case of SDAFORM mappings. DOCBOOK, for example,
> | because it supposes a great deal of sophisticated processing,
> | including elements in context handling, needs all the complexity
> | of all the attributes.

> So we need the little language for Docbook but not for HTML. Does
> that mean that AAP is between HTML and Docbook in complexity?

Yes, exactly. See above.

> | >HTMLPLUS BOOK May be valuable to keep this distinction
> | > so browsers will know it's the ICADD DTD

> If you want browsers to grasp that an instance is marked up
> according to the ICADD DTD and not HTML, then you'd better
> change the base ICADD element to a more distinctive name
> or require a DOCTYPE declaration. "BOOK" is just too
> common. I'll also reiterate my call
> for requiring the HTML or HTMLPLUS start tag (in 3.0, of course).

Well, I'm now convinced that if we alias BOOK to HTML in order
to handle ICADD docs, then browsers really won't need to know
that this is ICADD. If we make any special handling that people
might want to take advantage of (such as turning IPP and PP into
linked references) then this can just as readily be available to
anyone, not just an ICADD-creator.

> | The Mosaic people are considering some special handling for
> | files they know are ICADD -- fancy stuff with IPP and PP -- so

> If they can handle ICADD and have a copy of the HTML DTD that
> includes the ICADD fixed atts, then *ANY* HTML instance is
> an "ICADD" instance, right (though not an ICADD DTD instance)?

Correct. That means *immediate* accessibility to HTML documents
for people who want to print them out in Braille. ("Immediate" in
this case means having Braille software anyway. The two most
popular North American Braille software packages both support
the ICADD tagset.)

[...]
> |
> | Elsewhere, a formatter -- or a human transcriber -- has ensured that
> | the braille pages include a <IPP>154</IPP> wherever the printed
> | copy of the book breaks that page. The braille formatter will place
> | the IPP page numbers flush right (I believe) so that even though
> | a typical print page might take up three braille pages, a blind
> | reader can still find the page number that the rest of the class
> | is looking at.
> |
> | This is the standard usage of PP from the AAP doctypes.

> Isn't the printed page a "concept not in HTML", so we can ignore
> this for HTML? If ICADD DTD elements need to be mapped in this
> fashion to take care of ICADD DTD instances, maybe HTML fixed atts
> should go in the ICADD DTD.

There's an interesting idea in the last sentence, and of course it
would work. The reason I didn't suggest that is that the overlap
between the two DTDs was so great that it seemed overkill to require
an SGML parser/transformer just to open any ICADD file in an HTML
browser.

The fact that the printed page is a "concept not in HTML" doesn't
help us here: It *is* a concept in Braille, and turns out, in that
circumstance, to be a very useful one.

> | | Concepts not in HTML which would be added for ICADD support:
> | | BOX Could simply be <HR> at <BOX>
> | | and another <HR> at </BOX>; browser
> | | developers could draw vertical lines.
> |
> | T> what is this supposed to be? a sidebar? I'd rather leave it out
> | T> than suggest a new structure for specifically online
> | T> presentation that would be problematic to render.
> |
> | This is a sidebar. Remember that most ICADD usage is for textbooks
> | which, in the modern style, are sidebar-rich. I don't think we
> | can actually leave it out if we want to support ICADD files. That
> | is, we have to do something with a file that has a SIDEBAR in it,
> | rather than just format it as a paragraph. HTML is pretty specific
> | about online presentation already. I'm not convinced that this
> | <HR> approach is so out of keeping.

> This is where I balk. If BOX is to be aliased to something, what's
> the corresponding fixed att to go in the HTML DTD? There wouldn't
> be one. Then we have different aliasing and SDAFORM rules,
> have to maintain two sets of doc for browser developers, and so on.

No. I'm suggesting that BOX appear in both DTDs. No aliasing, no muss,
no fuss. If any other DTD wants to have FIXED attributes which get
aliased to ICADD and take advantage of the fact that there is a kind
of architectural form for a sidebar construct, then it can.

If there's nothing in the HTML DTD that matches the concept, then it
doesn't get used, that's all. But if there is, then it lets HTML authors
use the sidebar concept (since parallel text of this sort does
exist and is useful); it lets browser-makers implement, no doubt in
interesting ways, such a construct; and, when an ICADD file comes
along that uses BOX, it lets that be displayed.
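
The simplest rendering mentioned earlier in the thread, an <HR> at <BOX>
and another at </BOX>, could be sketched as below. This is only an
illustration under stated assumptions: the function name
render_box_as_rules is hypothetical, and a real browser would do this
inside its parser rather than by text substitution.

```python
import re

# Start and end tags for BOX, matched case-insensitively.
BOX_OPEN = re.compile(r"<box\b[^>]*>", re.IGNORECASE)
BOX_CLOSE = re.compile(r"</box\s*>", re.IGNORECASE)


def render_box_as_rules(text):
    """Replace each BOX start tag and each BOX end tag with an <HR>."""
    text = BOX_OPEN.sub("<HR>", text)
    return BOX_CLOSE.sub("<HR>", text)


print(render_box_as_rules("<P>Main text</P><BOX><P>Aside</P></BOX>"))
```

A browser that wants to do something more interesting with a sidebar
would replace this substitution with its own layout.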

> | | I would further propose that ICADD lose the LANG element and replace it
> | | with the CHARSET attribute as is done with the HTMLplus proposal.
> | T> Is this sufficient? In our Docbook discussion we decided that
> | T> charset, lang, and locale were all needed. We haven't implemented
> | T> anything yet, hoping that SGML Open will suggest a standard
> | T> approach.
> | D> This is an extremely hairy ball of wax. I'd love to see a complete
> | D> proposal for multi-lingual documents.
> | I agree with both. Getting the very incomplete lang attribute
> | out of ICADD will simply be a way of admitting the problem is big.

> Well, no, it will be an incomplete solution that will invite
> incomplete and incompatible patches, which will then be claimed
> as "existing practice." Let's get this right. Has SGML Open made
> any progress on this matter? (Of course if this is just a mod to
> the ICADD DTD, maybe I don't care if the ICADD doesn't.)

I only meant to suggest to ICADD that the simple solution it
now employs is not enough in the long run. That's where the mod
should be.

SGML Open has a character set working committee, chaired by Wayne
Wohler. Until more people start to get active with that group, I'm
not sure much progress will be made. It's had a few meetings and
certainly has a good understanding of the problems and issues.
Send mail to Wayne [wohler@vnet.ibm.com] if you're interested in
taking part.

[...]

> | Unfortunately, there is a legacy both of documents and of software
> | that processes them. Also the fact that ICADD is part of the ISO
> | standard has meant that it's part of other specifications now, mostly
> | for US State Education authorities who have been requiring textbook
> | publishers to supply ICADD-encoded files.

> Yes, but the fixed attribute approach can make an instance conforming
> to any DTD "ICADD-encoded." I agree with Erik Naggum that a DTD
> should not be an ISO standard; are you suggesting that on this bogus
> ground publishers are required to produce AAP DTD instances simply
> because those DTDs have the fixed attributes? Or are they
> required to produce instances conforming to the ICADD DTD?
> If so, that would be an argument for putting HTML fixed attributes in the
> AAP DTD, wouldn't it?

The State of Texas has established that textbook publishers must supply
texts (by a certain date) only in SGML. They prefer the AAP Book DTD
since it was designed by book publishers for trade books -- not because
it has the fixed attributes. Alternatively, a publisher can use *any*
DTD, insert the fixed attributes, and deliver files with ICADD markup.
And yes, it is an argument for putting the SDA attribs into the AAP
DTDs, although, since 12083 is supposed to replace the AAP set, it's
kind of backwards. Although, on the other hand, the Texas law allows
for AAP conformance for textbooks.

> | In a sense, my proposal is exactly what you're suggesting: I'm
> | lobbying for the specific changes they need. If this group is really
> | opposed to the aliasing idea, I can go back to the ICADD committee
> | and ask them to change the names to match HTML, and we can go back to
> | ISO with a request for an amendment to 12083.

> I suppose I don't know how to respond to this until I understand
> better what DTDs you want to support: HTML and ICADD? or AAP too?

No, no support for AAP or 12083 or Docbook or anything else. Those
names are mentioned in this posting only by way of examples or
digressions or whatever.

All I'm hoping will happen is for HTML to include the handful of ICADD
elements that don't map directly to existing HTML elements; for
browser-makers to alias the handful of isomorphic elements; and for
agreement from everyone (and comments thus far suggest that this *is*
agreed) that we can publish HTML 2.0 with the SDA attributes built in.
(That's work I'll do at whatever moment seems right.)

Thanks to all for your abiding interest in this subject.

Yuri Rubinsky