Re: New Topic: HTML and the Visually Impaired [long]

yuri@sq.com (Yuri Rubinsky)
Date: Thu, 1 Sep 94 17:12:47 EDT
Message-id: <m0qgJJE-000ESAC@sq.com>
Reply-To: yuri@sq.com
Originator: html-wg@oclc.org
Sender: html-wg@oclc.org
Precedence: bulk
From: yuri@sq.com (Yuri Rubinsky)
To: Multiple recipients of list <html-wg@oclc.org>
Subject: Re: New Topic: HTML and the Visually Impaired [long]
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
X-Comment: HTML Working Group (Private)



I'm going to attempt to respond to both Terry and Dan in this
mail piece.


>The technical committee of the International Committee for Accessible
>Document Design, chaired by George Kerscher of Recording for the Blind
>and of which I'm a member, has come up with a technique of using #FIXED
>attributes in any DTD in order to map arbitrary elements to a fixed
>ICADD set. The ICADD set represents the specific document structures
>available to a Braille formatter and also works for the two other formats.

D> So you can annotate the HTML DTD and then read HTML documents into
D> a braille printer? Great!

You annotate the DTD, then run any document which conforms to it
through a parsing application. AIS
in France has built one, and Exoterica ships one as a demo app with
Omnimark now. The software which controls the Braille printers now
reads in this tagset as input to its processing.

T> Yuri, you had a little scripting language or something of the sort
T> for use with ICADD attributes, I believe in order to indicate
T> treatment of elements-in-context.  Have you decided you don't
T> need it for this purpose?

The "little language" for transformation still exists. For those
of you who aren't familiar with this, I'll take a moment for
background: In a nutshell, Braille is a formatting language, a
very simple one whose goal is to *reproduce as faithfully as 
possible* the contents of the printed page. Since books are
typeset using software which may generate auto-numbering, cross-
references, fixed text (such as "Chapter "), and so forth, the
goal of the little language is to make that formatting part of
the operation something which is accomplished by the same SGML
parsing application which transforms source tagnames into ICADD
tagnames.

So the ICADD FIXED attributes do several things: establish the
mapping from the source tagset to the ICADD tags both one-to-one
and when they change based on the context; dictate the
generation of both fixed text and auto-numbering (in context as
well -- a list item in an ordered list may get different fixed
text than in an unordered list); turn off processing when necessary
(as in examples or mathematics); and generate notes to a Braille
transcription expert. The attributes are all prefixed
with SDA (for SGML Disabled Access), and have names such as
SDAFORM, SDARULE, SDAPREF, SDASUFF, SDATRANS.
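
As a rough illustration of the one-to-one case only, here is a minimal
Python sketch. The source element names (chapter, para, author), the
"By " prefix, and the function name to_icadd are invented for the
example, and the dictionary merely stands in for the #FIXED attribute
declarations a real DTD would carry. The actual attribute value syntax
is defined in the ISO 12083 annex and is not reproduced here.

```python
# Hypothetical stand-in for the #FIXED SDA attributes declared in a DTD.
# Keys are made-up source element names; values hold the ICADD target
# name (SDAFORM) and optional generated text (SDAPREF / SDASUFF).
SDA = {
    "chapter": {"SDAFORM": "BOOK"},
    "para": {"SDAFORM": "PARA"},
    "author": {"SDAFORM": "AU", "SDAPREF": "By "},
}


def to_icadd(name, text):
    """Rename one source element to its ICADD tag and add any fixed text."""
    attrs = SDA[name]
    tag = attrs["SDAFORM"]
    prefix = attrs.get("SDAPREF", "")
    suffix = attrs.get("SDASUFF", "")
    return f"<{tag}>{prefix}{text}{suffix}</{tag}>"


print(to_icadd("author", "Jane Doe"))
```

Context-dependent mappings (what SDARULE is for) are deliberately
left out of this sketch.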

| The ICADD tagset is described and "formalized" in an annex to ISO 12083.

T> Is this new?

No. This is all part of ISO 12083 and has been since it was published
last year. The ISO DTDs include all the SDA attributes as a model.

>Although work on the ICADD tagset predates widespread use of HTML,
>the two tagsets overlap significantly. Many elements have the same names;
>others, with different names, are nonetheless nearly identical in their
>intended functionality; only a handful have specific ICADD capability
>and don't exist in HTML.

D> I wonder about this: are ICADD documents relatively the same size and
D> scope as HTML documents? I see a BOOK element -- that leads me to think
D> that ICADD documents are often quite large, and not really suitable
D> in the same contexts as many HTML documents.

Dan is absolutely right. But I think there will be contexts in which
they are quite suitable. The highest level element in ICADD -- BOOK --
is often used as a substitute for CHAPTER, and most ICADD files are
about that length. I imagine people using this capability more or
less in the following way (as an example): A classful of students,
one or more of whom is blind, are told by a teacher to read chap 6, pages
77 to 86 for the next day. The available text has been prepared by a
Braille production company (such as American Printing House for the
Blind, or the Texas Dept of Education Braille Repository) and put
onto floppy disks. The blind student is able to use a freely
available HTML browser to read the text.

Certainly there are many contexts in which the size of an ICADD
document means it's more useful to use a browser that builds an
automatic table of contents from hierarchical markup, for example.

| This is my idea, in two parts:
| 
| 1) If we extend HTML ever-so-slightly with a tiny handful of new
| elements (AU, BOX, IPP, LHEAD, etc) , and encourage browser
| builders to alias certain ICADD elements to existing HTML elements
| (ANCHOR to A, PARA to P, etc), then we overnight make every
| Web browser into an ICADD browser. Blind people with software

T> Not overnight; the developers will need at least a week.  I like
T> the idea of aliasing; that suggests that browsers can still get
T> by without interpreting HTML as SGML.

In a sense what I'm hoping is that browser creators will do the
aliasing and it'll be invisible to HTML document writers. I only
half agree with Dan when he says:

D> But let's not try to shoehorn everything into HTML. If HTML and ICADD
D> are isomorphic except for some minor details (e.g. LHEAD), let's fix
D> that. But I see no reason to make all the ICADD tag names part of
D> HTML.

They are isomorphic except for some minor details. I think if we
can simply add the new handful of elements, and *NOT* add the 
isomorphic names to HTML but rather simply instruct browser creators
to build the aliases if they want to support ICADD, it'll make HTML
a little simpler, but accomplish the same purposes.

D> From an architectural point of view, it makes more sense to me to just
D> define a new content type: text/icadd, and enhance browsers to support
D> that type. One handy way to implement this is to parse the ICADD
D> document, translate it to HTML internally, and render the HTML.

I don't see the difference really. ICADD files would presumably
have a different suffix, but for browser creators it'd be much
the same. I'm happy either way. Since there are so few additional
elements, it seemed to me more "harmonized" to make the minor
enhancements to HTML. Whichever.

| (I'll talk about tables separately, in a later posting, if people agree
| that this is all worthwhile activity. In one sentence: I've convinced
| the ICADD people to change the table element names where they
| match to HTML names -- ROW to TR, STUBCELL to TH, CELL 
| to TD. When we work on HTML 3.0, I'll propose the ICADD table
| model with two or three levels of implementation, level 0 being
| roughly the HTMLplus tables of old with optional COLDEF elements
| to hold more formatting when needed, level 2 being full Braille- and
| voice-enabled tables.)

T> Please.  This is worthwhile and tables are needed.

I'll post this later. I'm redoing it as a DTD in the style of the
existing set of html-0, html-1, html, calling it html-tbl.dtd.
(Suggestions for other nomenclature are welcome if this isn't stylistically
what is wanted. I didn't want to make any rash assumptions about
how this might eventually fit into the scheme of things.)

| 2) When HTML2.0 is finalized, I'll add the ICADD attributes to it
| in a version that we distribute to content providers who work with
| the blind (Braille translation houses, electronic book creators, etc).
| This will establish the mappings from HTML elements to ICADD
| elements and will mean that all HTML text is *instantly* and
| automatically translatable into both print and on-line Braille.

T> I think these attributes should be part of the regular DTD, not
T> just in some other version.  I'm beginning to wonder about how
T> many sets of fixed atts a DTD can handle, but this is about the
T> most important set I can think of.

D> Great. This makes perfect sense. Let's do take care that HTML can be
D> consumed this way.

My sense is that because of the isomorphism and because HTML
doesn't really expect there to be text generated on the fly, this
should be a simple case of SDAFORM mappings. DOCBOOK, for example,
because it supposes a great deal of sophisticated processing, 
including elements in context handling, needs all the complexity 
of all the attributes.

| So, my question is: Should I propose the extra ICADD elements
| now, as "proposed" for future versions so people can be thinking
| about them, starting to implement, and so forth? Or should I
| wait and make all these proposals as part of the HTML 3.0
| process?

T> Both.  If I understand things correctly, they aren't used now,
T> hence won't be in 2.0.  But there's no reason to hold off
T> discussing what might go in 3.0 until 2.0 is final; people
T> will do so anyway.  One might also imagine that if certain
T> elements become common usage during the time 3.0 is under
T> discussion, a 2.1 DTD might be promulgated, including them.

D> For HTML 3.0, perhaps. But I'd rather see a separate DTD and a separate
D> MIME content type.

Well, we now have two quite different viewpoints. Anyone else?

>Here's a table of comparisons and proposed actions:
>
>HTML NAME        ICADD NAME         EXPLANATION/COMMENT
>_____________________________________________________________________
>
>BLOCKQUOTE [obs] BQ                 HTML to add back if possible

D> I don't know where this is documented as obsolete. If you find someplace
D> in the HTML 2.0 document where BLOCKQUOTE is specified as obsolete,
D> let us know.

Sorry, this is embarrassingly out of date. I originally did this
comparison for an ICADD meeting based on what is now an old 
HTMLplus spec.

>BYLINE?          AU                 HTML to change/add if possible
>FIG              FIG                Must allow PCDATA for
>FOOTNOTE         FN                 Ask WWW browsers to accept alias

D> Are these HTML+ elements?

Yes. I've been remiss in not updating this to match HTML 2.0
but will try to get to it very soon.

>HTMLPLUS         BOOK               May be valuable to keep this distinction
>                                     so browsers will know it's the ICADD DTD

D> Ah... apparently so.
T> could you explain that?

Of course, if they are handled the same way, this won't matter.
The Mosaic people are considering some special handling for 
files they know are ICADD -- fancy stuff with IPP and PP -- so
it occurred to me it might be useful for browsers to know
which tagset.

If we follow Dan's suggestion of a text/icadd type, this goes
away, but then matters less anyway.



| OL/UL            LIST               OL/UL distinction not meaningful in ICADD
|                                      since the generated content must be
|                                      there. Would be good to add LIST to 
|                                      HTML if possible or ask developers to
|                                      accept alias.

T> the latter is the better route; we don't want to suggest that there is
T> a third kind of list here.

Agreed.
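
To make "the generated content must be there" concrete, here is a
minimal Python sketch of going from an HTML-style OL or UL to a single
LIST in which the numbers or bullets are ordinary text. The item
element name ITEM, the "* " bullet text, and the function name
to_icadd_list are assumptions made for the example, not taken from the
ICADD DTD.

```python
def to_icadd_list(kind, items):
    """Build one LIST, writing the numbering or bullet into each item."""
    lines = ["<LIST>"]
    for number, text in enumerate(items, start=1):
        # OL items get a generated number; anything else gets a bullet.
        prefix = f"{number}. " if kind == "OL" else "* "
        lines.append(f"<ITEM>{prefix}{text}</ITEM>")
    lines.append("</LIST>")
    return "\n".join(lines)


print(to_icadd_list("OL", ["Read chapter 6", "Answer the questions"]))
```

Going the other way, a browser that aliases LIST only has to display
the items as they come, since the prefixes are already in the text.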

| Software could ignore the following elements or do special processing:
|                  IPP                 Would be best if browser makers turned
|                                       this into generated text such as
|                                       "Print Page: ". These are used both
|                                       to alert blind person to matching
|                                       page in printed book and also as
|                                       targets of <PP>. Content could be
|                                       turned from <IPP>154</IPP> into
|                                       <IPP NAME="154"> or equivalent. 
|                  PP                  Reference to <IPP>. Could be treated
|                                       as <A HREF="154"> or equivalent.

T> Would you give an example of usage?  

For example:
<P>As you will see in figure 12 on page <PP>154</PP>, the quick brown
fox has .....

Elsewhere, a formatter -- or a human transcriber -- has ensured that
the braille pages include a <IPP>154</IPP> wherever the printed
copy of the book breaks that page. The braille formatter will place
the IPP page numbers flush right (I believe) so that even though
a typical print page might take up three braille pages, a blind
reader can still find the page number that the rest of the class
is looking at. 

This is the standard usage of PP from the AAP doctypes.
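
For browser makers, the IPP/PP handling suggested in the table might
look something like the Python sketch below. It is only an
illustration: the function name link_print_pages is invented, the use
of A NAME as the target is one possible "or equivalent", and the "#"
in the HREF is my assumption about how a browser would address a
target within the same file.

```python
import re


def link_print_pages(markup):
    """Turn IPP markers into named targets and PP references into links."""
    # <IPP>154</IPP> becomes a target carrying generated text.
    markup = re.sub(
        r"<IPP>([^<]+)</IPP>",
        r'<A NAME="\1">Print Page: \1</A>',
        markup,
    )
    # <PP>154</PP> becomes a link to that target.
    markup = re.sub(r"<PP>([^<]+)</PP>", r'<A HREF="#\1">\1</A>', markup)
    return markup


print(link_print_pages("<P>As you will see on page <PP>154</PP>, ..."))
```

A real implementation would do this in the parser rather than with
regular expressions; the point is only that both elements reduce to
things browsers already know how to do.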


| Concepts not in HTML which would be added for ICADD support:
|                  BOX                Could simply be <HR> at <BOX>
|                                     and another <HR> at </BOX>; browser
|                                     developers could draw vertical lines.

T> what is this supposed to be?  a sidebar?  I'd rather leave it out
T> than suggest a new structure for specifically online
T> presentation that would be problematic to render.

This is a sidebar. Remember that most ICADD usage is for textbooks
which, in the modern style, are sidebar-rich. I don't think we
can actually leave it out if we want to support ICADD files. That
is, we have to do something with a file that has a SIDEBAR in it,
rather than just format it as a paragraph. HTML is pretty specific
about online presentation already. I'm not convinced that this
<HR> approach is so out of keeping.
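
The simplest version of the <HR> approach amounts to very little work.
A minimal Python sketch, assuming upper-case tags with no attributes
(the function name box_to_rules is invented for the example):

```python
def box_to_rules(markup):
    """Replace the start and end of each BOX with a horizontal rule."""
    return markup.replace("<BOX>", "<HR>").replace("</BOX>", "<HR>")


print(box_to_rules("<BOX><P>Did you know?</P></BOX>"))
```

Drawing vertical lines as well would be up to the browser; nothing in
this sketch attempts that.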


|                  LHEAD              Optional list headings are useful.

T> Could add an optional Title to lists.

With the aliasing approach, that would be fine. I suspect it would
end up being something like an H2 or H3 rather than TITLE.

| I would further propose that ICADD lose the LANG element and replace it
| with the CHARSET attribute as is done with the HTMLplus proposal.

T> Is this sufficient?  In our Docbook discussion we decided that 
T> charset, lang, and locale were all needed.  We haven't implemented
T> anything yet, hoping that SGML Open will suggest a standard 
T> approach.

D> This is an extremely hairy ball of wax. I'd love to see a complete 
D> proposal for multi-lingual documents.

I agree with both. Getting the very incomplete lang attribute
out of ICADD will simply be a way of admitting the problem is big.

| For tables, the proposal is that ICADD adopt TR and TD from HTML, and
| potentially, say, TH and THSUB, and that rest of model be adopted by
| the HTML community from ICADD. I'll post the DTD if there's interest. To
| a very great extent it's forward compatible from the HTMLplus tables.

T> Yes, the DTD please. It might be useful to supply a mapping from CALS,
T> too, or at least a description of what aspects of CALS tables may be
T> too complex to translate to HTMLn.

I'll post the table stuff tomorrow or so.


D> If the ICADD implementors want to support HTML, great! And if WWW
D> implementors want to support ICADD, great! But there's no need to
D> change HTML to make this happen.

This seems to be slightly contradictory with the position that
Dan took at the top of his message, which was that 
D>                                                If HTML and ICADD
D> are isomorphic except for some minor details (e.g. LHEAD), let's fix
D> that. 

My sense is that it's a pretty simple case of a handful of new
elements and instructions to browser makers on aliases for the
identical ones. (Which would mean that <ANCHOR HREF="xxxx"> would
work in an ICADD document in an HTML browser.) 
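
A minimal Python sketch of what such aliasing could look like, using
only the two aliases named so far (ANCHOR to A, PARA to P). The table
name ALIASES and the function name alias_tags are invented for the
example, and a real browser would do the renaming inside its parser
rather than with a regular expression.

```python
import re

# ICADD element name -> existing HTML element name.
ALIASES = {"ANCHOR": "A", "PARA": "P"}

# Matches the "<NAME" or "</NAME" at the start of a tag.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")


def alias_tags(markup):
    """Rename aliased ICADD tags to HTML, leaving attributes untouched."""
    def rename(match):
        slash, name = match.group(1), match.group(2)
        return "<" + slash + ALIASES.get(name.upper(), name)
    return TAG.sub(rename, markup)


print(alias_tags('<PARA>See <ANCHOR HREF="xxxx">this</ANCHOR>.</PARA>'))
```

Elements with no alias pass through unchanged, which is why the
handful of genuinely new elements still need their own handling.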

D> By the way... is there a large body of legacy ICADD documents? I have
D> a hacked up browser that supports all sorts of data formats, and if
D> you give me the ICADD DTD, I could add ICADD support to my hacked up
D> browser in about an hour -- just for the purpose of seeing whether
D> ICADD documents make sense in a WWW browser.

I'll mail the DTD to you. It's short, simple, and even less 
hierarchical than HTML!

D> If there is no legacy of ICADD documents, I suggest the ICADD folks
D> just adopt HTML wholesale, and lobby for the few changes they need
D> (while supporting them in their own applications, of course). There's
D> a HUGE body of legacy HTML.

Unfortunately, there is a legacy both of documents and of software
that processes them. Also the fact that ICADD is part of the ISO 
standard has meant that it's part of other specifications now, mostly
for US State Education authorities who have been requiring textbook
publishers to supply ICADD-encoded files. 

In a sense, my proposal is exactly what you're suggesting: I'm
lobbying for the specific changes they need. If this group is really
opposed to the aliasing idea, I can go back to the ICADD committee
and ask them to change the names to match HTML, and we can go back to
ISO with a request for an amendment to 12083.


Thanks for the interest in this endeavour.


Yuri