New Topic: HTML and the Visually Impaired [long]

Yuri Rubinsky (yuri@sq.com)
Wed, 31 Aug 94 11:48:13 EDT

Many of you will know that I have a fairly extreme position on the question of
SGML and our collective responsibility to the print-disabled population, which,
when you count all forms of vision disability, amounts to about 7% of
the population.

My position is that a rich markup offers so many advantages for the creation
of Braille, computer voice delivery and large-print publications that it is
crazy not to take such considerations into account when doing markup.

The technical committee of the International Committee for Accessible
Document Design, chaired by George Kerscher of Recording for the Blind
and of which I'm a member, has come up with a technique of using #FIXED
attributes in any DTD in order to map arbitrary elements to a fixed
ICADD set. The ICADD set represents the specific document structures
available to a Braille formatter and also works for the two other formats.
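
To make the mapping idea concrete, here is a minimal sketch in Python. It is not the ICADD mechanism itself, which lives in the DTD as #FIXED attributes. A #FIXED attribute has the same value on every instance of an element type, so a lookup table keyed by element name is a reasonable stand-in. The source element names (textbook, chaptitle, p, sidebar, emph) and the name FIXED_ICADD_MAP are hypothetical; only the target names come from the base tag set listed below. An XML parser is used here in place of an SGML parser, purely for convenience.

```python
import xml.etree.ElementTree as ET

# Hypothetical publisher DTD element names -> ICADD base elements.
# In the real technique this association is carried by #FIXED
# attributes in the DTD; a dict stands in for that here.
FIXED_ICADD_MAP = {
    "textbook": "BOOK",
    "chaptitle": "H1",
    "p": "PARA",
    "sidebar": "BOX",
    "emph": "IT",
}

def to_icadd(root):
    """Return a copy of the tree with every mapped element renamed."""
    copy = ET.fromstring(ET.tostring(root))
    for elem in copy.iter():
        # Unmapped element names pass through unchanged.
        elem.tag = FIXED_ICADD_MAP.get(elem.tag, elem.tag)
    return copy

source = ET.fromstring(
    "<textbook><chaptitle>Cells</chaptitle>"
    "<p>A cell is <emph>small</emph>.</p>"
    "<sidebar><p>Did you know?</p></sidebar></textbook>"
)
icadd_xml = ET.tostring(to_icadd(source), encoding="unicode")
print(icadd_xml)
```

The point of the design is that the document instance never changes: the same source can feed an ordinary formatter under its own names and a Braille formatter under the ICADD names.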

Although work on the ICADD tagset predates widespread use of HTML,
the two tagsets overlap significantly. Many elements have the same names;
others, with different names, are nonetheless nearly identical in their
intended functionality; only a handful have specific ICADD capability
and don't exist in HTML.

The Base ICADD Tag Set

This is the full base set, known as the ICADD22. (<BOX> for "sidebar"
reveals a slight bias in this tagset towards textbook publishing, which
is the largest area of application for Braille.)

ANCHOR    Mark Spot on a Page
AU        Author(s)
B         Bold Emphasized Text
BOOK      Highest Level Element for Document
BOX       Boxed or Sidebar Information
BQ        Block Quotation
FIG       Figure Title and Description
FN        Footnote
H1        Major Level Heading within Book
H2        Second Level Heading
H3        Third Level Heading or BOX Heading
H4        Fourth Level Heading
H5        Fifth Level Heading
H6        Sixth Level Heading
IPP       Page Number of Ink Print Page
IT        Italic Emphasized Text
LANG      Language Indicator
LHEAD     List Heading
LIST      List of Items
LIT       Literal or Computer Text
LITEM     List Item
NOTE      Note in Text
OTHER     Other Emphasized Text
PARA      Paragraph
PP        Print Page Reference
TERM      Term or Keyword
TI        Title of the Book
XREF      Cross Reference

An optional set of canonical elements has been created to support the
creation of tables which may be used for Braille, large type and computer
voice. They are:

TABLE       The highest level element, which will include at least one
            TGROUP
TGROUP      Allows repeated combinations of the next three elements to
            appear within one table
THEAD       Table Header
TBODY       Table Body
TFOOT       Table Footer
COLDEF      Column Definition (which carries necessary attributes for the
            column information)
HDROW       Row in a Header
HDCELL      Cell in a Header
ROW         Row in the Table Body
STUBCELL    The Non-Data Carrying Stub Cell of a Row
SSTCELL     A Sub-Stub Cell in a Row (usually with different indent)
CELL        Table Cell
SHORTXT     Short Text Element provides alternative text for a stub cell
            or head cell for voice representation or for a cell reference
            to longer text carried in the NOTE in a Braille table
NOTE        Text extracted from Braille table cells in order to allow the
            narrowest possible column widths in the table body

++++++++++++++++++++++

The Opportunity

The ICADD tagset is described and "formalized" in an annex to ISO 12083.
The annex is informative (that is, non-normative) text. The tagset is
used in projects throughout the world, and has been implemented as an
input stream to major Braille creation software and to software for
creating electronic books for the blind. At least one formatting engine
implements it for the creation of large-print books.

This is my idea, in two parts:

1) If we extend HTML ever-so-slightly with a tiny handful of new
elements (AU, BOX, IPP, LHEAD, etc.), and encourage browser
builders to alias certain ICADD elements to existing HTML elements
(ANCHOR to A, PARA to P, etc.), then we overnight make every
Web browser into an ICADD browser. Blind people with software
that reads their screen will be able to use any WWW browser that
is compatible with their screen reader. (There are specific ways
to implement cursors, for example, that let the screen reader
"find them".) NCSA has already agreed to work toward making Mosaic
"accessible" in this way.

(I'll talk about tables separately, in a later posting, if people agree
that this is all a worthwhile activity. In one sentence: I've convinced
the ICADD people to change the table element names to the HTML
names where they match -- ROW to TR, STUBCELL to TH, CELL
to TD. When we work on HTML 3.0, I'll propose the ICADD table
model with two or three levels of implementation, level 0 being
roughly the HTMLplus tables of old with optional COLDEF elements
to hold more formatting when needed, level 2 being full Braille- and
voice-enabled tables.)

2) When HTML 2.0 is finalized, I'll add the ICADD attributes to it
in a version that we distribute to content providers who work with
the blind (Braille translation houses, electronic book creators, etc).
This will establish the mappings from HTML elements to ICADD
elements and will mean that all HTML text is *instantly* and
automatically translatable into both print and on-line Braille.

So, my question is: Should I propose the extra ICADD elements
now, as "proposed" for future versions so people can be thinking
about them, starting to implement, and so forth? Or should I
wait and make all these proposals as part of the HTML 3.0
process?

Here's a table of comparisons and proposed actions:

HTML NAME           ICADD NAME   EXPLANATION/COMMENT
_____________________________________________________________________

A                   ANCHOR       Ask WWW browsers to accept alias
BLOCKQUOTE [obs]    BQ           HTML to add back if possible
BYLINE?             AU           HTML to change/add if possible
DT                  TERM         Ask WWW browsers to accept alias
FIG                 FIG          Must allow PCDATA for alternate
                                 description
FOOTNOTE            FN           Ask WWW browsers to accept alias
HTMLPLUS            BOOK         May be valuable to keep this
                                 distinction so browsers will know
                                 it's the ICADD DTD even if they
                                 don't read the DOCTYPE
I                   IT           Ask WWW browsers to accept alias
LI                  LITEM        Ask WWW browsers to accept alias
OL/UL               LIST         OL/UL distinction not meaningful in
                                 ICADD since the generated content
                                 must be there. Would be good to add
                                 LIST to HTML if possible or ask
                                 developers to accept alias.
P                   PARA         Ask WWW browsers to accept alias
STRONG              OTHER        Ask WWW browsers to accept alias
TITLE               TI           Ask WWW browsers to accept alias
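
As an illustration of what "accept alias" could mean inside a browser, here is a small Python sketch that canonicalizes a tag name before it reaches the existing HTML handling code. The table holds only the rows above marked as aliases. The names ICADD_TO_HTML and canonical_tag are my own, not part of any browser. LIST is deliberately left out, since it has no single HTML counterpart (OL or UL).

```python
# ICADD name -> HTML name, taken from the "accept alias" rows above.
ICADD_TO_HTML = {
    "ANCHOR": "A",
    "TERM": "DT",
    "FN": "FOOTNOTE",
    "IT": "I",
    "LITEM": "LI",
    "PARA": "P",
    "OTHER": "STRONG",
    "TI": "TITLE",
}

def canonical_tag(name):
    """Map an ICADD alias to the HTML element a browser already handles."""
    upper = name.upper()  # HTML element names are case-insensitive
    # Names with no alias entry (H1, LIST, ...) are returned unchanged.
    return ICADD_TO_HTML.get(upper, upper)

print(canonical_tag("para"), canonical_tag("LITEM"), canonical_tag("H1"))
```

A lookup at the tag-name level keeps the change small: nothing downstream of the parser needs to know that ICADD names exist.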

Software could ignore the following elements or do special processing:
IPP     Would be best if browser makers turned this into generated
        text such as "Print Page: ". These are used both to alert
        blind person to matching page in printed book and also as
        targets of <PP>. Content could be turned from
        <IPP>154</IPP> into <IPP NAME="154"> or equivalent.
PP      Reference to <IPP>. Could be treated as <A HREF="154"> or
        equivalent.
XREF    Only meaningful on paper (pointer to ANCHOR -- matching
        ID/IDREF)
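
A minimal Python sketch of the IPP and PP handling suggested above follows. It rests on several assumptions of mine: that the content of PP is the page number it refers to, that the generated "Print Page: " text is stored in the element rather than produced at display time, and that an XML parser is an acceptable stand-in for an SGML one. The function name rewrite_page_markers is hypothetical.

```python
import xml.etree.ElementTree as ET

def rewrite_page_markers(root):
    """Rewrite IPP/PP elements in place, following the suggestions above."""
    for elem in root.iter():
        page = (elem.text or "").strip()
        if elem.tag == "IPP":
            # <IPP>154</IPP> -> <IPP NAME="154">Print Page: 154</IPP>
            elem.set("NAME", page)
            elem.text = "Print Page: " + page
        elif elem.tag == "PP":
            # <PP>154</PP> -> <A HREF="154">154</A>
            # (assumes the PP content is the target page number)
            elem.tag = "A"
            elem.set("HREF", page)
    return root

doc = ET.fromstring("<PARA>See page <PP>154</PP>.<IPP>154</IPP></PARA>")
page_xml = ET.tostring(rewrite_page_markers(doc), encoding="unicode")
print(page_xml)
```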

Concepts not in HTML which would be added for ICADD support:
BOX     Could simply be <HR> at <BOX> and another <HR> at </BOX>;
        browser developers could draw vertical lines.
LHEAD   Optional list headings are useful.
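
The simplest BOX treatment above (a rule at the start and another at the end) can be sketched in a few lines of Python. This is only an illustration: the function name box_to_rules is hypothetical, and the pattern assumes BOX tags carry no attributes.

```python
import re

def box_to_rules(markup):
    """Replace <BOX> start and end tags with <HR>, ignoring case."""
    # Matches <BOX> and </BOX> only; tags with attributes are left alone.
    return re.sub(r"</?BOX\s*>", "<HR>", markup, flags=re.IGNORECASE)

boxed = box_to_rules("<P>Main text</P><BOX><P>Sidebar</P></BOX>")
print(boxed)
```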

I would further propose that ICADD lose the LANG element and replace it
with the CHARSET attribute as is done with the HTMLplus proposal.

For tables, the proposal is that ICADD adopt TR and TD from HTML, and
potentially, say, TH and THSUB, and that the rest of the model be adopted
by the HTML community from ICADD. I'll post the DTD if there's interest. To
a very great extent it's forward compatible from the HTMLplus tables.

Anyway, sorry for the length of this posting, but I wanted to present
this in its entirety. If there is support for this proposal, then I
will create a version of the HTML 2.0 DTD that includes the 4 or 6
*new* ICADD elements as proposed (AU, BQ, BOX, LHEAD, and possibly
IPP and PP). I think that the aliased elements
should simply be handled in a note to implementors.

If we are able to do this, I think I can assure you that at the
next "Closing the Gap" conference, there will be a move for the
disabled community to solidly endorse HTML. There's no harm in
that kind of support!

Best,

Yuri Rubinsky
for the ICADD Technical Committee