New Topic: HTML and the Visually Impaired [long]

yuri@sq.com (Yuri Rubinsky)
Date: Wed, 31 Aug 94 11:48:13 EDT
Message-id: <m0qfrAj-000ESAC@sq.com>
Reply-To: yuri@sq.com
From: yuri@sq.com (Yuri Rubinsky)
To: Multiple recipients of list <html-wg@oclc.org>
X-Comment: HTML Working Group (Private)


Many of you will know that I have a fairly extreme position on the question of
SGML and our collective responsibility to the print-disabled population, which, 
when you count all forms of vision disability, amounts to about 7% of
the population.

My position is that a rich markup offers so many advantages for the creation
of Braille, computer voice delivery and large-print publications that it is
crazy not to take such considerations into account when doing markup.

The technical committee of the International Committee for Accessible
Document Design (ICADD), which is chaired by George Kerscher of Recording
for the Blind and of which I'm a member, has come up with a technique of
using #FIXED attributes in any DTD in order to map arbitrary elements to
a fixed ICADD set. The ICADD set represents the specific document
structures available to a Braille formatter, and it also works for the
other two formats (computer voice and large print).
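
As a rough illustration of the #FIXED technique, the Python sketch below
reads attribute-list declarations from a DTD fragment and collects the
element-to-ICADD mapping they carry. The attribute name SDAFORM, the
source element names and the three sample declarations are assumptions
made for this example, not text quoted from the ICADD annex. A real SGML
parser would pick these values up as part of normal DTD processing; the
regular expression here only stands in for that step.

```python
import re

# Hypothetical DTD fragment: each element carries a #FIXED attribute whose
# value names the ICADD element it maps to. The attribute name SDAFORM and
# the element names are assumptions made for this sketch.
DTD_FRAGMENT = """
<!ATTLIST chapter-title  SDAFORM CDATA #FIXED "H1">
<!ATTLIST sidebar        SDAFORM CDATA #FIXED "BOX">
<!ATTLIST body-text      SDAFORM CDATA #FIXED "PARA">
"""

# One declaration: element name, then the fixed ICADD name in quotes.
ATTLIST_RE = re.compile(
    r'<!ATTLIST\s+(?P<element>[\w.-]+)\s+'
    r'SDAFORM\s+CDATA\s+#FIXED\s+"(?P<icadd>[^"]+)"\s*>'
)


def icadd_mapping(dtd_text):
    """Return {source element name: ICADD element name} from #FIXED declarations."""
    return {m.group("element"): m.group("icadd")
            for m in ATTLIST_RE.finditer(dtd_text)}


mapping = icadd_mapping(DTD_FRAGMENT)
```

Because the values are #FIXED, a document instance cannot override them,
so a Braille formatter can rely on the mapping without looking at the
source DTD's own element names.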

Although work on the ICADD tagset predates widespread use of HTML,
the two tagsets overlap significantly. Many elements have the same names;
others, with different names, are nonetheless nearly identical in their
intended functionality; only a handful have specific ICADD capability
and don't exist in HTML.

The Base ICADD Tag Set

This is the full base set, known as the ICADD22. (<BOX> for "sidebar"
reveals a slight bias in this tagset towards textbook publishing, which
is the largest area of applicability for Braille.)

ANCHOR    Mark Spot on a Page
AU        Author(s)
B         Bold Emphasized Text
BOOK      Highest Level Element for Document
BOX       Boxed or Sidebar Information
BQ        Block Quotation
FIG       Figure Title and Description
FN        Footnote
H1        Major Level Heading within Book
H2        Second Level Heading
H3        Third Level Heading or BOX Heading
H4        Fourth Level Heading
H5        Fifth Level Heading
H6        Sixth Level Heading
IPP       Page Number of Ink Print Page
IT        Italic Emphasized Text
LANG      Language Indicator
LHEAD     List Heading
LIST      List of Items
LIT       Literal or Computer Text
LITEM     List Item
NOTE      Note in Text
OTHER     Other Emphasized Text
PARA      Paragraph
PP        Print Page Reference
TERM      Term or Keyword
TI        Title of the Book
XREF      Cross Reference

An optional set of canonical elements has been created to support the
creation of tables which may be used for Braille, large type and computer
voice. They are:

TABLE     The highest level element, which will include at least one   
          TGROUP
TGROUP    Allows repeated combinations of the next three elements to
          appear within one table
THEAD     Table Header
TBODY     Table Body
TFOOT     Table Footer
COLDEF    Column Definition (which carries necessary attributes for the
          column information)
HDROW     Row in a Header
HDCELL    Cell in a Header
ROW       Row in the Table Body
STUBCELL  The Non-Data Carrying Stub Cell of a Row
SSTCELL   A Sub-Stub Cell in a Row (usually with different indent)
CELL      Table Cell
SHORTXT   Short Text Element provides alternative text for a stub cell
          or head cell for voice representation or for a cell reference
          to longer text carried in the NOTE in a Braille table
NOTE      Text extracted from Braille table cells in order to allow the
          narrowest possible column widths in the table body

++++++++++++++++++++++

The Opportunity

The ICADD tagset is described and "formalized" in an annex to ISO 12083.
The annex is informative (that is, non-normative). The tagset is used in
projects throughout the world, and has been implemented as an input
stream to major Braille creation software and to software for creating
electronic books for the blind. At least one formatting engine
implements it for the creation of large-print books.

This is my idea, in two parts:

1) If we extend HTML ever-so-slightly with a tiny handful of new
elements (AU, BOX, IPP, LHEAD, etc.), and encourage browser
builders to alias certain ICADD elements to existing HTML elements
(ANCHOR to A, PARA to P, etc.), then overnight we make every
Web browser into an ICADD browser. Blind people with software
that reads their screen will be able to use any WWW browser that
is compatible with their screen reader. (There are specific ways
to implement cursors, for example, that let the screen reader
"find them".) NCSA has already agreed to work toward making Mosaic
"accessible" in this way.

(I'll talk about tables separately, in a later posting, if people agree
that this is all worthwhile activity. In one sentence: I've convinced
the ICADD people to change their table element names to the HTML
names where the two match -- ROW to TR, STUBCELL to TH, CELL
to TD. When we work on HTML 3.0, I'll propose the ICADD table
model with two or three levels of implementation, level 0 being
roughly the HTMLplus tables of old, with optional COLDEF elements
to hold more formatting when needed, and level 2 being full Braille-
and voice-enabled tables.)

2) When HTML 2.0 is finalized, I'll add the ICADD attributes to it
in a version that we distribute to content providers who work with
the blind (Braille translation houses, electronic book creators, etc).
This will establish the mappings from HTML elements to ICADD
elements and will mean that all HTML text is *instantly* and
automatically translatable into both print and on-line Braille.


So, my question is: Should I propose the extra ICADD elements
now, as "proposed" for future versions so people can be thinking
about them, starting to implement, and so forth? Or should I
wait and make all these proposals as part of the HTML 3.0
process?


Here's a table of comparisons and proposed actions:

HTML NAME        ICADD NAME         EXPLANATION/COMMENT
_____________________________________________________________________

A                ANCHOR             Ask WWW browsers to accept alias
BLOCKQUOTE [obs] BQ                 HTML to add back if possible
BYLINE?          AU                 HTML to change/add if possible
DT               TERM               Ask WWW browsers to accept alias
FIG              FIG                Must allow PCDATA for
                                     alternate description
FOOTNOTE         FN                 Ask WWW browsers to accept alias
HTMLPLUS         BOOK               May be valuable to keep this distinction
                                     so browsers will know it's the ICADD DTD
                                     even if they don't read the DOCTYPE
I                IT                 Ask WWW browsers to accept alias
LI               LITEM              Ask WWW browsers to accept alias
OL/UL            LIST               OL/UL distinction not meaningful in ICADD
                                     since the generated content must be
                                     there. Would be good to add LIST to 
                                     HTML if possible or ask developers to
                                     accept alias.
P                PARA               Ask WWW browsers to accept alias
STRONG           OTHER              Ask WWW browsers to accept alias
TITLE            TI                 Ask WWW browsers to accept alias
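
For browser builders, the "accept alias" rows in the table above amount
to a small lookup table. The Python sketch below shows one way a
pre-processing step might rewrite ICADD element names to their HTML
equivalents before normal parsing. The function name alias_tags and the
sample markup are assumptions made for illustration, and a regular
expression is only a stand-in for what a real tokenizer would do (it
does not handle comments or "<" inside attribute values). LIST is left
out because the table does not say whether it should alias to OL or UL.

```python
import re

# ICADD name -> HTML name, taken from the "accept alias" rows of the table.
ICADD_TO_HTML = {
    "ANCHOR": "A",
    "TERM": "DT",
    "FN": "FOOTNOTE",
    "IT": "I",
    "LITEM": "LI",
    "PARA": "P",
    "OTHER": "STRONG",
    "TI": "TITLE",
}

# Matches the element name in a start tag (<PARA ...>) or end tag (</PARA>).
TAG_NAME_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")


def alias_tags(text):
    """Rewrite ICADD element names to their HTML aliases; leave other tags alone."""
    def swap(match):
        slash, name = match.group(1), match.group(2)
        return "<" + slash + ICADD_TO_HTML.get(name.upper(), name)
    return TAG_NAME_RE.sub(swap, text)


sample = '<PARA>See the <ANCHOR HREF="notes.html">notes</ANCHOR>.</PARA>'
converted = alias_tags(sample)
```

Element names that are already shared between the two tagsets (H1 to
H6, B, FIG) pass through unchanged, since they are not in the lookup
table.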

Software could ignore the following elements or do special processing:
                 IPP                Would be best if browser makers turned
                                    this into generated text such as
                                    "Print Page: ". These are used both
                                    to alert a blind reader to the matching
                                    page in the printed book and also as
                                    targets of <PP>. Content could be
                                    turned from <IPP>154</IPP> into
                                    <IPP NAME="154"> or equivalent.
                 PP                 Reference to <IPP>. Could be treated
                                    as <A HREF="154"> or equivalent.
                 XREF               Only meaningful on paper (pointer to
                                    ANCHOR -- matching ID/IDREF)
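
The IPP and PP handling described above could be sketched in Python as
follows. Several things here are assumptions made for illustration: the
function name render_page_markers, the idea that <PP> contains the bare
page number, the use of <A NAME=...> as the "or equivalent" target, and
the "#" fragment form in the generated link.

```python
import re

IPP_RE = re.compile(r"<IPP>\s*(\w+)\s*</IPP>")
PP_RE = re.compile(r"<PP>\s*(\w+)\s*</PP>")


def render_page_markers(text):
    """Turn ink-print page markers and references into visible, linkable text."""
    # <IPP>154</IPP> becomes a named anchor with generated text.
    text = IPP_RE.sub(r'<A NAME="\1">Print Page: \1</A>', text)
    # <PP>154</PP> becomes a link to that anchor. The "#" fragment form is
    # an assumption about what "or equivalent" would mean in practice.
    text = PP_RE.sub(r'<A HREF="#\1">\1</A>', text)
    return text


sample = "<IPP>154</IPP> Text on that page. See page <PP>154</PP>."
rendered = render_page_markers(sample)
```

A browser that prefers to ignore these elements could instead drop the
tags and keep only their content.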

Concepts not in HTML which would be added for ICADD support:
                 BOX                Could simply be <HR> at <BOX>
                                    and another <HR> at </BOX>; browser
                                    developers could draw vertical lines.
                 LHEAD              Optional list headings are useful.
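
The simplest BOX treatment mentioned above, a rule at each end of the
box, is a one-line substitution. A Python sketch, with the function name
box_to_rules being an assumption for illustration:

```python
import re

# Matches <BOX> and </BOX> in any letter case.
BOX_TAG_RE = re.compile(r"</?BOX\s*>", re.IGNORECASE)


def box_to_rules(text):
    """Replace <BOX> and </BOX> with <HR>, keeping the boxed content."""
    return BOX_TAG_RE.sub("<HR>", text)


boxed = box_to_rules("<BOX>Did you know?</BOX>")
```

Drawing vertical lines as well would need support in the browser's
layout code and is outside what a text substitution can do.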

I would further propose that ICADD lose the LANG element and replace it
with the CHARSET attribute as is done with the HTMLplus proposal.


For tables, the proposal is that ICADD adopt TR and TD from HTML, and
potentially, say, TH and THSUB, and that the rest of the model be adopted
by the HTML community from ICADD. I'll post the DTD if there's interest.
To a very great extent it's forward compatible from the HTMLplus tables.

Anyway, sorry for the length of this posting, but I wanted to present
this in its entirety. If there is support for this proposal, then I
will create a version of the HTML 2.0 DTD with the 4 or 6 *new* ICADD
elements included as proposed (AU, BQ, BOX, LHEAD, possibly
IPP and PP). I think that the aliased elements
should simply be handled in a note to implementors.

If we are able to do this, I think I can assure you that at the
next "Closing the Gap" conference, there will be a move for the 
disabled community to solidly endorse HTML. There's no harm in
that kind of support!


Best,

Yuri Rubinsky
for the ICADD technical Committee