Presentation info in HTML? [was: color text?] [LONG. Sorry]

Dan Connolly (connolly@w3.org)
Wed, 3 May 95 21:49:43 EDT

Gavin Nicol writes:
>
> SGML came about from a particular mindset, which was the result of
> extensive work in document processing. While what you say about the
> SGML language is true, you skip the more important part, which is the
> *mindset* involved.
>
> I think many, many people dislike SGML the language. The group of
> people who dislike the mindset is much smaller. SGML, without the
> mindset, and its tenets, is meaningless.

Folks: please keep your arguments technical: formal or empirical.

Gavin: if you really want to convince folks of that argument, the
burden is on you to provide some data. Dig up some studies that show
things like "In 1985, we started a project to separate content from
presentation. Today, we can get our job done 3x faster than our
competition." On the one hand, I'm sure they exist. On the other,
after watching this battle waged endlessly in comp.text.sgml (see
"HTML vs. PDF"), I am certain that this discussion is a waste
of time without some new, hard evidence on one side or the other.

I have studied this issue at some length, and it appears to me that
keeping presentation separate from content is cost-effective for
long-lived documents: technical documentation, law, etc. Most SGML
applications to date fall into this camp (witness: DocBook, Edgar
(almost), CALS, AAP).

But for advertising, journalism, creative art, and many other equally
noble pursuits, there is no motivation to separate presentation from
the content. Often, the information is short lived: "My ad doesn't
work on a braille reader? Who cares! It only runs for two months!"
In other cases, the expression of the ideas is as much a part of the
art as the content. In fact, the distinction is quite blurry.

HTML is an SGML application that tries to walk the line. It doesn't
make any of the purists happy, but look how many folks it does
satisfy.

I prefer to think that HTML is successful not because it completely
separates content from presentation, nor because it is just a
sufficiently powerful platform-independent page description language
(or "layout language"), but because it captures the communications
idioms of a large community: everybody communicates using paragraphs,
lists, and headings. Hyperlinking is a powerful idiom that's evidently
a pretty natural match for the way folks communicate. Database forms
applications are as old as the hills, and they open up a limited but
powerful form of two-way communication. Apparently, tables are an
idiom that is critical to the kind of communication that folks want to
do on the net. When I see proposals for new tags, I test them to see
if they fit in this category.

As for simplicity and "ease of typing": HTML is both fairly easy to
parse and fairly easy to learn because its syntax is mostly a
context-free grammar made of regular tokens. This is a structure that
Chomsky's research shows is natural for humans to grasp: they're born
with it. It's also an old-hat problem in computer science. Most CS
degrees require a compilers class that teaches it.

If you look at the tricky spots in HTML, the places where folks have
trouble grokking and implementors goof up, you'll find that they're
mostly exceptions to the regular/context free rule.
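
To make the "regular tokens, context-free grammar" point concrete,
here is a minimal sketch in Python. The TOKEN pattern and the
check_nesting name are mine, invented for illustration; this is not
how any real browser parses HTML. The tokens are matched by one
regular expression, and the nesting is checked with a stack, which is
the context-free part.

```python
import re

# Regular tokens: an end tag, a start tag, or a run of text.
# The pattern is a simplification invented for this sketch.
TOKEN = re.compile(
    r"</(?P<end>[A-Za-z][A-Za-z0-9]*)\s*>"
    r"|<(?P<start>[A-Za-z][A-Za-z0-9]*)[^>]*>"
    r"|(?P<text>[^<]+)"
)


def check_nesting(markup):
    """Return True if every start tag is closed in strict nesting order."""
    stack = []
    for match in TOKEN.finditer(markup):
        if match.group("start"):
            stack.append(match.group("start").lower())
        elif match.group("end"):
            name = match.group("end").lower()
            # An end tag must match the most recently opened element.
            if not stack or stack.pop() != name:
                return False
    return not stack


print(check_nesting("<ul><li>one</li></ul>"))
print(check_nesting("<ul><li>one</ul>"))
```

Note that the second call fails this strict check even though HTML
lets you leave out the </li>. Rules like that are where a naive
grammar stops matching the language.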

That's why I'm so big on s-expression syntax, especially for
stylesheets. s-expressions are EASY to implement. (Go grab the code
that parses WAIS SRC files, for example.) People complain because
s-expressions often break the "3 to 7" rule: the human short-term
memory -- its "stack" -- is only 3 to 7 items deep: any deeper in the
parentheses than that, and folks need tools to help them out.
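
As a rough illustration of how little code this takes, here is a
sketch of an s-expression reader in Python. It is not the WAIS code,
and the stylesheet rule at the bottom is a made-up notation, not any
proposed syntax. It handles only parentheses and bare atoms (no quoted
strings), and it also reports nesting depth, since that is what the
"3 to 7" complaint is about.

```python
def tokenize(text):
    """Split s-expression text into '(', ')' and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse tokens into nested lists; atoms stay as plain strings."""
    stack = [[]]  # stack[0] collects the top-level expressions
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise SyntaxError("unexpected ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise SyntaxError("missing ')'")
    return stack[0]


def depth(expr):
    """Nesting depth: an atom is 0, a list is 1 + its deepest child."""
    if not isinstance(expr, list):
        return 0
    return 1 + max((depth(child) for child in expr), default=0)


# Hypothetical stylesheet rule, invented for illustration only.
rule = "(style (class techreport) (font (family helvetica)))"
tree = parse(tokenize(rule))[0]
print(tree)
print(depth(tree))
```

A visual tool could use something like depth() to warn an author when
a rule nests deeper than a person can comfortably track.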

And tools are critical here. For all the folks worried about how easy
it is to type in stylesheets, and how we can make it hard to make
mistakes, consider this: most folks that use PCs and Macs today NEVER
OPEN UP A TEXT EDITOR AT ALL! Telnetting to port 80 is a handy
debugging tool, but that's ALL it is: a handy debugging tool.

If you want to make something easy to use, you MUST provide a visual
interface. Hack up a Tk/Tcl stylesheet editor, or a Visual Basic
stylesheet editor. I'd prefer to judge the complexity of a stylesheet
language on the basis of how difficult that task is, rather than, for
example, asking 100 secretaries to read the spec and try to use it.

Remember: the original web client was a WYSIWYG hypertext editor under
NeXTStep. The web client that put this thing on the map had a friendly
Motif UI. I predict the stylesheet mechanism that gets deployed will
be the first one integrated visually into a freeware authoring system
(even if it's more along the lines of QuarkXPress than MS Word,
i.e. you do most of your editing with another tool, then import it
into this tool just to work on the stylesheet.)

> I recommend that you go away into a room somewhere, and think about
> what a document really is.

The SGML community has a reputation for being... well... elitist. The
internet community has a reputation for being unfriendly to vendors.

These reputations cost everybody. They serve no one. Let us not
perpetuate them.

A kludge should be argued down, and unscrupulous vendors should be
shunned. But not all presentational markup is a hack, and not all
vendors are unscrupulous. If you have evidence, present it. Otherwise,
take this sort of noise elsewhere.

As far as I can see, allowing

<body font.family="helvetica">

doesn't _prevent_ folks from separating content from presentation,
as long as we agree that browsers will support

<body class="techreport">

with stylesheets keyed on class/id as well.
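
Here is one way the two mechanisms could coexist, sketched in Python.
Everything in it is an assumption for illustration: the STYLESHEET
contents, the resolve_style name, the convention that a dotted
attribute name such as font.family marks a style attribute, and the
rule that an in-line attribute overrides the stylesheet. No precedence
has been agreed on anywhere.

```python
# Hypothetical stylesheet keyed on class; names and values are invented.
STYLESHEET = {
    "techreport": {"font.family": "times", "margin.left": "2em"},
}


def resolve_style(attrs, stylesheet):
    """Combine class-keyed stylesheet properties with in-line attributes.

    Assumptions of this sketch: dotted attribute names are style
    attributes, and in-line values win over stylesheet values.
    """
    # Start from a copy of whatever the stylesheet says for this class.
    style = dict(stylesheet.get(attrs.get("class", ""), {}))
    for name, value in attrs.items():
        if "." in name:
            style[name] = value
    return style


# <body class="techreport">
print(resolve_style({"class": "techreport"}, STYLESHEET))
# <body font.family="helvetica">
print(resolve_style({"font.family": "helvetica"}, STYLESHEET))
```

An author who sticks to class gets everything from the stylesheet; an
author who writes font.family in-line still gets a result, without
breaking the first case.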

The one thing that style attributes do is break down modularity
between HTML and stylesheet languages, which is something I'd like to
avoid. I was hoping to see various style sheet notations compete
independent of HTML evolution.

If we put style info in HTML, it will almost certainly favor one style
notation over another, if not completely wipe out all others. We will
also be opening up all the cans of worms like fonts and spacing units,
not to mention issues from other media besides print and bitmap
displays, like audio, video, ... . It will make the job of editing
the HTML specification much more difficult.

For that reason alone, I'll be lobbying against style info in HTML. So
to get style info in the standard, somebody should be prepared not
only to ante up working code, but to provide editorial resources as
well!

It also makes the notion of conformance, which is cloudy enough at
present, almost unmanageable:

I think we can all agree that this style info -- at least the sort of
stuff folks want to put in-line in HTML -- is inessential. Browsers
are allowed to ignore it. But this business of having different
browsers do different things is tricky. It's good, but it can be
tough on "consumer confidence."

We have to clearly delineate the essential stuff from the inessential
stuff so that, for example, an authoring system could have a "minimal
mode": it would show you what your document _may_ look like on a
minimally conforming client, so that you don't rely on the optional
presentation stuff to get the heart of your message across.
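
A "minimal mode" preview could start out as simple as the following
Python sketch, which re-emits a document with the in-line style
attributes dropped. The minimal_mode name and the rule that a dotted
attribute name means "inessential style" are assumptions of mine; a
real tool would take that list from the spec. Comments and
declarations are dropped too, purely to keep the sketch short.

```python
from html import escape
from html.parser import HTMLParser


class _Stripper(HTMLParser):
    """Re-emit markup, leaving out dotted (assumed style) attributes."""

    def __init__(self):
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if "." in name:
                continue  # assumed style attribute: inessential, drop it
            if value is None:
                parts.append(name)
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def minimal_mode(markup):
    """Return markup as a minimally conforming client might treat it."""
    parser = _Stripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


doc = '<body class="techreport" font.family="helvetica"><p>Hi &amp; bye</p></body>'
print(minimal_mode(doc))
```

The class attribute survives because it carries no presentation by
itself; only the font.family attribute is stripped.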

Whew! Apologies for the length of this message! I hope it's useful.

Daniel W. Connolly "We believe in the interconnectedness of all things"
Research Technical Staff, MIT/W3C
<connolly@w3.org> http://www.w3.org/hypertext/WWW/People/Connolly