Where are Fonts and Phrases allowed? [Was: HTML 2.0 Call for Review ]

"Daniel W. Connolly" <connolly@hal.com>
Message-id: <9406101945.AA08051@ulua.hal.com>
To: html-ig@oclc.org
Subject: Where are Fonts and Phrases allowed? [Was: HTML 2.0 Call for Review ]
In-Reply-To: Your message of "Mon, 06 Jun 1994 13:25:11 -0000."
             <9406061225.AA11520@dragget.hpl.hp.com> 
Date: Fri, 10 Jun 1994 14:45:59 -0500
From: "Daniel W. Connolly" <connolly@hal.com>
Content-Length: 3094
In message <9406061225.AA11520@dragget.hpl.hp.com>, Dave Raggett writes:
>
>    o   Whats wrong with B, I etc inside PRE?
>
>Your comment suggests that these are illegal inside PRE elements, why?

I meant to suggest that they should be illegal _outside_ of PRE.

In practice, fonts and phrases are used inside and outside of PRE, and
for 2.0, they will be legal. But we may want to put some language in
the spec that discourages some practices and warns that they may
become obsolete.

My feeling is that B and I should only be used when all you know is
that the text is bold or italic, and you don't know why, like if
you're converting nroff output. But if you're marking up a phrase in a
normal paragraph, you don't have the ability to specify fonts -- only
phrase level emphasis like EM, STRONG, CODE, etc.

Hence EM, STRONG, etc. would be illegal _inside_ PRE.


But there's a more critical question which is: what is the _specified_
behaviour of the EM, B, I, STRONG, etc. elements? That is, when an
author writes <em>something like this</em>, we don't want to require
that conforming implementations necessarily have to convert this to
italics. (Because, for example, this makes no sense for applications
like HTML readers for the blind).

But is it OK for an implementation to completely ignore the EM tag?
In that case, an author that wants to be absolutely certain that the
phrase gets emphasized can't count on the EM tag to do the trick.

My feeling is that this is a "level 1" feature. I'm working on making
the formal distinctions between levels in the DTD -- it's not quite
done yet. But the theory is that there are currently three levels of
HTML conformance:

	Level 0: TITLE, HEAD, ISINDEX, BODY, H1-H6, P, A, UL, OL only
		(maybe a couple others I forgot)

	Level 1: IMG, EM, STRONG, B, I,...

	Level 2: FORM, INPUT, OPTION, ...


[Somebody needs to write up this Level business and the implications
on format negotiation in the spec somewhere...]

So to a level 0 browser, <EM> is noise. If you validate against the
level 0 DTD, you'll get an error.
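
To make the level idea concrete, here is a rough sketch (Python, not
part of any spec) that reports the lowest level a document needs. The
tag sets contain only the elements named in the list above; since that
list is admittedly incomplete, anything else is reported as
unclassified rather than guessed at. The names LEVELS, TagCollector
and required_level are made up for this illustration.

```python
from html.parser import HTMLParser

# Only the elements named in the list above; the real sets are larger.
LEVELS = {
    0: {"title", "head", "isindex", "body",
        "h1", "h2", "h3", "h4", "h5", "h6", "p", "a", "ul", "ol"},
    1: {"img", "em", "strong", "b", "i"},
    2: {"form", "input", "option"},
}


class TagCollector(HTMLParser):
    """Record the name of every start tag seen (names arrive lowercased)."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)


def required_level(html):
    """Return (lowest sufficient level, set of unclassified tag names)."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    level = 0
    unclassified = set()
    for tag in collector.tags:
        for lvl, names in LEVELS.items():
            if tag in names:
                level = max(level, lvl)
                break
        else:
            # Not in any listed level: report it, do not guess.
            unclassified.add(tag)
    return level, unclassified
```

A document that uses <em> comes back as needing level 1, which is the
same thing as saying it would fail validation against a level 0 DTD.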

But what is required for levels >= 1? I'd say that

	* content of EM, STRONG, B, I, CITE etc. elements _must_ be
		distinguishable from un-emphasized text
	* STRONG must be different from EM
	* B must be different from I

and that's it. CITE might be the same as STRONG, or it might be
the same as EM, or it might be distinguished in some other way.
By distinguishable, I mean distinguishable by the information
consumer -- the browser user, or in the case of an HTML->???
translator, the ??? format files.
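
The three requirements above can be written down as a check on a
renderer's style table. This is only a sketch under my own
assumptions: a "style" is any comparable value describing how the
consumer perceives the text (a font, a voice pitch, whatever), an
element missing from the table is taken to be ignored (rendered as
plain text), and the function name emphasis_violations is
hypothetical.

```python
# The elements named in the first requirement ("etc." is left out).
EMPHASIS_ELEMENTS = ("em", "strong", "b", "i", "cite")


def emphasis_violations(styles, plain_style):
    """Return a list of human-readable violations (empty if none).

    styles maps element names to a description of how they are
    presented; elements not in the mapping count as plain_style.
    """
    def style_of(element):
        return styles.get(element, plain_style)

    problems = []
    # Each emphasized element must differ from un-emphasized text.
    for element in EMPHASIS_ELEMENTS:
        if style_of(element) == plain_style:
            problems.append(element + " is not distinguishable from plain text")
    # STRONG must be different from EM.
    if style_of("strong") == style_of("em"):
        problems.append("strong is not different from em")
    # B must be different from I.
    if style_of("b") == style_of("i"):
        problems.append("b is not different from i")
    return problems
```

Note that a table where CITE looks like EM, or where B looks like
STRONG, passes: nothing in the requirements above forbids that.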


Another subtlety: is nesting allowed? Must it be treated
significantly? For example, must this:

	x <em>y <em>z</em></em>		(1)

be distinguished from this...?

	x <em>y</em><em> z</em>		(2)


Must this:

	x <i>y <b>z</b></i>		(3)

be distinguished from this...?

	x <i>y </i><b>z</b>		(4)


I propose that nesting is syntactically allowed, but it has no
semantic meaning -- that is, each of the above syntaxes is allowed,
but an application is allowed to treat (1) the same as (2) and
to treat (3) the same as (4).
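
One way an application could take advantage of this, sketched in
Python: style each run of text by its innermost open element only and
merge adjacent runs that end up with the same style. This is just one
possible reading of "no semantic meaning", not a required behaviour,
and the names InnermostStyleRuns and flat_runs are invented for the
example.

```python
from html.parser import HTMLParser


class InnermostStyleRuns(HTMLParser):
    """Collect (text, style) runs; style is the innermost open element."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements, outermost first
        self.runs = []   # list of [text, style] pairs

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened matching element, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        style = self.stack[-1] if self.stack else None
        if self.runs and self.runs[-1][1] == style:
            # Same style as the previous run: merge the text.
            self.runs[-1][0] += data
        else:
            self.runs.append([data, style])


def flat_runs(html):
    """Return the list of (text, style) runs with nesting flattened."""
    parser = InnermostStyleRuns()
    parser.feed(html)
    parser.close()
    return [tuple(run) for run in parser.runs]
```

Under this model flat_runs gives the same result for (1) and (2), and
the same result for (3) and (4), so the nested and unnested forms
cannot be told apart by the consumer.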