Undeclared entities, weird numeric character references

Dan Connolly (connolly@w3.org)
Fri, 5 May 95 19:46:31 EDT

Next message: Dan Connolly: "HTML draft -02 not on ds.internic.net"
Previous message: David - Morris: "Re: ISO/IEC 10646 as Document Character Set"
Next in thread: David - Morris: "Re: Undeclared entities, weird numeric character references"
Maybe reply: David - Morris: "Re: Undeclared entities, weird numeric character references"

dwm@shell.portal.com writes:
>
>
> On Fri, 5 May 1995, Alex Hopmann wrote:
>
> > 2) HTML 2.0 uses 10646. We say that minimally compliant browsers must only
> > support the first 256 positions, or in other words Latin-1. A reference to
> > &#2789; gets rounded to 8 bits like Glenn found from experience. People
>
> Seems to me as a publisher and reader of published material, there is
> no conceptual difference between &#2789; and &xxx; where the rendering
> program doesn't understand what they mean.

Hmmm... as an SGML implementor, they're quite distinct to me. There
was some confusing terminology like "character octet entities" and
"numeric entity references" thrown around for a while. To be clear:
the terms are "numeric character reference" and "entity reference".
Numeric character references have nothing to do with entities.

The term "character entity" is strictly informal: it's just a text
entity that happens to be one character long.
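
To make the distinction concrete, here is a minimal Python sketch. It is
an illustration only, not anything from the HTML spec: the entity table
is a hypothetical stand-in for what a DTD declares, the function names
are made up, and it assumes the document character set is ISO 10646 so
that Python's chr() can stand in for "position N in the document
character set". A numeric character reference is resolved directly from
its number; an entity reference needs a declaration to be looked up, and
an undeclared one is left as written.

```python
import re

# Hypothetical stand-in for the entities a DTD declares (not a real DTD).
DECLARED_ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "eacute": "\u00e9"}

# Matches a numeric character reference (&#NNN;) or an entity reference (&name;).
REFERENCE = re.compile(
    r"&(?:#(?P<number>[0-9]+)|(?P<name>[A-Za-z][A-Za-z0-9.-]*));"
)


def classify(ref):
    """Say which of the two kinds of reference a single string is."""
    m = REFERENCE.fullmatch(ref)
    if m is None:
        raise ValueError("not a reference: %r" % ref)
    if m.group("number") is not None:
        return "numeric character reference"
    return "entity reference"


def expand(text):
    """Expand the references in text; undeclared entities stay as written."""

    def replace(m):
        if m.group("number") is not None:
            # No entity lookup here: the number names a position in the
            # document character set (assumed to be ISO 10646).
            code = int(m.group("number"))
            if code > 0x10FFFF:
                return m.group(0)  # out of range for chr(); leave as written
            return chr(code)
        # Entity reference: only resolvable if the entity was declared.
        return DECLARED_ENTITIES.get(m.group("name"), m.group(0))

    return REFERENCE.sub(replace, text)
```

For example, expand("&#233; &eacute; &xxx;") resolves the first reference
by number and the second by lookup, and leaves the undeclared &xxx;
untouched.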

But anyway...

> I would expect (and have
> seen for &nbsp; at least) browsers to just leave the unknown entity
> as written in the text.

Yup.

> Irrespective of the ultimate document character set, should the standard
> spell out handling of undefined entities?

The spec currently says this:

"Undeclared Markup Error Handling"
http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_3.html#SEC18

It doesn't say anything about numeric character references that aren't
in the document character set. I'd prefer to leave it that way.
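
For reference, the "rounded to 8 bits" behaviour mentioned in the quoted
text can be sketched in Python as below. This is one plausible reading,
not a description of any particular browser: it assumes "rounded" means
keeping only the low-order eight bits of the number, and it assumes the
reference in question was to position 2789. The function name is
hypothetical.

```python
def low_eight_bits(code_point):
    """Keep only the low-order 8 bits of a character number (assumed behaviour)."""
    return code_point & 0xFF


def render_truncated(code_point):
    """Render the truncated number as a Latin-1 character."""
    return bytes([low_eight_bits(code_point)]).decode("latin-1")


# 2789 is 0x0AE5, so the low-order byte is 0xE5 (decimal 229).
print(low_eight_bits(2789))    # 229
print(render_truncated(2789))  # the Latin-1 character at position 229
```

Under that reading, a reference outside the first 256 positions silently
turns into an unrelated Latin-1 character rather than being left as
written.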

Dan
