Re: Revised language on: ISO/IEC 10646 as Document Character Set

Terry Allen (terry@ora.com)
Sat, 6 May 95 09:48:03 EDT

>I actually make ISO10646 a binding constraint without putting it
>in the public text (the SGML declaration). See what you think:

That's bogus. We need the SGML declaration that goes with your language.
According to Glenn, it would be as follows, but this sdecl requires
Glenn's patch to sgmls (which I haven't tried) to parse without error.
So I don't (yet) find it acceptable. Please, if anyone has a better
SGML declaration for HTML that "makes 10646 the document character
set", send it to me. And if Glenn's patch is correct, what is the
argument from ISO 8879 that sgmls and SP are interpreting the sdecl
incorrectly, and that the patch makes sgmls do it right?

I'm commenting here on the need for the SGML decl, not the language
as it stands at the moment.

<!SGML "ISO 8879:1986"

--
	SGML Declaration for HyperText Markup Language (HTML)
	as used by the World-Wide Web (WWW) application.

--

CHARSET
         BASESET  "ISO 646:1983//CHARSET International Reference Version (IRV)//ESC 2/5 4/0"
         DESCSET    0     9  UNUSED
                    9     2  9
                   11     2  UNUSED
                   13     1  13
                   14    18  UNUSED
                   32    95  32
                  127     1  UNUSED
         BASESET  "ISO Registration Number 100//CHARSET ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
         DESCSET  128    32  UNUSED
                  160    96  32
         BASESET  "ISO Registration Number 176//CHARSET ISO/IEC 10646-1:1993 UCS-2 with implementation level 3//ESC 2/5 2/15 4/5"
         DESCSET  256 65280  256

CAPACITY SGMLREF
         TOTALCAP 150000
         GRPCAP   150000

SCOPE    DOCUMENT

SYNTAX
         SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
                  18 19 20 21 22 23 24 25 26 27 28 29 30 31 127
         BASESET  "ISO 646:1983//CHARSET International Reference Version (IRV)//ESC 2/5 4/0"
         DESCSET  0 128 0
         FUNCTION
--
                  SPACE         32
                  TAB SEPCHAR    9
                  LF  SEPCHAR   10
                  FF  SEPCHAR   12
                  CR  SEPCHAR   13
--

-- The above is an accurate description of the usage of FUNCTION --
-- characters in HTML implementations; that is, there is no --
-- Record Start or Record End character, and no occurrences of --
-- character 10 or 13 are "ignored" by the parser. --
-- But because few SGML implementations support this concrete --
-- syntax, we include the one below. --

-- Note that in order to get correct behaviour w.r.t. newline --
-- processing, you will have to play some tricks in constructing --
-- the document entity for parsing in order to keep the parser --
-- from ignoring newlines in surprising ways --

                  RE            13
                  RS            10
                  SPACE         32
                  TAB SEPCHAR    9

         NAMING   LCNMSTRT ""
                  UCNMSTRT ""
                  LCNMCHAR ".-"
                  UCNMCHAR ".-"
                  NAMECASE GENERAL YES
                           ENTITY  NO
         DELIM    GENERAL  SGMLREF
                  SHORTREF SGMLREF
         NAMES    SGMLREF
         QUANTITY SGMLREF
                  NAMELEN  72    -- somewhat arbitrary; taken from
                                    internet line length conventions --
                  TAGLVL   100
                  LITLEN   1024
                  GRPGTCNT 150
                  GRPCNT   64

FEATURES
  MINIMIZE
    DATATAG  NO
    OMITTAG  YES
    RANK     NO
    SHORTTAG YES
  LINK
    SIMPLE   NO
    IMPLICIT NO
    EXPLICIT NO
  OTHER
    CONCUR   NO
    SUBDOC   NO
    FORMAL   YES
  APPINFO    NONE
>

<!--
        $Id: html.decl,v 1.8 1994/06/21 17:10:29 connolly Exp $

        Author: Daniel W. Connolly <connolly@hal.com>

        See also: http://www.hal.com/%7Econnolly/html-spec
                  http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html
-->
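
As a reading aid for the CHARSET section above, here is a minimal
Python sketch that tabulates the DESCSET triples and checks that they
describe document character numbers 0 through 65535 exactly once each.
It is not part of the declaration and says nothing about how sgmls or
SP actually parse it; the names DESCSETS, flatten, check_contiguous,
and lookup are made up for illustration, and the assumption that full
16-bit coverage is what the declaration intends is mine.

```python
# Sketch only: triples copied by hand from the CHARSET section above.
# Each triple is (first document char number, count, base-set start),
# with None standing in for UNUSED. Base-set labels are informal.
DESCSETS = [
    ("ISO 646 IRV", [
        (0, 9, None), (9, 2, 9), (11, 2, None), (13, 1, 13),
        (14, 18, None), (32, 95, 32), (127, 1, None),
    ]),
    ("ISO reg. 100 (Latin-1 right part)", [
        (128, 32, None), (160, 96, 32),
    ]),
    ("ISO reg. 176 (10646 UCS-2)", [
        (256, 65280, 256),
    ]),
]


def flatten(descsets):
    """Return every range as (start, count, base, baseset_label)."""
    return [
        (start, count, base, label)
        for label, triples in descsets
        for (start, count, base) in triples
    ]


def check_contiguous(ranges, total=65536):
    """Raise ValueError unless the ranges tile 0..total-1 with no gap or overlap."""
    expected = 0
    for start, count, _base, label in ranges:
        if start != expected:
            raise ValueError(
                f"{label}: range starts at {start}, expected {expected}")
        expected = start + count
    if expected != total:
        raise ValueError(f"ranges end at {expected}, expected {total}")
    return True


def lookup(ranges, code):
    """Map a document character number to (baseset_label, base number),
    or None if the declaration marks it UNUSED."""
    for start, count, base, label in ranges:
        if start <= code < start + count:
            if base is None:
                return None
            return (label, base + (code - start))
    raise ValueError(f"character number {code} is not described")


RANGES = flatten(DESCSETS)

if __name__ == "__main__":
    check_contiguous(RANGES)
    for code in (9, 65, 130, 233, 0x4E2D):
        print(code, lookup(RANGES, code))
```

Under these assumptions the triples tile the 16-bit range with no gaps
or overlaps: numbers below 128 come from ISO 646, 160 through 255 from
the right part of Latin-1 (offset by 128), and everything from 256 up
maps straight through to UCS-2.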

-- 
Terry Allen  (terry@ora.com)   O'Reilly & Associates, Inc.
Editor, Digital Media Group    101 Morris St.
			       Sebastopol, Calif., 95472
occasional column at:  http://gnn.com/meta/imedia/webworks/allen/

A Davenport Group sponsor. For information on the Davenport Group see ftp://ftp.ora.com/pub/davenport/README.html or http://www.ora.com/davenport/README.html