Re: Latin 1 is NOT superset of ASCII

Murray Maloney <murray@oclc.org>
Date: Thu, 16 Jun 94 15:03:46 EDT
Message-id: <9406161452.aa11712@dali.scocan.sco.COM>
Reply-To: html-ig@oclc.org
Originator: html-ig@oclc.org
Sender: html-ig@oclc.org
Precedence: bulk
From: Murray Maloney <murray@oclc.org>
To: Multiple recipients of list <html-ig@oclc.org>
Subject: Re: Latin 1 is NOT superset of ASCII
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
X-Comment: HTML Implementation Group
> 
> In message <9406161238.aa11418@dali.scocan.sco.COM>, Murray Maloney writes:
> >
> >To allay my concerns and to be very precise about what character
> >sets and encodings are valid within HTML and WWW, I suggest that
> >we refer only to ISO standards such as ISO 646, ISO 8859/1, and 
> >ISO 6937/2.
> 
> I agree... I gather you're responding to my use of "ASCII" in my
> description. It was meant only as a hint to my readers here.

As a hint it is quite useful.  However, it breeds confusion.
> 
> We can't count on names like "ISO 646" meaning anything to the
> readers of the HTML specification. On the other hand, we can't use
> the term "ASCII" because it has no widely agreed-upon meaning.

If we can't count on an ISO standard to provide specificity,
then what is the point of writing the HTML standard?
> 
> There is a term used in the IETF community for the 7-bit character
> set used in internet mail and such: it's US-ASCII.

Neat!  Yet another attribution for US-ASCII.  "US-ASCII" is redundant.
ASCII is the American Standard Code for Information Interchange,
an ANSI standard that defines a character set for compatibility
between data services.

There are, by my most recent count, four (count 'em) definitions
for US-ASCII.  

	- Dan mentions that the IETF uses this term to refer
	  to the 7-bit character set used in internet mail and such.
	- IBM used this term to refer to an 8-bit character 
	  set of their own design that is still used in
	  DOS and Windows software today.
	- HP use this term to refer to an 8-bit character set,
	  which (mostly) matched IBM's definition, on their
	  early LaserJet models.  They may use it still.
	- Many Macintosh users use this term to refer to
	  the 8-bit character set used on that computer.

So, you see, while the definition of ASCII is quite clear
to anyone who might choose to refer to the ANSI standard,
it is anything but clear to a growing class of users and
software developers.
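
To make the ambiguity concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions: the codec names
(latin_1, cp437, mac_roman, hp_roman8) are Python's standard codec
labels, used here as stand-ins for ISO 8859/1 and for the IBM,
Macintosh, and HP 8-bit sets mentioned above, and the helper names
are hypothetical.  It decodes one byte value under each set and checks
whether the value falls within the 7-bit range at all.

```python
# Stand-ins (assumed) for the character sets discussed above.
CODECS = ["latin_1", "cp437", "mac_roman", "hp_roman8"]


def decode_everywhere(byte_value):
    """Decode a single byte under each 8-bit set and return the results."""
    raw = bytes([byte_value])
    return {name: raw.decode(name, errors="replace") for name in CODECS}


def is_seven_bit(byte_value):
    """True if the byte is a valid 7-bit ASCII code (0 through 127)."""
    try:
        bytes([byte_value]).decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for value in (0x41, 0xE9):
        print(hex(value), is_seven_bit(value), decode_everywhere(value))
```

A code below 128, such as 0x41, reads the same everywhere, while a
code above 127, such as 0xE9, depends entirely on which 8-bit set the
reader assumes.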

> 
> I agree that HTML should be specified in terms of ISO character sets,
> but we should include some informative language that explains what
> those beasts are relative to US-ASCII.

If you will stipulate to simple "ASCII", I'll agree and happily
write the descriptive text that explains what ASCII is, and I will
also gladly provide the descriptive text for ISO 8859/1 and 
the control characters.
> 
> We will be safest if we do not rely on external definitions of ISO 646
> and such, but rather call out each character code and give its
> meaning, both in the SGML declaration and in the prose of the spec.

I don't think that I agree completely.  The SGML declaration is
specified in terms of ISO 646 directly and ISO 8859/1 indirectly.
I think that it would be quite helpful to provide a complete
description of both within the spec.
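
As a rough sketch of what such a description could look like, the
following Python fragment lists a code position with a descriptive
name.  It is an assumption-laden illustration, not proposed spec text:
it uses Python's latin_1 codec as a stand-in for ISO 8859/1, takes the
names from Python's unicodedata module rather than from the ISO
documents, and the function name describe is hypothetical.

```python
import unicodedata


def describe(code):
    """Return one table row: decimal code, hex code, character name."""
    ch = bytes([code]).decode("latin_1")
    # Control characters carry no name in unicodedata, so use a placeholder.
    name = unicodedata.name(ch, "<control or unassigned>")
    return f"{code:3d}  0x{code:02X}  {name}"


if __name__ == "__main__":
    # A few sample positions; a full table would cover range(256).
    for code in (0x0A, 0x41, 0xE9):
        print(describe(code))
```

The control characters come out with a placeholder here, which is why
they would need their own descriptive text.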
> 
> I meant to describe current practice in terms of character codes with
> my little table. I'll leave it to Mr. Maloney to come up with the
> specification wording.

Gladly.
> 
> Dan
>