#1:
|      1.1.1 Document Structure Elements
|          Body
|         Example of Document Structure Elements
|
|            <HTML>
|            <HEAD>
|            <TITLE>The Document's Title</TITLE>
|            </HEAD>
|            <BODY>
|            The document's text.
|            </BODY>
|
The closing </HTML> tag is missing. I think it should not be omitted
in this example.
#2:
About the charset parameter in Content-type: text/html; charset=something
|      1.1.8 Character Data in HTML
|         HTML documents are encoded in some character encoding;
|         the character encoding may be specified, for example,
|         by the "charset" parameter associated with the "text/html"
|         media type.
|
|         Independent of the character encoding used,
|         HTML also allows references to any of the ISO Latin-1
|         alphabet, using the names in the table ISO Latin-1
|         Character Representations, which is derived from ISO
|         Standard 8879:1986//ENTITIES Added Latin 1//EN. For
|         details, see 2.17.2.
|  2.4 HTML as an Internet Media Type
|      Charset
|         The charset parameter (as defined in section 7.1.1 of
|         RFC 1521) may be used with the text/html to specify
|         the encoding used to represent the HTML document as
|         a sequence of bytes. Normally, text/* media types
|         specify a default value of US-ASCII for the charset
|         parameter. However, for text/html, if the byte stream
|         contains data that is not in the 7-bit US-ASCII set, the
|         HTML interpreting agent should assume a default charset of
|         ISO-8859-1.
|         When an HTML document is encoded using US-ASCII,
|         the mechanisms of numeric character references (see
|         Section 2.16.2) and character entity references (see
|         Section 2.16.3) may be used to encode additional characters
|         from ISO-8859-1.
|         Other values for the charset parameter are not defined
|         in this specification, but may be specified in future
|         levels or versions of HTML.
|         It is envisioned that HTML will use the charset parameter
|         to allow support for non-Latin characters such as
|         Greek, Arabic, Hebrew, Japanese, rather than relying on
|         any SGML mechanism for doing so.
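For illustration, the default rule quoted above could be sketched like this. This is only my reading of the quoted text, not anything the draft itself provides; the function name effective_charset and its arguments are made up for the example.

```python
from typing import Optional


def effective_charset(declared: Optional[str], body: bytes) -> str:
    """Pick the charset an agent would assume for a text/html body.

    Hypothetical helper: `declared` is the value of the charset
    parameter, or None when the Content-type line carries none.
    """
    if declared:
        # An explicit charset parameter is taken as given.
        return declared.strip().lower()
    if all(octet < 0x80 for octet in body):
        # Only 7-bit data: the usual text/* default applies.
        return "us-ascii"
    # 8-bit data and no charset parameter: the text/html default.
    return "iso-8859-1"
```

The sketch simply passes any other declared value through, which is exactly the case the draft leaves undefined.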
This document does not specify what to do when the charset is neither
US-ASCII nor ISO-8859-1. I think two issues should be resolved:
#2.1:
Are HTML tags interpreted as US-ASCII or ISO-8859-1 even when the charset
is not a superset of US-ASCII? I think they should be. Compare what
RFC 1563 says about text/enriched:
! Non-ASCII character sets
!
!   If the character set specified by the charset parameter on the
!   Content-type line is anything other than "US-ASCII", this means that
!   the text being described by text/enriched formatting commands is in a
!   non-ASCII character set.  However, the commands themselves are still
!   the same ASCII commands that are defined in this document.  This
!   creates an ambiguity only with reference to the "<" character, the
!   octet with numeric value 60.  In single byte character sets, such as
!   the ISO-8859 family, this is not a problem; the octet 60 can be
!   quoted by including it twice, just as for ASCII.  The problem is more
!   complicated, however, in the case of multi-byte character sets, where
!   the octet 60 might appear at any point in the byte sequence for any
!   of several characters.
Both text/enriched and text/html are the same kind of markup language for
MIME, so I think they should behave the same way in this respect.
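To make the multi-byte problem concrete, here is a small sketch that searches for kanji whose ISO-2022-JP encoding contains the octet 60. The choice of ISO-2022-JP and of the searched code point range are my own assumptions for the example; RFC 1563 names no particular character set.

```python
def kanji_containing_octet_60(limit: int = 5) -> list:
    """Find kanji whose ISO-2022-JP encoding contains the octet 60 ("<")."""
    found = []
    for code_point in range(0x4E00, 0xA000):  # CJK Unified Ideographs
        char = chr(code_point)
        try:
            encoded = char.encode("iso2022_jp")
        except UnicodeEncodeError:
            continue  # this character is not available in ISO-2022-JP
        if 60 in encoded:
            # The text holds no "<" character, yet the octet is present.
            found.append((char, encoded))
            if len(found) == limit:
                break
    return found


for char, encoded in kanji_containing_octet_60():
    print(char, encoded.hex())
```

A parser that scans such a byte stream for octet 60 without tracking the escape sequences would see a tag start in the middle of a character.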
#2.2:
How are Numeric Character References interpreted when the charset
parameter is not ISO-8859-1 (or US-ASCII)? I think they should still be
interpreted according to ISO-8859-1.
Reasons?
If we say that Numeric Character References are interpreted according to
the charset named in the charset parameter, we create a conflict when
charset=US-ASCII: this document nevertheless says that they should be
interpreted according to Latin-1 (and gives a table for them).
It would also conflict with this text:
|     2.16.3 Numeric Character References
|
|         In addition to any mechanism by which characters may be
|         represented by the encoding of the HTML document, it is
|         possible to explicitly reference the printing characters of
|         the ISO-8859-1 character encoding using a numeric character
|         reference. See Section
|         2.17.1 for a list of the characters, their names and
|         input syntax.
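A minimal sketch of the interpretation I am proposing, where numeric references always mean ISO-8859-1 no matter which charset carried the document. The function name and the KOI8-R sample are assumptions chosen for the example only.

```python
import re

NUMERIC_REFERENCE = re.compile(r"&#([0-9]+);")


def resolve_numeric_references(text: str) -> str:
    """Replace &#n; using ISO-8859-1, whatever charset the document used."""
    def replace(match):
        number = int(match.group(1))
        if number > 255:
            return match.group(0)  # outside ISO-8859-1: leave untouched
        return bytes([number]).decode("iso-8859-1")
    return NUMERIC_REFERENCE.sub(replace, text)


# A document sent with charset=KOI8-R that still uses a Latin-1 reference.
raw = "Привет, Ren&#233;".encode("koi8_r")
document = raw.decode("koi8_r")
print(resolve_numeric_references(document))
```

If the reference were instead looked up in the document's own charset, octet 233 would come out as a Cyrillic letter in KOI8-R, and the same reference would mean different things in different documents.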
#3:
|      2.17.3 Numerical Character References
|          -       Unused
|
|         ¡              Inverted exclamation
|         ¬              Not sign
|         ­              Soft hyphen
|         ®              Registered trademark
160 is not unused; it is the non-breaking space. Either the non-breaking
space should be in the table, or the soft hyphen should be omitted from
it as well.
Compare the earlier text:
|   2.16 Character Data
|      No. 1, or simply Latin-1.  Latin-1 includes characters from most
|      Western European languages, as well as a number of control
|      characters.  Latin-1 also includes a non-breaking space, a soft
|      hyphen indicator, 93 graphical characters, 8 unassigned
|      characters, and 25 control characters.
|
|      Because non-breaking space and soft hyphen indicator are
|      not recognized and interpreted by all HTML user agents,
|      their use is discouraged.
So either both the non-breaking space and the soft hyphen should be in
the Numeric Character Reference table, or both should be omitted.
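The two positions can be checked quickly, for example with Python's unicodedata module (my choice of tool; the names printed are the Unicode character names, not the ones used in the draft):

```python
import unicodedata

# Look up what ISO-8859-1 assigns to positions 160 and 173.
for number in (160, 173):
    char = bytes([number]).decode("iso-8859-1")
    print(number, unicodedata.name(char))
```

This prints NO-BREAK SPACE for 160 and SOFT HYPHEN for 173, so both positions are assigned.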
-- 
- Kari E. Hurtta                             /  Elämä on monimutkaista
  Kari.Hurtta@Fmi.FI                            puh. (90) 1929 658
  {hurtta,root,Postmaster}@dionysos.fmi.fi