Re: CRLF line breaks, charset

Daniel W. Connolly (connolly@hal.com)
Tue, 29 Nov 94 12:58:37 EST

In message <94Nov29.013329pst.2760@golden.parc.xerox.com>, Larry Masinter
writes:
>I'm trying to read RFC 1521 carefully. The goal of the HTML
>specification is to define the mime type "text/html".

.. and to define HTML as an SGML application.

>Section 7.1 of RFC 1521 makes several assertions about the "text"
>Content-Type, in particular, the handling of the charset parameter
>(that the default character set, which must be assumed in the absence
>of a charset parameter, is US-ASCII), and that the combination CRLF is
>used to mean a new line.
>
>I think this means that if we're going to actually register text/html
>rather than application/html, we'll have to either be very careful to
>supply the charset parameter directly, or else attempt to bend the
>rules laid out in RFC 1521.

About the default charset:

There is something of an issue here:

According to the MIME spec, a message such as:

To: xxx
MIME-Version: 1.0
Content-Type: text/html

<title> foo </title>
...

has an implicit charset=US-ASCII. Octets 128-255 are a no-no.

[Note that this doesn't conflict with saying that for the purposes of
SGML parsing, the document character set includes the right half of
the Latin 1 charset, so that the markup &#246; and/or &ouml; make
sense.]
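
To make the bracketed note concrete, here is a minimal Python sketch. It
assumes the standard-library html module as a stand-in for an SGML
parser's handling of character and entity references, and the sample
title text is invented for illustration. The point is only that the
body stays within US-ASCII on the wire while the references still
resolve to a Latin 1 character.

```python
import html

# The markup uses only US-ASCII octets, so it is legal under the MIME
# default charset even though it refers to a Latin 1 character.
body = "<title> K&#246;ln / K&ouml;ln </title>"

# encode("ascii") raises UnicodeEncodeError if any character is above 127.
wire_octets = body.encode("ascii")

# html.unescape stands in for the parser resolving &#246; and &ouml;
# against a document character set that includes Latin 1.
resolved = html.unescape(body)
```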

But current practice is that an HTTP response of:

0200 Document follows
Content-Type: text/html

<title> foo </title>
...

has an implicit charset=ISO-8859-1.

As far as registration of text/html goes, it's probably not consistent with
the MIME spec to say that the default charset is ISO-8859-1. For MIME
purposes, the default charset will have to be US-ASCII.
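
The two defaults above can be sketched as a small lookup. This is only
an illustration in Python, not anything taken from either spec: the
function name effective_charset and the context labels "mime" and
"http" are made up for the example, and the standard-library
email.message.Message class is used just to parse the Content-Type
parameters.

```python
from email.message import Message

# Defaults as described in this message. The keys "mime" and "http" are
# labels invented for this sketch.
DEFAULT_CHARSET = {
    "mime": "us-ascii",    # RFC 1521, section 7.1
    "http": "iso-8859-1",  # current practice for HTTP responses
}

def effective_charset(content_type, context):
    """Return the explicit charset parameter, else the context default."""
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_content_charset()  # lower-cased, or None if absent
    if explicit is not None:
        return explicit
    return DEFAULT_CHARSET[context]
```

With this sketch, the same header "text/html" yields us-ascii in a mail
message and iso-8859-1 in an HTTP response, while an explicit charset
parameter gives the same answer in both contexts.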

Could the HTTP spec say that since the default
Content-Transfer-Encoding there is binary, the default charset for
text body parts is extended to ISO-8859-1?

Regarding newlines:

>In general, Internet protocol standards should only talk about how
>things are sent out over the net. How you store them locally is up to
>you.

Certainly. text/html is supposed to go over the wire with lines
delimited by CRLF pairs.
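
As a rough sketch of what that means for a sender and a receiver, the
conversion between a local line convention and the CRLF wire form could
look like the following Python. The function names to_wire and
from_wire are hypothetical, and the sketch assumes a US-ASCII body.

```python
import re

def to_wire(local_text):
    """Convert locally stored text to wire form with CRLF line breaks."""
    # Match CRLF first so an existing pair is not doubled, then lone CR
    # or lone LF as used by other local conventions.
    canonical = re.sub(r"\r\n|\r|\n", "\r\n", local_text)
    return canonical.encode("ascii")  # assumption: US-ASCII body

def from_wire(octets, local_newline="\n"):
    """Convert wire-form octets back to the local newline convention."""
    return octets.decode("ascii").replace("\r\n", local_newline)
```

How the text is stored locally is left open, which is why the local
newline is a parameter rather than a fixed choice.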

But since HTML is also an SGML application, it makes sense to talk
about HTML documents that may not be going across the wire.

Dan