Comments on HTTP spec
"Daniel W. Connolly" <connolly@hal.com>
Errors-To: listmaster@www0.cern.ch
Date: Wed, 9 Mar 1994 21:20:10 --100
Message-id: <9403092007.AA05704@ulua.hal.com>
Reply-To: connolly@hal.com
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: "Daniel W. Connolly" <connolly@hal.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Comments on HTTP spec
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 18676
------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <5699.763243624.2@ulua>
Content-Description: HTTP comments as text
Comments on HTTP spec
Mime Terms
The HTTP spec[1] makes many references to MIME and related terms, but it
seems to use the terms loosely and incorrectly in many cases:
Some useful terms are:
Body Part: headers, a blank line, and a body.
Body: a sequence of octets obeying a content transfer encoding implied
by context (such as the Content-Transfer-Encoding header in the
enclosing body part).
Perhaps a better way to explain it is:
text/plain akindof Body (just a plain text data stream)
image/gif akindof Body (ordinary GIF data stream)
message/rfc822 akindof Body (data stream defined by RFC822)
message/rfc822 akindof Body Part (weird, huh?)
Body Part hasa Head
Body Part hasa Body
Head hasa Content-Transfer-Encoding
Head hasa Content-Type
multipart/mixed isa Body
multipart/mixed hasa sequence of Body Parts
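
To make the head/body relationships concrete, here is a minimal sketch using Python's standard email package. The choice of Python and the sample message are my own assumptions for illustration; neither comes from the HTTP spec or from libWWW.

```python
from email import message_from_string

# Hypothetical sample: a multipart/mixed body holding two body parts.
RAW = (
    'Content-Type: multipart/mixed; boundary="xyz"\n'
    "\n"
    "--xyz\n"
    "Content-Type: text/plain\n"
    "\n"
    "Four score and seven years ago...\n"
    "--xyz\n"
    "Content-Type: text/html\n"
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "<H1>Search Results</H1>\n"
    "--xyz--\n"
)


def describe(raw):
    """Return (content type, [(part type, part encoding, part body), ...])."""
    msg = message_from_string(raw)
    parts = [
        # Each body part has a head (type, encoding) and a body (payload).
        (p.get_content_type(),
         p.get("Content-Transfer-Encoding", "7bit"),
         p.get_payload())
        for p in msg.get_payload()
    ]
    return msg.get_content_type(), parts


top, parts = describe(RAW)
```

Here `top` names the multipart/mixed body, and each tuple in `parts` is one body part: its Content-Type, its Content-Transfer-Encoding (defaulting to 7bit when the head omits it), and its body.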
From HTTP: The Request[2]:
FullRequest = Method UR ProtocolVersion CrLf
[*<HTRQ Header>]
[<CrLf> <data>]
<Method> = <InitialAlpha>
ProtocolVersion = HTTP/1.0
uri = <as defined in URL spec>
<HTRQ Header> = <Fieldname> : <Value> <CrLf>
<data> = MIME-conforming-message
What is "MIME-conforming-message"? The term itself suggests that it means a
body of type message/rfc822, but the usage in the grammar says it's just a
generic body. Using the grammar of the MIME spec, RFC1521[3], this should be
written:
FullRequest = Method UR ProtocolVersion CrLf
[body-part]
<Method> = <InitialAlpha>
ProtocolVersion = HTTP/1.0
uri = <as defined in URL spec>
where body-part should be defined similarly to the way it's done in MIME,
i.e. some variation on the following:
body-part := <"message" as defined in RFC 822,
with all header fields optional, and with the
specified delimiter not occurring anywhere in
the message body, either on a line by itself
or as a substring anywhere. Note that the
semantics of a part differ from the semantics
of a message, as described in the text.>
Accept Field
Hmmm... in the Accept: field: is it comma or semicolon separated? Contrast:
This field contains a semicolon-separated list of representation
schemes (Content-Type metainformation values) which will be accepted
in the response to this request.
with this example:
Accept: text/plain, text/html
Accept: text/x-dvi; q=.8; mxb=100000; mxt=5.0, text/x-c
and then there's this example:
Accept: *.*, q=0.1
Accept: audio/*, q=0.2
Accept: audio/basic q=1
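
One way to see the ambiguity is to pick a reading and try it on the examples. The sketch below assumes commas separate entries and semicolons separate an entry's parameters, as in the text/x-dvi example; that reading is my assumption, not something the spec states, and `parse_accept` is a hypothetical helper name.

```python
def parse_accept(value):
    """Split an Accept value into (type, params) pairs.

    Assumption: commas separate entries and semicolons separate the
    parameters of one entry, as in the text/x-dvi example.
    """
    entries = []
    for item in value.split(","):
        fields = [f.strip() for f in item.split(";")]
        if not fields[0]:
            continue  # skip empty entries such as a trailing comma
        params = {}
        for field in fields[1:]:
            name, _, val = field.partition("=")
            params[name.strip()] = val.strip()
        entries.append((fields[0], params))
    return entries


result = parse_accept("text/x-dvi; q=.8; mxb=100000; mxt=5.0, text/x-c")
odd = parse_accept("audio/*, q=0.2")
```

Under this reading the text/x-dvi line parses cleanly, but "audio/*, q=0.2" yields a second entry whose type is the string "q=0.2", so the two examples cannot both follow the same grammar.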
Default Content-Transfer-Encoding
From Comments on HTTP spec[4]
The data (if any) sent with an HTTP request or reply is in a format
and encoding defined by the object header fields, the default being
"plain/text" type with "8bit" encoding. Note that while all the other
information in the request (just as in the reply) is in ISO Latin1
with lines delimited by Carriage Return/Line Feed pairs, the data may
contain 8-bit binary data.
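The 8bit encoding means characters can be in the range 0-255, but the
characters are still arranged as short lines of text, delimited by CRLF
pairs. (RFC 1521 points to the SMTP limit of 1000 characters per line; the
76-character limit applies to base64 and quoted-printable output.)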
The binary encoding means an arbitrary stream of octets. I expect this is
what the HTTP designers had in mind for the default
content-transfer-encoding... Hmmm... or perhaps not... perhaps sending
arbitrary binary data is somewhat at odds with historical usage of HTTP, and
so binary data should be labelled as such. This could use some clarification
in any case.
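
The distinction can be sketched as a small classifier. This is only an illustration: `classify` is a hypothetical helper, and the default line limit of 1000 characters (including the CRLF) is my assumption, taken from the SMTP limit that RFC 1521 cites, so it is left as a parameter.

```python
def classify(data, max_line=1000):
    """Guess the narrowest identity encoding label for `data` (bytes).

    max_line counts the CRLF terminator; the default is an assumption.
    """
    lines = data.split(b"\r\n")
    line_based = (
        b"\x00" not in data
        # A CR or LF left over after splitting on CRLF is a bare one.
        and all(b"\r" not in ln and b"\n" not in ln for ln in lines)
        and all(len(ln) + 2 <= max_line for ln in lines)
    )
    if not line_based:
        return "binary"
    return "8bit" if any(octet > 127 for octet in data) else "7bit"
```

For instance, a GIF data stream would come out as "binary", while ISO Latin1 text with CRLF line ends would come out as "8bit".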
The libWWW implementation
I browsed around the libWWW implementation, and I'm somewhat confused:
First of all what is "www/mime"? It appears to work like message/rfc822, but
then stuff labelled message/rfc822 seems to be handed off to MetaMail!!!
From what I can tell, there are two classes of HTTP transactions:
The "0.9" style transaction could be described by the following Modula-like
interface:
INTERFACE HTTP0_9;
  TYPE HTML = TEXT;
  TYPE SearchWords = REF ARRAY OF TEXT;
  TYPE Request = RECORD
      resource : TEXT;
      search : SearchWords = NIL;
    END;
  TYPE Response = RECORD
      structured : HTML;
      plain : TEXT = NIL;
    END;
END HTTP0_9.
For example,
HTTP                              Modula
Client:
  GET /foo/bar.html               req := Request{resource="/foo/bar.html"};
Server:
  <PLAINTEXT>                     resp := Response{
  Four score and seven years ago    structured="<PLAINTEXT>",
  today...                          plain="Four score and seven..."};
or...
HTTP                              Modula
Client:
  GET /foo?a+b                    words := NEW(SearchWords, 2);
                                  words[0] := "a"; words[1] := "b";
                                  req := Request{resource="/foo",
                                                 search=words};
Server:
  <H1>Search Results</H1>         r := Response{structured="<H1>Search..."};
  <A HREF="x1">r1</A>
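
The mapping from the request line to the Request record in the examples above can be sketched in a few lines of Python. `parse_simple_request` is a hypothetical name, and I am assuming the search words are simply the "+"-separated pieces after the "?", as the /foo?a+b example suggests.

```python
def parse_simple_request(line):
    """Parse a 0.9-style request line into (resource, search words).

    Returns None for the search words when the request has no "?" part.
    """
    method, _, target = line.strip().partition(" ")
    if method != "GET" or not target:
        raise ValueError("not a simple GET request: %r" % line)
    resource, sep, query = target.partition("?")
    words = query.split("+") if sep else None
    return resource, words


plain = parse_simple_request("GET /foo/bar.html\r\n")
search = parse_simple_request("GET /foo?a+b\r\n")
```

The first call corresponds to a Request with only a resource; the second to a Request with resource "/foo" and the search words "a" and "b".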
The "1.0" style transaction is more involved... it looks like:
INTERFACE MIME;
  TYPE Body = TEXT;
  TYPE BaseType = { text, audio, image, video,
                    application, message, multipart, extension };
  TYPE SubType = { plain, enriched, basic, gif, jpeg, mpeg,
                   octet_stream, postscript,
                   rfc822, partial, external_body,
                   alternative, mixed, parallel,
                   extension };
  TYPE ContentType = OBJECT
      base : BaseType;
      xbase : TEXT;
      sub : SubType;
      xsub : TEXT;
      parameters : REF ARRAY OF RECORD
        name : TEXT;
        value : TEXT;
      END;
    END;
  TYPE BodyPart = OBJECT
      headers : REF ARRAY OF RECORD
        name : TEXT;
        value : TEXT;
      END;
      body : Body;
    METHODS
      contentType() : ContentType;
      decode() : TEXT; (* undoes Content-Transfer-Encoding *)
    END;
END MIME.
INTERFACE HTTP1_0;
  IMPORT MIME;
  TYPE SearchWords = REF ARRAY OF TEXT;
  TYPE Method = { GET };
  TYPE Request = RECORD
      method : Method;
      resource : TEXT;
      search : SearchWords;
      aux : MIME.BodyPart;
    END;
  TYPE Version = [0..999];
  TYPE StatusCode = [0..999];
  TYPE Line = TEXT; (* with no CR or LF chars *)
  TYPE Response = RECORD
      version : Version;
      code : StatusCode;
      reason : Line;
      object : MIME.BodyPart;
    END;
END HTTP1_0.
Whew... that was a fun exercise. I'm not sure what the point of it was,
except to attempt to specify HTTP at a more abstract level than sequences of
characters.
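
For comparison with the Response record, here is a rough Python sketch that splits a 1.0-style response into the same four fields. The "HTTP/1.0 200 OK" status-line layout and the sample response are my assumptions for illustration, `parse_response` is a hypothetical name, and the body part is parsed with Python's standard email package rather than anything from libWWW.

```python
from email import message_from_string


def parse_response(raw):
    """Split a 1.0-style response into (version, code, reason, body part)."""
    status_line, _, rest = raw.partition("\r\n")
    version, code, reason = status_line.split(" ", 2)
    if not version.startswith("HTTP/"):
        raise ValueError("bad status line: %r" % status_line)
    # The remainder is treated as one body part: headers, blank line, body.
    part = message_from_string(rest)
    return version[len("HTTP/"):], int(code), reason, part


# Hypothetical sample response.
RAW = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<H1>Search Results</H1>\r\n"
)
version, code, reason, part = parse_response(RAW)
```

The returned `part` plays the role of `object : MIME.BodyPart`: `part.get_content_type()` stands in for `contentType()`, and `part.get_payload()` gives the body.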
A minimal HTTP client must understand the text/plain and text/html content
types, plus the message/rfc822 head/body syntax. Unlike a minimally
conforming MIME user agent, it is not required to understand any encodings
(base64 and quoted-printable are required for MIME UA's) or multipart
content types (MIME UAs must understand the boundary syntax). Fair enough...
it matches historical usage.
It seems that there are several servers out there that will take mail
messages/news articles and serve them up as HTML. This makes sense if
they're going to add hyperlinking markup. On the other hand, it makes a lot
of sense for smart WWW clients to understand RFC822/MIME syntax in all its
glory (with the built-in ability to recognize references, etc.), given the
amount of data available in this format (from NNTP servers, WAIS servers of
mail archives, etc.... unfortunately, I think a lot of WAIS servers label
news articles as TEXT, rather than message/rfc822)
What's the direction in this area? I for one would like to see HTML and
RFC822/MIME as alternate representations for roughly the same structure of
information. More on that later...
REFERENCES
1521[5] DS N. Borenstein, N. Freed, "MIME (Multipurpose Internet Mail
Extensions) Part One: Mechanisms for Specifying and Describing the
Format of Internet Message Bodies", 09/23/1993. (Pages=81)
(Format=.txt, .ps) (Obsoletes RFC1341) (Updated by RFC1590)
HTTP: A protocol for networked information[6]
------- =_aaaaaaaaaa0
Content-Type: text/x-html; charset="us-ascii"
Content-ID: <5699.763243624.3@ulua>
Content-Description: HTTP comments as html
Content-Transfer-Encoding: quoted-printable
<HEAD>
<TITLE>Comments on HTTP spec</TITLE>
</HEAD>
<body>
<H2>Mime Terms</H2>
<A HREF=3D"#HTTP">The HTTP spec</A> makes many references to MIME and
related terms, but it seems to use the terms loosely and incorrectly
in many cases: <P>
Some useful terms are:
<DL>
<DT> Body Part
<DD> Headers, a blank line, and a body
<DT> Body
<DD> a sequence of octets obeying a content transfer encoding implied
by context (such as the Content-Transfer-Encoding header in the
enclosing body part).
</DL>
Perhaps a better way to explain it is:
<PRE>
text/plain akindof Body (just a plain text data stream)
image/gif akindof Body (ordinary GIF data stream)
message/rfc822 akindof Body (data stream defined by RFC822)
message/rfc822 akindof Body Part (weird, huh?)
Body Part hasa Head
Body Part hasa Body
Head hasa Content-Transfer-Encoding
Head hasa Content-Type
multipart/mixed isa Body
multipart/mixed hasa sequence of Body Parts
</PRE>
=46rom <A HREF=3D"http://info.cern.ch/hypertext/WWW/Protocols/HTTP/Request=
.html">
HTTP: The Request</A>:
<BLOCKQUOTE>
<PRE>
FullRequest =3D Method UR ProtocolVersion CrLf
[*<HTRQ Header>]
[<CrLf> <data>]
<Method> =3D <InitialAlpha>
ProtocolVersion =3D HTTP/1.0
uri =3D <as defined in URL spec>
<HTRQ Header> =3D <Fieldname> : <Value> =
<CrLf>
<data> =3D MIME-conforming-message =
</PRE>
</BLOCKQUOTE>
What is "MIME-conforming-message"? The term itself suggests that it
means a body of type message/rfc822, but the usage in the grammar
says it's just a generic body. Using the grammar of the MIME spec,
<A HREF=3D"#rfc1521">RFC1521</A>, this should be written:
<PRE>
FullRequest =3D Method UR ProtocolVersion CrLf
[body-part]
<Method> =3D <InitialAlpha>
ProtocolVersion =3D HTTP/1.0
uri =3D <as defined in URL spec>
</PRE>
where <CODE>body-part</CODE> should be defined similarly to the way
it's done in MIME, i.e. some variation on the following:
<PRE>
body-part :=3D <"message" as defined in RFC 822,
with all header fields optional, and with the
specified delimiter not occurring anywhere in
the message body, either on a line by itself
or as a substring anywhere. Note that the
semantics of a part differ from the semantics
of a message, as described in the text.>
</PRE>
<H2>Accept Field</H2>
Hmmm... in the Accept: field: is it comma or semicolon separated?
Contrast:
<BLOCKQUOTE>
This field contains a semicolon-separated list of representation schemes (=
Content-Type metainformation values)
which will be accepted in the response to this request.
</BLOCKQUOTE>
with this example:
<BLOCKQUOTE>
<PRE>
Accept: text/plain, text/html
Accept: text/x-dvi; q=3D.8; mxb=3D100000; mxt=3D5.0, text/x-c
</PRE>
</BLOCKQUOTE>
and then there's this example:
<BLOCKQUOTE>
<PRE>
Accept: *.*, q=3D0.1
Accept: audio/*, q=3D0.2
Accept: audio/basic q=3D1
</PRE>
</BLOCKQUOTE>
<H2>Default Content-Transfer-Encoding</H2>
=46rom Comments on <A
HREF=3D"http://info.cern.ch/hypertext/WWW/Protocols/HTTP/Body.html">HTTP
spec</A>
<BLOCKQUOTE>
The data (if any) sent with an HTTP request or reply is in a format
and encoding defined by the object header fields, the default being
"plain/text" type with "8bit" encoding. Note that while all the other
information in the request (just as in the reply) is in ISO Latin1
with lines delimited by Carriage Return/Line Feed pairs, the data may
contain 8-bit binary data.
</BLOCKQUOTE>
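The 8bit encoding means characters can be in the range 0-255, but the
characters are still arranged as short lines of text, delimited by CRLF
pairs. (RFC 1521 points to the SMTP limit of 1000 characters per line;
the 76-character limit applies to base64 and quoted-printable output.) <P>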
The binary encoding means an arbitrary stream of octets. I expect
this is what the HTTP designers had in mind for the default
content-transfer-encoding... Hmmm... or perhaps not... perhaps sending
arbitrary binary data is somewhat at odds with historical usage of
HTTP, and so binary data should be labelled as such. This could use
some clarification in any case.<P>
<H2>The libWWW implementation</H2>
I browsed around the libWWW implementation, and I'm somewhat confused:
<P>
First of all what is "www/mime"? It appears to work like
message/rfc822, but then stuff labelled message/rfc822 seems to be
handed off to MetaMail!!! <P>
=46rom what I can tell, there are two classes of HTTP transactions: <P>
The "0.9" style transaction could be described by the following
Modula-like interface:
<PRE>
INTERFACE HTTP0_9;
TYPE HTML =3D TEXT;
TYPE SearchWords =3D REF ARRAY OF TEXT;
TYPE Request =3D RECORD
resource : TEXT;
search : SearchWords =3D NIL;
END;
TYPE Response =3D RECORD
structured : HTML;
plain : TEXT =3D NIL;
END;
END HTTP0_9.
</PRE>
For example,
<PRE>
HTTP Modula
Client:
GET /foo/bar.html req :=3D Request{resource=3D"/foo/bar.html"};
Server:
<PLAINTEXT> resp :=3D Response{
Four score and seven years ago structured=3D"<PLAINTEXT>",
today... plain=3D"Four score and seven..."};
</PRE>
or...
<PRE>
HTTP Modula
Client:
GET /foo?a+b words :=3D NEW(SearchWords, 2);
words[0] :=3D "a"; words[1] :=3D "b";
req :=3D Request{resource=3D"/foo",
search =3D words};
Server:
<H1>Search Results
</H1> r :=3D Response{structured=3D"<H1>Search..."};
<A HREF=3D"x1">r1</A>
</PRE>
The "1.0" style transaction is more involved... it looks like:
<PRE>
INTERFACE MIME;
TYPE Body =3D TEXT;
TYPE BaseType =3D { text, audio, image, video,
application, message, multipart, extension };
TYPE SubType =3D { plain, enriched, basic, gif, jpeg, mpeg,
octet_stream, postscript,
rfc822, partial, external_body,
alternative, mixed, parallel,
extension };
TYPE ContentType =3D OBJECT
base : BaseType;
xbase : TEXT;
sub : SubType;
xsub: TEXT;
parameters : REF ARRAY OF RECORD
name : TEXT;
value : TEXT;
END;
END;
TYPE BodyPart =3D OBJECT
headers : REF ARRAY OF RECORD
name : TEXT;
value : TEXT;
END;
body : Body;
METHODS
contentType() : ContentType;
decode() : TEXT; (* undoes Content-Transfer-Encoding *)
END;
END MIME.
INTERFACE HTTP1_0;
IMPORT MIME;
TYPE SearchWords =3D REF ARRAY OF TEXT;
TYPE Method =3D { GET };
TYPE Request =3D RECORD
method : Method;
resource : TEXT;
search : SearchWords;
aux : MIME.BodyPart;
END;
TYPE Version =3D [0..999];
TYPE StatusCode =3D [0..999];
TYPE Line =3D TEXT; (* with no CR or LF chars *)
TYPE Response =3D RECORD
version : Version;
code : StatusCode;
reason : Line; =
object : MIME.BodyPart;
END;
END HTTP1_0.
</PRE>
Whew... that was a fun exercise. I'm not sure what the point of it
was, except to attempt to specify HTTP at a more abstract level than
sequences of characters. <P>
A minimal HTTP client must understand the text/plain and text/html
content types, plus the message/rfc822 head/body syntax. Unlike a
minimally conforming MIME user agent, it is not required to understand
any encodings (base64 and quoted-printable are required for MIME UA's)
or multipart content types (MIME UAs must understand the boundary
syntax). Fair enough... it matches historical usage.<P>
It seems that there are several servers out there that will take
mail messages/news articles and serve them up as HTML. This makes
sense if they're going to add hyperlinking markup. On the other hand,
it makes a lot of sense for smart WWW clients to understand
RFC822/MIME syntax in all its glory (with the built-in ability to
recognize references, etc.), given the amount of data available in
this format (from NNTP servers, WAIS servers of mail archives, etc....
unfortunately, I think a lot of WAIS servers label news articles as
TEXT, rather than message/rfc822)<P>
What's the direction in this area? I for one would like to see HTML
and RFC822/MIME as alternate representations for roughly the same
structure of information. More on that later...<P>
<H3>References</H3>
<PRE>
<A NAME=3D"rfc1521"
HREF=3D"ftp://ds.internic.net/rfc/rfc1521.txt">
1521</A> DS N. Borenstein, N. Freed, "MIME (Multipurpose Internet Mail=
=
Extensions) Part One: Mechanisms for Specifying and Describing=
the =
Format of Internet Message Bodies", 09/23/1993. (Pages=3D81) =
(Format=3D.txt, .ps) (Obsoletes RFC1341) (Updated by RFC1590) =
<A NAME=3D"HTTP"
HREF=3D"http://info.cern.ch/hypertext/WWW/Protocols/HTTP/HTTP2.html">
HTTP: A protocol for networked information</A>
</PRE>
------- =_aaaaaaaaaa0--