Comments on HTTP spec

"Daniel W. Connolly" <connolly@hal.com>

Errors-To: listmaster@www0.cern.ch
Date: Wed, 9 Mar 1994 21:20:10 --100
Message-id: <9403092007.AA05704@ulua.hal.com>
Reply-To: connolly@hal.com
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: "Daniel W. Connolly" <connolly@hal.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Comments on HTTP spec
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 18676

------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <5699.763243624.2@ulua>
Content-Description: HTTP comments as text

                                                          Comments on HTTP spec
Mime Terms

   The HTTP spec[1] makes many references to MIME and related terms, but it
   seems to use the terms loosely and incorrectly in many cases:
   
   Some useful terms are:
   
  Body Part               Headers, a blank line, and a body
                         
   Body                   a sequence of octets obeying a content transfer
                         encoding implied by context (such as the
                         Content-Transfer-Encoding header in the enclosing body
                         part).
                         
   Perhaps a better way to explain it is:
   

text/plain      akindof Body    (just a plain text data stream)
image/gif       akindof Body    (ordinary GIF data stream)
message/rfc822  akindof Body    (data stream defined by RFC822)
message/rfc822  akindof Body Part (weird, huh?)
Body Part       hasa    Head
Body Part       hasa    Body
Head            hasa    Content-Transfer-Encoding
Head            hasa    Content-Type
multipart/mixed isa     Body
multipart/mixed hasa    sequence of Body Parts
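
   The relationships above can be sketched as a small Python data model. This
   is an illustration only: the class names BodyPart and MultipartBody are
   invented for this sketch, and the text/plain fallback for a part with no
   Content-Type follows the MIME default.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class BodyPart:
    """A Body Part 'hasa' Head (header fields) and 'hasa' Body."""
    head: Dict[str, str] = field(default_factory=dict)
    body: Union[bytes, "MultipartBody"] = b""

    def content_type(self) -> str:
        # Assumption of this sketch: no Content-Type means text/plain.
        return self.head.get("Content-Type", "text/plain")


@dataclass
class MultipartBody:
    """A multipart/mixed Body 'hasa' sequence of Body Parts."""
    parts: List[BodyPart] = field(default_factory=list)


text = BodyPart({"Content-Type": "text/plain"}, b"hello")
image = BodyPart({"Content-Type": "image/gif"}, b"GIF89a")
mixed = BodyPart({"Content-Type": "multipart/mixed"},
                 MultipartBody([text, image]))
print([part.content_type() for part in mixed.body.parts])
```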

   From  HTTP: The Request[2]:
   

        FullRequest       =     Method UR ProtocolVersion CrLf
                                [*<HTRQ Header>]
                                [<CrLf> <data>]

        <Method>          =     <InitialAlpha>

        ProtocolVersion   =     HTTP/1.0

        uri               =     <as defined in URL spec>

        <HTRQ Header>     =     <Fieldname> : <Value> <CrLf>

        <data>            =      MIME-conforming-message

   What is "MIME-conforming-message"? The term itself suggests that it means a
   body of type message/rfc822, but the usage in the grammar says it's just a
   generic body. Using the grammar of the MIME spec, RFC1521[3], this should be
   written:
   

        FullRequest       =     Method UR ProtocolVersion CrLf
                                [body-part]

        <Method>          =     <InitialAlpha>

        ProtocolVersion   =     HTTP/1.0

        uri               =     <as defined in URL spec>

   where body-part should be defined similarly to the way it's done in MIME,
   i.e. some variation on the following:
   

   body-part := <"message" as defined in RFC 822,
             with all header fields optional, and with the
             specified delimiter not occurring anywhere in
             the message body, either on a line by itself
             or as a substring anywhere.  Note that the
             semantics of a part differ from the semantics
             of a message, as described in the text.>
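
   To make the "headers, a blank line, and a body" shape concrete, here is a
   minimal Python sketch of splitting such a body-part. The function name
   split_body_part is hypothetical, header names are lowercased for
   convenience, and folded (continued) header lines are deliberately not
   handled.

```python
from typing import Dict, Tuple

CRLF = b"\r\n"


def split_body_part(raw: bytes) -> Tuple[Dict[str, str], bytes]:
    """Split a body-part into (headers, body) at the first blank line."""
    if raw.startswith(CRLF):
        # All header fields are optional: the part may begin with the
        # blank line itself, in which case everything after it is body.
        return {}, raw[len(CRLF):]
    head, _, body = raw.partition(CRLF + CRLF)
    headers: Dict[str, str] = {}
    for line in head.split(CRLF):
        if line:
            name, _, value = line.decode("latin-1").partition(":")
            headers[name.strip().lower()] = value.strip()
    return headers, body


part = b"Content-Type: text/html\r\n\r\n<H1>Hi</H1>"
print(split_body_part(part))
print(split_body_part(b"\r\nplain body"))
```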

Accept Field

   Hmmm... in the Accept: field: is it comma or semicolon separated? Contrast:
   
     This field contains a semicolon-separated list of representation
     schemes (Content-Type metainformation values) which will be accepted
     in the response to this request.
     
   with this example:
   

Accept: text/plain, text/html
Accept: text/x-dvi; q=.8; mxb=100000; mxt=5.0, text/x-c

   and then there's this example:
   

Accept:  *.*, q=0.1
Accept:  audio/*, q=0.2
Accept:  audio/basic q=1
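
   One way to see the inconsistency is to pick a single reading and run the
   examples through it. The Python sketch below assumes that commas separate
   entries and semicolons separate an entry's parameters; that reading and
   the function name parse_accept are assumptions of this sketch, not
   something the spec states.

```python
from typing import Dict, List, Tuple


def parse_accept(value: str) -> List[Tuple[str, Dict[str, str]]]:
    """Parse an Accept value as comma-separated entries with ;-parameters."""
    entries = []
    for item in value.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type, params = fields[0], {}
        for f in fields[1:]:
            name, _, val = f.partition("=")
            params[name.strip()] = val.strip()
        entries.append((media_type, params))
    return entries


# The first pair of examples parses cleanly under this reading...
print(parse_accept("text/x-dvi; q=.8; mxb=100000; mxt=5.0, text/x-c"))
# ...but in the second set, "q=0.1" is mistaken for a media type,
# and "audio/basic q=1" has no separator at all.
print(parse_accept("*.*, q=0.1"))
print(parse_accept("audio/basic q=1"))
```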

Default Content-Transfer-Encoding

   From the HTTP spec[4]:
   
     The data (if any) sent with an HTTP request or reply is in a format
     and encoding defined by the object header fields, the default being
     "plain/text" type with "8bit" encoding. Note that while all the other
     information in the request (just as in the reply) is in ISO Latin1
     with lines delimited by Carriage Return/Line Feed pairs, the data may
     contain 8-bit binary data.
     
   The 8bit encoding means characters can be in the range 0-255, but the
   characters are still arranged as short lines of text (at most 1000
   characters, the SMTP limit; the 76-character limit applies only to
   base64 and quoted-printable output), delimited by CRLF pairs.
   
   The binary encoding means an arbitrary stream of octets. I expect this is
   what the HTTP designers had in mind for the default
   content-transfer-encoding... Hmmm... or perhaps not... perhaps sending
   arbitrary binary data is somewhat at odds with historical usage of HTTP, and
   so binary data should be labelled as such. This could use some clarification
   in any case.
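
   The difference between the two labels can be sketched as a check on the
   data itself. In the Python below, the function name classify is
   hypothetical, and the line limit is passed in as a parameter rather than
   fixed; the value 1000 used in the examples is the SMTP line limit, taken
   here as an assumption about what "short lines" means.

```python
def classify(data: bytes, max_line: int) -> str:
    """Label data '8bit' if it is CRLF-delimited short lines, else 'binary'."""
    for line in data.split(b"\r\n"):
        if b"\r" in line or b"\n" in line:
            return "binary"  # bare CR or LF: not line-oriented text
        if len(line) > max_line:
            return "binary"  # line too long to count as "short"
    return "8bit"


LIMIT = 1000  # assumed limit; adjust to whatever the spec settles on

print(classify(b"caf\xe9\r\nau lait\r\n", LIMIT))  # octets above 127 are fine
print(classify(b"GIF89a\n\x00\xff", LIMIT))        # bare LF in the stream
print(classify(b"x" * 2000, LIMIT))                # one overlong line
```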
   
The libWWW implementation

   I browsed around the libWWW implementation, and I'm somewhat confused:
   
   First of all what is "www/mime"? It appears to work like message/rfc822, but
   then stuff labelled message/rfc822 seems to be handed off to MetaMail!!!
   
   From what I can tell, there are two classes of HTTP transactions:
   
   The "0.9" style transaction could be described by the following Modula-like
   interface:
   

INTERFACE HTTP0_9;
        TYPE HTML = TEXT;
        TYPE SearchWords = REF ARRAY OF TEXT;
        TYPE Request = RECORD
                resource : TEXT;
                search : SearchWords = NIL;
             END;
        TYPE Response = RECORD
                structured : HTML;
                plain : TEXT = NIL;
             END;
END HTTP0_9.
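
   The request half of this interface can also be sketched in Python. The
   names Request and parse_simple_request are invented for the sketch; it
   assumes the request is a single "GET" line and that search words follow
   a "?" and are joined with "+".

```python
from typing import List, NamedTuple, Optional


class Request(NamedTuple):
    resource: str
    search: Optional[List[str]] = None


def parse_simple_request(line: str) -> Request:
    """Parse a 0.9-style request line into a resource and search words."""
    method, _, target = line.strip().partition(" ")
    if method != "GET":
        raise ValueError("only GET is handled in this sketch")
    resource, sep, query = target.partition("?")
    return Request(resource, query.split("+") if sep else None)


print(parse_simple_request("GET /foo/bar.html"))
print(parse_simple_request("GET /foo?a+b"))
```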

   For example,
   

HTTP                                    Modula
Client:
        GET /foo/bar.html               req := Request{resource="/foo/bar.html"};
Server:
        <PLAINTEXT>                     resp := Response{
        Four score and seven years ago      structured="<PLAINTEXT>",
        today...                            plain="Four score and seven..."};

   or...
   

HTTP                                            Modula
Client:
        GET /foo?a+b                    words := NEW(SearchWords, 2);
                                        words[0] := "a"; words[1] := "b";
                                        req := Request{resource="/foo",
                                                       search = words};
Server:
        <H1>Search Results
        </H1>                           r := Response{structured="<H1>Search..."};
        <A HREF="x1">r1</A>

   The "1.0" style transaction is more involved... it looks like:
   

INTERFACE MIME;
        TYPE Body = TEXT;

        TYPE BaseType = { text, audio, image, video,
                         application, message, multipart, extension };
        TYPE SubType = { plain, enriched, basic, gif, jpeg, mpeg,
                        octet_stream, postscript,
                        rfc822, partial, external_body,
                        alternative, mixed, parallel,
                        extension };

        TYPE ContentType = OBJECT
                base : BaseType;
                xbase : TEXT;
                sub : SubType;
                xsub: TEXT;
                parameters : REF ARRAY OF RECORD
                        name : TEXT;
                        value : TEXT;
                END;
        END;

        TYPE BodyPart = OBJECT
                        headers : REF ARRAY OF RECORD
                                name : TEXT;
                                value : TEXT;
                        END;
                        body : Body;
                METHODS
                        contentType() : ContentType;
                        decode() : TEXT; (* undoes Content-Transfer-Encoding *)
                END;
END MIME.

INTERFACE HTTP1_0;
        IMPORT MIME;

        TYPE SearchWords = REF ARRAY OF TEXT;

        TYPE Method = {'GET'};

        TYPE Request = RECORD
                method : Method;
                resource : TEXT;
                search : SearchWords;
                aux : MIME.BodyPart;
             END;

        TYPE Version = [0..999];
        TYPE StatusCode = [0..999];
        TYPE Line = TEXT; (* with no CR or LF chars *)

        TYPE Response = RECORD
                version : Version;
                code : StatusCode;
                reason : Line;
                object : MIME.BodyPart;
             END;
END HTTP1_0.
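
   As a rough illustration of the Response record, the following Python
   sketch splits a 1.0-style reply into a status line plus a body part. The
   names Response and parse_response are hypothetical, the version is kept
   as a string rather than a number, and folded header lines are not
   handled.

```python
from typing import Dict, NamedTuple

CRLF = b"\r\n"


class Response(NamedTuple):
    version: str
    code: int
    reason: str
    headers: Dict[str, str]
    body: bytes


def parse_response(raw: bytes) -> Response:
    """Split a reply into status line, header fields, and body."""
    status_line, _, rest = raw.partition(CRLF)
    version, code, reason = status_line.decode("latin-1").split(" ", 2)
    if rest.startswith(CRLF):
        head, body = b"", rest[len(CRLF):]  # no header fields at all
    else:
        head, _, body = rest.partition(CRLF + CRLF)
    headers: Dict[str, str] = {}
    for line in head.split(CRLF):
        if line:
            name, _, value = line.decode("latin-1").partition(":")
            headers[name.strip().lower()] = value.strip()
    return Response(version, int(code), reason, headers, body)


reply = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<H1>Hi</H1>"
print(parse_response(reply))
```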

   Whew... that was a fun exercise. I'm not sure what the point of it was,
   except to attempt to specify HTTP at a more abstract level than sequences of
   characters.
   
   A minimal HTTP client must understand the text/plain and text/html content
   types, plus the message/rfc822 head/body syntax. Unlike a minimally
   conforming MIME user agent, it is not required to understand any encodings
   (base64 and quoted-printable are required for MIME UA's) or multipart
   content types (MIME UAs must understand the boundary syntax). Fair enough...
   it matches historical usage.
   
   It seems that there are several servers out there that will take mail
   messages/news articles and serve them up as HTML. This makes sense if
   they're going to add hyperlinking markup. On the other hand, it makes a lot
   of sense for smart WWW clients to understand RFC822/MIME syntax in all its
   glory (with the built-in ability to recognize references, etc.), given the
   amount of data available in this format (from NNTP servers, WAIS servers of
   mail archives, etc.... unfortunately, I think a lot of WAIS servers label
   news articles as TEXT, rather than message/rfc822)
   
   What's the direction in this area? I for one would like to see HTML and
   RFC822/MIME as alternate representations for roughly the same structure of
   information. More on that later...
   
  REFERENCES
  

1521[5]  DS   N. Borenstein, N. Freed, "MIME  (Multipurpose Internet Mail
           Extensions) Part One:  Mechanisms for Specifying and Describing the
           Format of Internet Message Bodies", 09/23/1993. (Pages=81)
           (Format=.txt, .ps) (Obsoletes RFC1341) (Updated by RFC1590)

 HTTP: A protocol for networked information[6]

   

------- =_aaaaaaaaaa0
Content-Type: text/x-html; charset="us-ascii"
Content-ID: <5699.763243624.3@ulua>
Content-Description: HTTP comments as html
Content-Transfer-Encoding: quoted-printable

<HEAD>
<TITLE>Comments on HTTP spec</TITLE>
</HEAD>
<body>

<H2>Mime Terms</H2>

<A HREF=3D"#HTTP">The HTTP spec</A> makes many references to MIME and
related terms, but it seems to use the terms loosely and incorrectly
in many cases: <P>

Some useful terms are:
<DL>
<DT> Body Part
<DD> Headers, a blank line, and a body
<DT> Body
<DD> a sequence of octets obeying a content transfer encoding implied
by context (such as the Content-Transfer-Encoding header in the
enclosing body part).
</DL>

Perhaps a better way to explain it is:

<PRE>
text/plain	akindof	Body	(just a plain text data stream)
image/gif	akindof	Body	(ordinary GIF data stream)
message/rfc822	akindof	Body	(data stream defined by RFC822)
message/rfc822	akindof	Body Part (weird, huh?)
Body Part	hasa	Head
Body Part	hasa	Body
Head		hasa	Content-Transfer-Encoding
Head		hasa	Content-Type
multipart/mixed	isa	Body
multipart/mixed	hasa	sequence of Body Parts
</PRE>


=46rom <A HREF=3D"http://info.cern.ch/hypertext/WWW/Protocols/HTTP/Request=
.html">
HTTP: The Request</A>:

<BLOCKQUOTE>
<PRE>
        FullRequest       =3D     Method UR ProtocolVersion CrLf
                                [*&lt;HTRQ Header&gt;]
                                [&lt;CrLf&gt; &lt;data&gt;]

        &lt;Method&gt;          =3D     &lt;InitialAlpha&gt;

        ProtocolVersion   =3D     HTTP/1.0

        uri               =3D     &lt;as defined in URL spec&gt;

        &lt;HTRQ Header&gt;     =3D     &lt;Fieldname&gt; : &lt;Value&gt; =
&lt;CrLf&gt;

        &lt;data&gt;            =3D      MIME-conforming-message        =

</PRE>
</BLOCKQUOTE>

What is "MIME-conforming-message"? The term itself suggests that it
means a body of type message/rfc822, but the usage in the grammar
says it's just a generic body. Using the grammar of the MIME spec,
<A HREF=3D"#rfc1521">RFC1521</A>, this should be written:

<PRE>
        FullRequest       =3D     Method UR ProtocolVersion CrLf
				[body-part]

        &lt;Method&gt;          =3D     &lt;InitialAlpha&gt;

        ProtocolVersion   =3D     HTTP/1.0

        uri               =3D     &lt;as defined in URL spec&gt;
</PRE>

where <CODE>body-part</CODE> should be defined similarly to the way
it's done in MIME, i.e. some variation on the following:

<PRE>
   body-part :=3D &lt;"message" as defined in RFC 822,
             with all header fields optional, and with the
             specified delimiter not occurring anywhere in
             the message body, either on a line by itself
             or as a substring anywhere.  Note that the
             semantics of a part differ from the semantics
             of a message, as described in the text.&gt;

</PRE>

<H2>Accept Field</H2>

Hmmm... in the Accept: field: is it comma or semicolon separated?
Contrast:

<BLOCKQUOTE>
This field contains a semicolon-separated list of representation schemes (=
 Content-Type metainformation values)
which will be accepted in the response to this request.
</BLOCKQUOTE>

with this example:

<BLOCKQUOTE>
<PRE>
Accept: text/plain, text/html
Accept: text/x-dvi; q=3D.8; mxb=3D100000; mxt=3D5.0, text/x-c
</PRE>
</BLOCKQUOTE>

and then there's this example:

<BLOCKQUOTE>
<PRE>
Accept:  *.*, q=3D0.1
Accept:  audio/*, q=3D0.2
Accept:  audio/basic q=3D1
</PRE>
</BLOCKQUOTE>

<H2>Default Content-Transfer-Encoding</H2>

=46rom the <A
HREF=3D"http://info.cern.ch/hypertext/WWW/Protocols/HTTP/Body.html">HTTP
spec</A>

<BLOCKQUOTE>
The data (if any) sent with an HTTP request or reply is in a format
and encoding defined by the object header fields, the default being
"plain/text" type with "8bit" encoding. Note that while all the other
information in the request (just as in the reply) is in ISO Latin1
with lines delimited by Carriage Return/Line Feed pairs, the data may
contain 8-bit binary data.
</BLOCKQUOTE>

The 8bit encoding means characters can be in the range 0-255, but the
characters are still arranged as short lines of text (at most 1000
characters, the SMTP limit; the 76-character limit applies only to
base64 and quoted-printable output), delimited by CRLF pairs. <P>

The binary encoding means an arbitrary stream of octets. I expect
this is what the HTTP designers had in mind for the default
content-transfer-encoding... Hmmm... or perhaps not... perhaps sending
arbitrary binary data is somewhat at odds with historical usage of
HTTP, and so binary data should be labelled as such. This could use
some clarification in any case.<P>

<H2>The libWWW implementation</H2>

I browsed around the libWWW implementation, and I'm somewhat confused:
<P>

First of all what is "www/mime"? It appears to work like
message/rfc822, but then stuff labelled message/rfc822 seems to be
handed off to MetaMail!!! <P>

=46rom what I can tell, there are two classes of HTTP transactions: <P>

The "0.9" style transaction could be described by the following
Modula-like interface:

<PRE>
INTERFACE HTTP0_9;
	TYPE HTML =3D TEXT;
	TYPE SearchWords =3D REF ARRAY OF TEXT;
	TYPE Request =3D RECORD
		resource : TEXT;
		search : SearchWords =3D NIL;
	     END;
	TYPE Response =3D RECORD
		structured : HTML;
		plain : TEXT =3D NIL;
	     END;
END HTTP0_9.
</PRE>

For example,

<PRE>
HTTP						Modula
Client:
	GET /foo/bar.html		req :=3D Request{resource=3D"/foo/bar.html"};
Server:
	&lt;PLAINTEXT&gt;			resp :=3D Response{
	Four score and seven years ago		structured=3D"&lt;PLAINTEXT&gt;",
	today...				plain=3D"Four score and seven..."};
</PRE>

or...

<PRE>
HTTP						Modula
Client:
	GET /foo?a+b			words :=3D NEW(SearchWords, 2);
					words[0] :=3D "a"; words[1] :=3D "b";
					req :=3D Request{resource=3D"/foo",
							 search =3D words};
Server:
	&lt;H1&gt;Search Results
	&lt;/H1&gt;				r :=3D Response{structured=3D"&lt;H1&gt;Search..."};
	&lt;A HREF=3D"x1"&gt;r1&lt;/A&gt;
</PRE>

The "1.0" style transaction is more involved... it looks like:

<PRE>
INTERFACE MIME;
	TYPE Body =3D TEXT;

	TYPE BaseType =3D { text, audio, image, video,
			 application, message, multipart, extension };
	TYPE SubType =3D { plain, enriched, basic, gif, jpeg, mpeg,
			octet_stream, postscript,
			rfc822, partial, external_body,
			alternative, mixed, parallel,
			extension };

	TYPE ContentType =3D OBJECT
		base : BaseType;
		xbase : TEXT;
		sub : SubType;
		xsub: TEXT;
		parameters : REF ARRAY OF RECORD
			name : TEXT;
			value : TEXT;
		END;
	END;

	TYPE BodyPart =3D OBJECT
			headers : REF ARRAY OF RECORD
				name : TEXT;
				value : TEXT;
			END;
			body : Body;
		METHODS
			contentType() : ContentType;
			decode() : TEXT; (* undoes Content-Transfer-Encoding *)
		END;
END MIME.

INTERFACE HTTP1_0;
	IMPORT MIME;

	TYPE SearchWords =3D REF ARRAY OF TEXT;

	TYPE Method =3D {'GET'};

	TYPE Request =3D RECORD
		method : Method;
		resource : TEXT;
		search : SearchWords;
		aux : MIME.BodyPart;
	     END;

	TYPE Version =3D [0..999];
	TYPE StatusCode =3D [0..999];
	TYPE Line =3D TEXT; (* with no CR or LF chars *)

	TYPE Response =3D RECORD
		version : Version;
		code : StatusCode;
		reason : Line;		=

		object : MIME.BodyPart;
	     END;
END HTTP1_0.
</PRE>

Whew... that was a fun exercise. I'm not sure what the point of it
was, except to attempt to specify HTTP at a more abstract level than
sequences of characters. <P>

A minimal HTTP client must understand the text/plain and text/html
content types, plus the message/rfc822 head/body syntax. Unlike a
minimally conforming MIME user agent, it is not required to understand
any encodings (base64 and quoted-printable are required for MIME UA's)
or multipart content types (MIME UAs must understand the boundary
syntax). Fair enough... it matches historical usage.<P>

It seems that there are several servers out there that will take
mail messages/news articles and serve them up as HTML. This makes
sense if they're going to add hyperlinking markup. On the other hand,
it makes a lot of sense for smart WWW clients to understand
RFC822/MIME syntax in all its glory (with the built-in ability to
recognize references, etc.), given the amount of data available in
this format (from NNTP servers, WAIS servers of mail archives, etc....
unfortunately, I think a lot of WAIS servers label news articles as
TEXT, rather than message/rfc822)<P>

What's the direction in this area? I for one would like to see HTML
and RFC822/MIME as alternate representations for roughly the same
structure of information. More on that later...<P>

<H3>References</H3>
<PRE>
<A NAME=3D"rfc1521"
HREF=3D"ftp://ds.internic.net/rfc/rfc1521.txt">
1521</A>  DS   N. Borenstein, N. Freed, "MIME  (Multipurpose Internet Mail=
  =

           Extensions) Part One:  Mechanisms for Specifying and Describing=
 the =

           Format of Internet Message Bodies", 09/23/1993. (Pages=3D81) =

           (Format=3D.txt, .ps) (Obsoletes RFC1341) (Updated by RFC1590) =


<A NAME=3D"HTTP"
HREF=3D"http://info.cern.ch/hypertext/WWW/Protocols/HTTP/HTTP2.html">
HTTP: A protocol for networked information</A>
</PRE>

------- =_aaaaaaaaaa0--