Re: proposals for log file format changes

"Roy T. Fielding" <fielding@simplon.ICS.UCI.EDU>
Date: Thu, 10 Feb 1994 11:40:28 --100
Message-id: <>
Precedence: bulk
From: "Roy T. Fielding" <fielding@simplon.ICS.UCI.EDU>
To: Multiple recipients of list <>
Subject: Re: proposals for log file format changes 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 3429

Kevin Hughes said:

> 	More or less following RFC 822, then:
> host rfc931 authuser [DD/Mon/YYYY:hh:mm:ss UT[+/-]HHMM] "request" ddd bbbb
> 	How's that?

RFC 822 expects date fields to be separated by spaces.

Now that people are talking about including the Referer: field
(a great idea but a lot of text per log entry), I think the original idea
of a configurable log is now preferable in order to save some people's disks.
However, I would recommend a limited set of options rather than the
fully formattable sscanf codes that Ari first mentioned.
[I think Kevin suggested option names as well, but I didn't save that message].

How about:

     host     = machine.sub.dom.ain
     rfc931   = whatever_it_returns
     fromuser = whatever_From:_gives          (stripped of comments)
     authuser = whatever_Authorization:_gives (stripped of password)
     authpass = whatever_Authorization:_gives (stripped of user - IS THIS SAFE?)
     charge   = whatever_ChargeTo:_gives
     locdate  = [DD Mon YYYY hh:mm:ss]
     gmtdate  = [DD Mon YYYY hh:mm:ss GMT]
     tzdate   = [DD Mon YYYY hh:mm:ss +HHMM]
     request  = "first line from HTTP request"
     response = ddd   (3 digit HTTP response code)
     bytes    = bbbbb (free-formatted number of bytes transmitted)
     referer  = the_referer's_URI
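
For illustration, the three date variants could be produced as in the Python
sketch below. The function names simply reuse the option names above. The
month table and the assumption that the server holds a timezone-aware
timestamp belong to this sketch, not to the proposal.

```python
from datetime import datetime, timedelta, timezone

# Fixed English month names, so the output does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def _stamp(t):
    """Return 'DD Mon YYYY hh:mm:ss' for the wall-clock time in t."""
    return "%02d %s %04d %02d:%02d:%02d" % (
        t.day, MONTHS[t.month - 1], t.year, t.hour, t.minute, t.second)

def locdate(t):
    """Local time, no zone indication."""
    return "[" + _stamp(t) + "]"

def gmtdate(t):
    """Same instant converted to GMT."""
    return "[" + _stamp(t.astimezone(timezone.utc)) + " GMT]"

def tzdate(t):
    """Local time followed by its offset from GMT as +HHMM or -HHMM."""
    minutes = int(t.utcoffset().total_seconds() // 60)
    sign = "+" if minutes >= 0 else "-"
    minutes = abs(minutes)
    return "[%s %s%02d%02d]" % (_stamp(t), sign, minutes // 60, minutes % 60)

# Example timestamp: 10 Feb 1994, 01:18:51, eight hours behind GMT.
when = datetime(1994, 2, 10, 1, 18, 51, tzinfo=timezone(timedelta(hours=-8)))
print(locdate(when))
print(gmtdate(when))
print(tzdate(when))
```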

As specified in HTTP2, the From: field looks like an e-mail address.
Should the entire address be logged or just the username?  If only username,
how does the server parse it given the wide variety of address formats?
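
One possible answer, sketched in Python: lean on an existing RFC 822 address
parser to strip the comments and display name, then take whatever precedes
the last "@" as the username. The helper name from_fields is hypothetical,
and the sketch makes no attempt at UUCP bang paths or other non-822 forms.

```python
from email.utils import parseaddr

def from_fields(header_value):
    """Return (address, username) taken from the value of a From: header."""
    _display_name, address = parseaddr(header_value)
    # Everything before the last "@" is treated as the username; if there
    # is no "@" at all, fall back to the whole address.
    username = address.rpartition("@")[0] or address
    return address, username

print(from_fields('"Roy T. Fielding" <fielding@simplon.ICS.UCI.EDU>'))
print(from_fields('fielding@simplon.ICS.UCI.EDU (Roy Fielding)'))
```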

The next question is: should the order be configurable as well?
If not, then the format can be specified by simple boolean options.
However, I'll bet people will want it configurable.  In that case,
how should it be specified?  A list is probably best, placed in a
server config file (e.g. NCSA's srm.conf).  E.g.:


Any field which is requested but is not defined for a particular log entry
should be logged as a single dash "-".
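
A minimal Python sketch of how a server might write an entry from such an
ordered list, with the dash rule applied. The function name format_entry and
the idea of passing already-bracketed values in a dictionary are assumptions
of this sketch. The field names are the ones proposed above.

```python
def format_entry(fields, values, sep=" "):
    """Join the requested fields in order, logging "-" for any that are
    missing or empty for this request."""
    return sep.join(values.get(name) or "-" for name in fields)

# Values as they would arrive for one request; no Referer: was sent.
values = {
    "locdate": "[10 Feb 1994 01:18:51]",
    "request": '"GET /ICShome.html HTTP/1.0"',
    "response": "200",
    "bytes": "4262",
}

print(format_entry(["locdate", "request", "response", "bytes"], values))
print(format_entry(["locdate", "referer", "response"], values))
```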

Another question is how should the fields be separated in the log?
Current practice uses a space, but perhaps a comma is better.  Any field
which could possibly include the delimiter would have to be surrounded
by some form of brackets (as are the date and request fields above).
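
To show that bracketed fields keep the parsing unambiguous for either
delimiter, here is a small Python sketch of the log analyzer's side. The
function name split_fields is hypothetical, and the sketch assumes that a
bracketed or quoted field never contains its own closing character.

```python
def split_fields(line, sep=" "):
    """Split one log line into fields, keeping [...] and "..." intact."""
    closers = {"[": "]", '"': '"'}
    fields = []
    i, n = 0, len(line)
    while i < n:
        if line[i] == sep:
            i += 1                       # skip the delimiter itself
        elif line[i] in closers:
            # Bracketed field: read through to the matching closer.
            end = line.find(closers[line[i]], i + 1)
            if end == -1:
                raise ValueError("unterminated field at column %d" % i)
            fields.append(line[i:end + 1])
            i = end + 1
        else:
            # Plain field: read up to the next delimiter or end of line.
            end = line.find(sep, i)
            if end == -1:
                end = n
            fields.append(line[i:end])
            i = end
    return fields

print(split_fields('[10 Feb 1994 09:18:51 GMT] 200 "GET /ICShome.html HTTP/1.0" 4262'))
```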

    Some examples:


    would log something like:

[10 Feb 1994 01:18:51] "GET /ICShome.html HTTP/1.0" 200 4262


    would log something like:

[10 Feb 1994 09:18:51 GMT] 200 "GET /ICShome.html HTTP/1.0" 4262

My primary concern about this is the extra work it will require of
the server authors.  Provided that the fields are well defined and can
be parsed unambiguously, there should be no problem for log analyzers.
However, I think it would be much easier on the server authors if the field
order is fixed and simple options defined, e.g.:

LogDate LOCAL       (or GMT or TIMEZONE)
LogReferer NO       (or YES)

I think that decision should be left up to the server authors.
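
For comparison, the fixed-order alternative needs very little code. The
Python sketch below reads the two options shown above and returns the list
of fields to log. The function name, the defaults, and the fixed order itself
(taken from the format Kevin quoted, with referer appended) are assumptions
of this sketch.

```python
def fields_from_options(config_text):
    """Map simple 'Option VALUE' lines onto a fixed-order field list."""
    options = {"LogDate": "LOCAL", "LogReferer": "NO"}   # assumed defaults
    for line in config_text.splitlines():
        parts = line.split()
        # Only the first two words matter; anything after them is ignored.
        if len(parts) >= 2 and parts[0] in options:
            options[parts[0]] = parts[1].upper()

    date_names = {"LOCAL": "locdate", "GMT": "gmtdate", "TIMEZONE": "tzdate"}
    if options["LogDate"] not in date_names:
        raise ValueError("unknown LogDate value: " + options["LogDate"])

    fields = ["host", "rfc931", "authuser", date_names[options["LogDate"]],
              "request", "response", "bytes"]
    if options["LogReferer"] == "YES":
        fields.append("referer")
    return fields

print(fields_from_options("LogDate GMT\nLogReferer YES"))
```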


...Roy Fielding   ICS Grad Student, University of California, Irvine  USA
    <A HREF="">About Roy</A>