libWWW: New architecture
Tim Berners-Lee <timbl@www3.cern.ch>
Date: Thu, 11 Feb 93 18:55:44 +0100
From: Tim Berners-Lee <timbl@www3.cern.ch>
Message-id: <9302111755.AA05809@www3.cern.ch>
To: www-talk@nxoc01.cern.ch
Subject: libWWW: New architecture
Reply-To: timbl@nxoc01.cern.ch
I am testing out the new library (2.alpha?), which has a lot of
new features, largely as a result of comments on this list (many from
Dan Connolly, but also others). So I thought I'd bounce the end result
off you all to check it for stupidities.
The driving forces are putting in MIME, and allowing the library
to be used by disparate browsers and editors and servers without
modification.
The thing is still all portable C but even more OO in style. There are
two new objects. The first is HTStream, which is something you can
write to. It supports
    put_character
    put_string
    write /* buffer */
    end
    free
Some of the machinery works character by character
(state machine parsers) but it's useful to have faster methods when
there is no characterwise intervention.
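
To make the method list concrete, here is a minimal sketch of how such a stream object might be laid out in portable C, using a table of function pointers. The post names the five methods but gives no signatures, so the struct layout, the argument types, and the MemStream demo subclass are all assumptions made for illustration. The free method is spelled free_ here only to avoid any clash with the C library's free().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the post names the methods but not their C
   signatures, so everything below is an assumption. */
typedef struct HTStream HTStream;

typedef struct {
    void (*put_character)(HTStream *me, char c);
    void (*put_string)(HTStream *me, const char *s);
    void (*write)(HTStream *me, const char *buf, size_t len); /* buffer */
    void (*end)(HTStream *me);    /* no more data will arrive */
    void (*free_)(HTStream *me);  /* release the object */
} HTStreamClass;

struct HTStream {
    const HTStreamClass *isa;     /* method table shared by all instances */
};

/* Demo subclass (not from the post): collects output in a fixed buffer. */
typedef struct {
    HTStream base;                /* must come first so the casts are valid */
    char data[256];
    size_t used;
    int ended;
} MemStream;

/* The fast path: a whole buffer in one call. */
static void mem_write(HTStream *me, const char *buf, size_t len)
{
    MemStream *m = (MemStream *)me;
    size_t room = sizeof m->data - 1 - m->used;
    if (len > room) len = room;   /* truncate rather than overflow */
    memcpy(m->data + m->used, buf, len);
    m->used += len;
    m->data[m->used] = '\0';
}

/* The character-by-character path, built on top of the buffer method. */
static void mem_put_character(HTStream *me, char c) { mem_write(me, &c, 1); }
static void mem_put_string(HTStream *me, const char *s) { mem_write(me, s, strlen(s)); }
static void mem_end(HTStream *me) { ((MemStream *)me)->ended = 1; }
static void mem_free(HTStream *me) { free(me); }

static const HTStreamClass MemStreamClass = {
    mem_put_character, mem_put_string, mem_write, mem_end, mem_free
};

HTStream *MemStream_new(void)
{
    MemStream *m = calloc(1, sizeof *m);
    assert(m != NULL);
    m->base.isa = &MemStreamClass;
    return &m->base;
}
```

A caller only ever sees an HTStream pointer and goes through the table, for example s->isa->put_string(s, "text"), so a state-machine parser and a bulk copier can share the same target.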
This is to allow MIME encoding pipelines to be built, and to make the
existing facilities more flexible. Streams stack, and freeing
the top of the stack frees the whole stack. As a side issue, the stream
idea allows the difference between sockets and FILE * to be overcome
cleanly, without the hacks necessary on some systems which shall be
nameless. Stream subclasses exist to
    write to a FILE *
    write to a socket
    parse an SGML file of (dtd),
        pushing the results into (structured)
    convert plain text to valid HTML -> other stream
    format a (structured) described by (dtd) as plain text,
        pushing the results -> other stream
    parse a MIME document (to come)
There is a different creation routine for each case.
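
As a rough illustration of stacking, the sketch below puts a plain-text-to-HTML filter on top of a FILE * sink, with one creation routine per kind of stream. It is a simplification under stated assumptions: the stream is cut down to two methods, the routine names HTFileWriter_new and HTPlainToHTML_new are made up for this example, and "valid HTML" is reduced to escaping three characters, which may well differ from what the library does.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Cut-down stream for illustration; names and layout are assumptions. */
typedef struct HTStream HTStream;
struct HTStream {
    void (*put_character)(HTStream *me, char c);
    void (*free_)(HTStream *me);
    HTStream *target;   /* next stream down the stack, or NULL */
    FILE *fp;           /* used only by the FILE * sink */
};

static HTStream *stream_new(void (*put)(HTStream *, char),
                            void (*fr)(HTStream *))
{
    HTStream *s = calloc(1, sizeof *s);
    assert(s != NULL);
    s->put_character = put;
    s->free_ = fr;
    return s;
}

/* Sink: write each character to a FILE *. The caller keeps ownership
   of the FILE *, so freeing the stream only flushes it. */
static void file_put(HTStream *me, char c) { fputc(c, me->fp); }
static void file_free(HTStream *me) { fflush(me->fp); free(me); }

HTStream *HTFileWriter_new(FILE *fp)
{
    HTStream *s = stream_new(file_put, file_free);
    s->fp = fp;
    return s;
}

/* Filter: escape markup characters and push the result to the target. */
static void plain_put(HTStream *me, char c)
{
    const char *rep = NULL;
    if (c == '<') rep = "&lt;";
    else if (c == '>') rep = "&gt;";
    else if (c == '&') rep = "&amp;";
    if (rep == NULL) { me->target->put_character(me->target, c); return; }
    for (; *rep; rep++) me->target->put_character(me->target, *rep);
}

static void plain_free(HTStream *me)
{
    me->target->free_(me->target);  /* freeing the top frees the stack */
    free(me);
}

HTStream *HTPlainToHTML_new(HTStream *target)
{
    HTStream *s = stream_new(plain_put, plain_free);
    s->target = target;
    return s;
}

/* Convenience helper: feed a whole string through put_character. */
void HTStream_puts(HTStream *s, const char *text)
{
    for (; *text; text++) s->put_character(s, *text);
}
```

The caller builds the stack bottom-up, writes to the top, and frees only the top; a socket sink would slot in the same way as the FILE * one.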
The "structured" object is a subclass of HTStream called
HTStructured. It is a sort of rich stream, which also accepts the methods
    start_element(element_number, attributes)
    end_element(element_number)
    put_entity(entity_number)
Obviously an HTStructured thing has a pointer to a DTD structure
so that the element numbers and entity numbers make sense.
Elements and entities are passed around as numbers so that the
string lookup is only done once if at all.
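
The numbering idea can be sketched as a small DTD table in which a name's index is its number. This is only a guess at the shape of the DTD structure: the struct fields, the function names, and the four-element demo table are assumptions made for illustration, not taken from the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical DTD table: each name is stored once, and its index is
   the number passed around everywhere else. */
typedef struct {
    const char *const *element_names;
    int n_elements;
    const char *const *entity_names;
    int n_entities;
} DTD;

/* A tiny demo table, not a real HTML DTD. */
static const char *const demo_elements[] = { "A", "H1", "P", "TITLE" };
static const char *const demo_entities[] = { "amp", "gt", "lt" };

const DTD demo_dtd = { demo_elements, 4, demo_entities, 3 };

/* The one place a string comparison happens; returns -1 if not found. */
static int lookup(const char *const *names, int n, const char *name)
{
    int i;
    for (i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0) return i;
    return -1;
}

int DTD_element_number(const DTD *dtd, const char *name)
{
    return lookup(dtd->element_names, dtd->n_elements, name);
}

int DTD_entity_number(const DTD *dtd, const char *name)
{
    return lookup(dtd->entity_names, dtd->n_entities, name);
}

/* Going back from number to name is a plain array index. */
const char *DTD_element_name(const DTD *dtd, int number)
{
    assert(number >= 0 && number < dtd->n_elements);
    return dtd->element_names[number];
}
```

An SGML parser would call DTD_element_number once per tag it reads; a news or gopher handler that already knows which element it wants can use the number directly and never touch a string.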
Structured subclass objects exist to
    Pretty-print structured text to a plain text stream
    Generate SGML for the stream, given its DTD
    Present the structured object to the user
This last one is the one you browser writers hook into,
and it should be a lot easier than tangling with styles.
You can also of course regenerate the stream from your
widget and use the HTML generator structured object
to write your file back to the server. (This is to
encourage more hypertext editors out there!)
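
A structured object that generates SGML from its DTD might look roughly like the sketch below. Several things here are assumptions for the sake of a short example: attributes are passed as one preformatted string (or NULL), output goes into a fixed buffer rather than into another stream, and the names SGMLGenerator_init and emit are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal DTD: names indexed by element/entity number. */
typedef struct {
    const char *const *element_names;
    const char *const *entity_names;
} DTD;

typedef struct HTStructured HTStructured;
struct HTStructured {
    const DTD *dtd;      /* makes the numbers below meaningful */
    void (*put_character)(HTStructured *me, char c);
    void (*start_element)(HTStructured *me, int element_number,
                          const char *attributes);
    void (*end_element)(HTStructured *me, int element_number);
    void (*put_entity)(HTStructured *me, int entity_number);
    char out[256];       /* demo only: generated SGML collects here */
    size_t used;
};

/* Append a string to the output buffer, keeping it NUL-terminated. */
static void emit(HTStructured *me, const char *s)
{
    size_t len = strlen(s);
    assert(me->used + len < sizeof me->out);
    memcpy(me->out + me->used, s, len + 1);
    me->used += len;
}

static void gen_put_character(HTStructured *me, char c)
{
    char s[2];
    s[0] = c;
    s[1] = '\0';
    emit(me, s);
}

static void gen_start(HTStructured *me, int n, const char *attributes)
{
    emit(me, "<");
    emit(me, me->dtd->element_names[n]);
    if (attributes != NULL) { emit(me, " "); emit(me, attributes); }
    emit(me, ">");
}

static void gen_end(HTStructured *me, int n)
{
    emit(me, "</");
    emit(me, me->dtd->element_names[n]);
    emit(me, ">");
}

static void gen_entity(HTStructured *me, int n)
{
    emit(me, "&");
    emit(me, me->dtd->entity_names[n]);
    emit(me, ";");
}

void SGMLGenerator_init(HTStructured *me, const DTD *dtd)
{
    me->dtd = dtd;
    me->put_character = gen_put_character;
    me->start_element = gen_start;
    me->end_element = gen_end;
    me->put_entity = gen_entity;
    me->used = 0;
    me->out[0] = '\0';
}
```

An editor would walk its own widget contents and make these same calls; a browser's presentation object would implement the same four methods but draw to the screen instead of emitting text.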
There is another class of object called a protocol.
It supports GET and will soon support PUT and other
useful things. Current subclasses handle
    http
    news
    gopher
    files and directories
    FTP files and directories
    WAIS without gateway will come
Typically the News and Gopher protocol
objects ask for an HTStructured object (which may
in fact go to a display, or to a client if we are a server, or
to a file etc., through a small stream stack) and build it.
Although the structured object is defined by a DTD,
and has an SGML model, for speed a browser does not generate
SGML from news etc. (unless the user wants
to save something as SGML).
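
The sketch below shows one way a protocol object could build a structured object directly, with no SGML text in between. Everything specific in it is an assumption: the HTProtocol struct, the get signature, the element number EL_H1, and the recording sink that stands in for a display. The news handler is a stub that does no networking.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { EL_H1 = 0 };   /* hypothetical element number from a DTD */

typedef struct HTStructured HTStructured;
struct HTStructured {
    void (*start_element)(HTStructured *me, int element_number);
    void (*end_element)(HTStructured *me, int element_number);
    void (*put_string)(HTStructured *me, const char *s);
    char log[128];    /* demo sink: records the calls it receives */
};

static void log_append(HTStructured *me, const char *s)
{
    assert(strlen(me->log) + strlen(s) < sizeof me->log);
    strcat(me->log, s);
}

static void rec_start(HTStructured *me, int n)
{
    char tmp[16];
    snprintf(tmp, sizeof tmp, "(%d ", n);
    log_append(me, tmp);
}

static void rec_end(HTStructured *me, int n)
{
    char tmp[16];
    snprintf(tmp, sizeof tmp, " %d)", n);
    log_append(me, tmp);
}

static void rec_put_string(HTStructured *me, const char *s)
{
    log_append(me, s);
}

void Recorder_init(HTStructured *me)
{
    me->start_element = rec_start;
    me->end_element = rec_end;
    me->put_string = rec_put_string;
    me->log[0] = '\0';
}

/* A protocol object: an access scheme name plus a GET method. */
typedef struct {
    const char *name;
    int (*get)(const char *address, HTStructured *target);
} HTProtocol;

/* Stub handler: a real one would talk to a news server. It makes
   structured calls by element number and never produces SGML text. */
static int news_get(const char *address, HTStructured *target)
{
    target->start_element(target, EL_H1);
    target->put_string(target, address);
    target->end_element(target, EL_H1);
    return 0;
}

static const HTProtocol protocols[] = { { "news", news_get } };

/* Pick the protocol by the scheme before the colon; -1 if unknown. */
int HTProtocol_get(const char *address, HTStructured *target)
{
    const char *colon = strchr(address, ':');
    size_t i, len;
    if (colon == NULL) return -1;
    len = (size_t)(colon - address);
    for (i = 0; i < sizeof protocols / sizeof protocols[0]; i++)
        if (strlen(protocols[i].name) == len &&
            strncmp(protocols[i].name, address, len) == 0)
            return protocols[i].get(colon + 1, target);
    return -1;
}
```

Swapping the recording sink for an SGML generator would give the "save as SGML" case without changing the protocol code.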
Other things: a common interface for alerts, confirmations and
questions to the user from the bowels of the library, and sometime a
spinning callback for getting out of
those long timeouts.
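
One possible shape for such a common interface is a table of callbacks that the application installs and the library calls. This is a sketch only: the UserInterface struct, the UI_ function names, and the fallback answers are all invented for illustration, and the scripted front end exists just to show the mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table supplied by the browser, editor or server. */
typedef struct {
    void (*alert)(const char *message);
    int (*confirm)(const char *question);   /* nonzero means yes */
    const char *(*prompt)(const char *question, const char *deflt);
} UserInterface;

static const UserInterface *current_ui = NULL;

void UI_install(const UserInterface *ui) { current_ui = ui; }

/* Library-side entry points. With nothing installed they fall back to
   the cautious answer: stay silent, say no, return the default. */
void UI_alert(const char *message)
{
    if (current_ui && current_ui->alert) current_ui->alert(message);
}

int UI_confirm(const char *question)
{
    if (current_ui && current_ui->confirm)
        return current_ui->confirm(question);
    return 0;
}

const char *UI_prompt(const char *question, const char *deflt)
{
    if (current_ui && current_ui->prompt)
        return current_ui->prompt(question, deflt);
    return deflt;
}

/* A scripted front end for demonstration: counts alerts, answers yes,
   and leaves prompt unset so the default is returned. */
static int alert_count = 0;
static void scripted_alert(const char *message) { (void)message; alert_count++; }
static int scripted_confirm(const char *question) { (void)question; return 1; }

const UserInterface scripted_ui = { scripted_alert, scripted_confirm, NULL };

int UI_alert_count(void) { return alert_count; }
```

Code deep in the library then calls UI_alert or UI_confirm without knowing whether the front end is a line-mode terminal, a windowing toolkit, or a server with no user at all.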
Constructive criticism welcome. I'll put the code out
when I can.
Tim Berners-Lee