libWWW: New architecture

Tim Berners-Lee <timbl@www3.cern.ch>
Date: Thu, 11 Feb 93 18:55:44 +0100
From: Tim Berners-Lee <timbl@www3.cern.ch>
Message-id: <9302111755.AA05809@www3.cern.ch>
To: www-talk@nxoc01.cern.ch
Subject: libWWW: New architecture
Reply-To: timbl@nxoc01.cern.ch


I am testing out the new library (2.alpha?), which has a lot of
new features, largely as a result of comments on this list (many from
Dan Connolly, but also others). So I thought I'd bounce the end result
off you all to check it for stupidities.

The driving forces are putting in MIME, and allowing the library
to be used by disparate browsers and editors and servers without
modification.

The thing is still all portable C, but even more OO in style. There
are two new objects. The first is HTStream, which is something you
can write to. It supports

	put_character
	put_string
	write 	/* buffer */
	end
	free

Some of the machinery works character by character
(state machine parsers) but it's useful to have faster methods when
there is no characterwise intervention.

This is to allow MIME encoding pipelines to be built, and to make the
existing facilities more flexible.  Streams stack, and freeing
the top of the stack frees the whole stack.  As a side issue, the stream
idea allows the difference between sockets and FILE * to be overcome
cleanly, without the hacks necessary on some systems which shall be
nameless. Stream subclasses exist to

	write to a FILE *
	write to a socket
	parse an SGML file of (dtd)
		pushing the results into (structured)
	convert plain text to valid HTML -> other stream
	format a (structured) described by (dtd) as plain text
		pushing the results -> other stream
	parse a MIME document (to come)
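
A minimal sketch of how such a stream class and a two-level stack
might look in portable C. Everything named here (HTStreamClass, the
isa pointer, MemSink, PlainToHTML, live_streams, and the exact
signatures) is an assumption made for illustration, not the library's
actual declarations. The sink copies into a memory buffer as a
stand-in for the FILE * and socket writers, and the filter only
escapes markup characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the type and function names are assumptions. */
typedef struct HTStream HTStream;

typedef struct {
    void (*put_character)(HTStream *me, char c);
    void (*put_string)(HTStream *me, const char *s);
    void (*write)(HTStream *me, const char *buf, size_t len);
    void (*end)(HTStream *me);   /* no more data will follow */
    void (*free)(HTStream *me);  /* release this stream and those below it */
} HTStreamClass;

struct HTStream { const HTStreamClass *isa; };

int live_streams = 0;  /* bookkeeping for the example only */

/* Sink: copies bytes into a caller-owned buffer (stand-in for FILE * or socket). */
typedef struct { HTStream base; char *out; size_t cap, len; } MemSink;

static void sink_write(HTStream *me, const char *buf, size_t n) {
    MemSink *s = (MemSink *)me;
    size_t room = s->cap - 1 - s->len;  /* keep space for the terminator */
    if (n > room) n = room;             /* silently truncate on overflow */
    memcpy(s->out + s->len, buf, n);    /* block copy: the fast path */
    s->len += n;
    s->out[s->len] = '\0';
}
static void sink_put_character(HTStream *me, char c) { sink_write(me, &c, 1); }
static void sink_put_string(HTStream *me, const char *s) { sink_write(me, s, strlen(s)); }
static void sink_end(HTStream *me) { (void)me; }
static void sink_free(HTStream *me) { live_streams--; free(me); }

static const HTStreamClass MemSinkClass = {
    sink_put_character, sink_put_string, sink_write, sink_end, sink_free
};

/* Creation routine for the sink. */
HTStream *MemSink_new(char *out, size_t cap) {
    MemSink *s = malloc(sizeof *s);
    assert(s != NULL && out != NULL && cap > 0);
    s->base.isa = &MemSinkClass;
    s->out = out; s->cap = cap; s->len = 0;
    out[0] = '\0';
    live_streams++;
    return &s->base;
}

/* Filter: escapes markup characters, then pushes to the next stream. */
typedef struct { HTStream base; HTStream *target; } PlainToHTML;

static void p2h_put_character(HTStream *me, char c) {
    HTStream *t = ((PlainToHTML *)me)->target;
    switch (c) {
    case '<': (*t->isa->put_string)(t, "&lt;");  break;
    case '>': (*t->isa->put_string)(t, "&gt;");  break;
    case '&': (*t->isa->put_string)(t, "&amp;"); break;
    default:  (*t->isa->put_character)(t, c);    break;
    }
}
static void p2h_write(HTStream *me, const char *buf, size_t n) {
    for (size_t i = 0; i < n; i++) p2h_put_character(me, buf[i]);
}
static void p2h_put_string(HTStream *me, const char *s) { p2h_write(me, s, strlen(s)); }
static void p2h_end(HTStream *me) {
    HTStream *t = ((PlainToHTML *)me)->target;
    (*t->isa->end)(t);
}
static void p2h_free(HTStream *me) {
    HTStream *t = ((PlainToHTML *)me)->target;
    (*t->isa->free)(t);  /* freeing the top frees everything below */
    live_streams--;
    free(me);
}

static const HTStreamClass PlainToHTMLClass = {
    p2h_put_character, p2h_put_string, p2h_write, p2h_end, p2h_free
};

/* Creation routine for the filter; it is stacked on top of target. */
HTStream *PlainToHTML_new(HTStream *target) {
    PlainToHTML *p = malloc(sizeof *p);
    assert(p != NULL && target != NULL);
    p->base.isa = &PlainToHTMLClass;
    p->target = target;
    live_streams++;
    return &p->base;
}
```

Under these assumptions a caller would build the stack with
PlainToHTML_new(MemSink_new(buf, sizeof buf)), push text through
(*top->isa->put_string)(top, text), and finish by calling end and
then free on the top stream only.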

There is a different creation routine for each case.
The "structured" object is a subclass of HTStream called  
HTStructured. It is a sort of rich stream, accepting also the methods

	start_element(element_number, attributes)
	end_element(element_number)
	put_entity(entity_number)

Obviously an HTStructured thing has a pointer to a DTD structure
so that the element numbers and entity numbers make sense.
Elements and entities are passed around as numbers so that the
string lookup is only done once if at all.
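
As an illustration of numbered elements and entities, here is a
small sketch of a structured object that generates SGML. The DTD
representation (ElementDef, DTD, MAX_ATTRS), the attr_values
convention, and all function names are assumptions for the example,
not the library's real interface. The write, end and free methods
are left out to keep it short, and attribute values are not escaped.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout: all names and the DTD representation are assumptions. */
#define MAX_ATTRS 4

typedef struct {
    const char *name;                   /* element name, e.g. "A" */
    const char *attr_names[MAX_ATTRS];  /* attribute names, by attribute number */
    int n_attrs;
} ElementDef;

typedef struct {
    const ElementDef *elements;   int n_elements;
    const char *const *entities;  int n_entities;
} DTD;

typedef struct HTStructured HTStructured;

typedef struct {
    void (*put_character)(HTStructured *me, char c);
    void (*put_string)(HTStructured *me, const char *s);
    /* attr_values[i] is the value of attribute i, or NULL if absent */
    void (*start_element)(HTStructured *me, int element_number,
                          const char *const *attr_values);
    void (*end_element)(HTStructured *me, int element_number);
    void (*put_entity)(HTStructured *me, int entity_number);
} HTStructuredClass;  /* write, end and free omitted for brevity */

/* One concrete subclass: an SGML generator writing to a caller-owned buffer. */
struct HTStructured {
    const HTStructuredClass *isa;
    const DTD *dtd;  /* gives meaning to element and entity numbers */
    char *out; size_t cap, len;
};

/* Append a string to the output buffer, truncating on overflow. */
static void emit(HTStructured *me, const char *s) {
    size_t n = strlen(s);
    size_t room = me->cap - 1 - me->len;
    if (n > room) n = room;
    memcpy(me->out + me->len, s, n);
    me->len += n;
    me->out[me->len] = '\0';
}

static void gen_put_character(HTStructured *me, char c) {
    char s[2] = { c, '\0' };
    emit(me, s);
}
static void gen_put_string(HTStructured *me, const char *s) { emit(me, s); }

static void gen_start_element(HTStructured *me, int e, const char *const *values) {
    assert(e >= 0 && e < me->dtd->n_elements);
    const ElementDef *def = &me->dtd->elements[e];
    emit(me, "<"); emit(me, def->name);  /* the name is looked up only here */
    for (int i = 0; values != NULL && i < def->n_attrs; i++) {
        if (values[i] == NULL) continue;
        emit(me, " "); emit(me, def->attr_names[i]);
        emit(me, "=\""); emit(me, values[i]); emit(me, "\"");
    }
    emit(me, ">");
}
static void gen_end_element(HTStructured *me, int e) {
    assert(e >= 0 && e < me->dtd->n_elements);
    emit(me, "</"); emit(me, me->dtd->elements[e].name); emit(me, ">");
}
static void gen_put_entity(HTStructured *me, int n) {
    assert(n >= 0 && n < me->dtd->n_entities);
    emit(me, "&"); emit(me, me->dtd->entities[n]); emit(me, ";");
}

static const HTStructuredClass SGMLGeneratorClass = {
    gen_put_character, gen_put_string,
    gen_start_element, gen_end_element, gen_put_entity
};

void SGMLGenerator_init(HTStructured *me, const DTD *dtd, char *out, size_t cap) {
    assert(me != NULL && dtd != NULL && out != NULL && cap > 0);
    me->isa = &SGMLGeneratorClass;
    me->dtd = dtd;
    me->out = out; me->cap = cap; me->len = 0;
    out[0] = '\0';
}

/* A two-element, two-entity DTD for demonstration only. */
enum { EL_A, EL_TITLE };
enum { ENT_AMP, ENT_LT };
static const ElementDef demo_elements[] = {
    { "A",     { "HREF", "NAME" }, 2 },
    { "TITLE", { NULL },           0 },
};
static const char *const demo_entities[] = { "amp", "lt" };
const DTD demo_dtd = { demo_elements, 2, demo_entities, 2 };
```

A producer such as a parser or protocol module would then call
(*g.isa->start_element)(&g, EL_A, attrs) and friends, passing only
numbers. The generator is the one place where names are looked up.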

Structured object subclasses exist to

	Pretty-print structured text to a plain text stream
	Generate SGML for the stream, given its DTD
	Present the structured object to the user

This last one is the one you browser writers hook into,
and it should be a lot easier than tangling with styles.

You can also, of course, regenerate the stream from your
widget and use the HTML generator structured object
to write your file back to the server. (This is to
encourage more hypertext editors out there!)

There is another class of object called a protocol.
It supports GET and will soon support PUT and other
useful things.  Current subclasses handle

	http
	news
	gopher
	files and directories
	FTP files and directories
	WAIS without gateway will come

Typically the News and Gopher protocol
objects ask for an HTStructured object (which may
in fact go to a display, or to a client if we are a server, or
to a file, etc., through a small stream stack) and build it.
Although the structured object is defined by a DTD
and has an SGML model, for speed no SGML is generated from
news etc. within a browser (unless the user wants
to save something as SGML).

Other things: a common interface for alerts, confirmations and
questions to the user from the bowels of the library, and at some
point a spinning callback for getting out of those long timeouts.

Constructive criticism welcome, I'll put the code out
when I can.

Tim Berners-Lee