RE server announcements

Terry Allen <terry@ora.com>

Mail folder: WWW Talk Jan 94-present

Errors-To: listmaster@www0.cern.ch
Date: Wed, 9 Mar 1994 01:47:57 --100
Message-id: <199403090042.AA24371@rock.west.ora.com>
Reply-To: terry@ora.com
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: Terry Allen <terry@ora.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: RE server announcements
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 4643

| >I think I missed something.  Dan,
| >is there anything about the format of the message that
| >describes its content as a server announcement?
| 
| Only the Newsgroups, Mime-Version, and Content-Type headers.

Thanks, that was what I was after.  Which of the Content-Type
lines in your example is relevant here, message/rfc822,
multipart/alternative; boundary="cut-here", or text/x-html?
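To make the question concrete, here is a minimal sketch of how an
automated reader might list the Content-Type of every part of such a
message and pull URLs out of the text/x-html part.  The sample message,
the newsgroup name, and the function names are my own assumptions for
illustration, not Dan's actual example (which also had an outer
message/rfc822 layer that is left out here), and the regular expression
only handles double-quoted HREF attributes.

```python
import email
import re

# Hypothetical announcement; headers and URL are made up for illustration.
SAMPLE = """\
Newsgroups: example.server.announce
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="cut-here"

--cut-here
Content-Type: text/plain

New server at http://www.example.org/

--cut-here
Content-Type: text/x-html

<A HREF="http://www.example.org/">New server</A>

--cut-here--
"""


def content_types(msg):
    """Return the content type of the message and of each nested part."""
    return [part.get_content_type() for part in msg.walk()]


def extract_hrefs(msg):
    """Collect HREF values from every text/x-html part of the message."""
    urls = []
    for part in msg.walk():
        if part.get_content_type() == "text/x-html":
            body = part.get_payload(decode=True).decode("ascii", "replace")
            urls.extend(re.findall(r'HREF="([^"]+)"', body, flags=re.IGNORECASE))
    return urls


msg = email.message_from_string(SAMPLE)
print(content_types(msg))
print(extract_hrefs(msg))
```

In this sketch the multipart/alternative header only says that the
parts are alternatives; it is the text/x-html label on the inner part
that tells the reader where to look for anchors.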

| >Or would any message sent to the server-announcement 
| >newsgroup that contained a URL be (mis)interpreted
| >as a server announcement?
| I expect any message sent to the server-announcement newsgroup would
| be interpreted as a server announcement... is there some reason not
| to? The question is: who does the "interpreting." Plain text messages
| are fine for interpretation by humans, but not so great for automated
| newsreaders.
| The feature I'm after is a reliable way to extract URLs from these
| announcements. 
[ . . .]
| The object of the game is that (1) we settle on a format or a small
| number of formats and register them with the IANA as MIME
| content-types (this may already be the case for wais-sources... HTML
| is headed that direction). (2) folks use those formats to distribute
| announcements (and label them as such using MIME headers), and
| finally, (3) other folks have well-defined ways to extract resource
| pointers from announcements. They may choose to (4) stick the
| announcement in a fulltext indexed database for local resource
| discovery.
| As for the SGML/MIME stuff... I'm also interested in expressing lots
| of other sentiments in a machine-readable way. Such things as:
| 	"This data is also mirrored at the following sites..."
| 	"The latest version is always available from ..."
| 		(caching, replication)
| 	"The document is available as text, postscript..."
| 		(format conversion/negotiation)
| 	"This text was written by Daniel W. Connolly on March 8, 1994"
| 		(digital signatures)
| 	"The following is a quote from document X, as of March 1, 1994..."
| 		(verifiable links)
| 	"Only folks that have a license to this data can read it"
| 		(authentication, authorization)
[ . . .]
| For example, there's no handy way to validate an HTML document, since
| most of them have an instance with no hint of a prologue. This is
| largely due to some bad decisions I made a year or so ago... I was
| naive enough to expect that we'd all agree on the same DTD. Not in
| this lifetime :-{

You have to assume the DTD is the official HTML DTD, not some local
variant; this is what the browsers assume anyway.  The issue has 
been muddied because the HTML DTD initially distributed didn't 
work well, leading to local fixes, and new stuff from HTML+ has leaked
into browser functionality, necessitating local updates.
Users want the full display capabilities of their browsers,
and browser developers haven't waited for an official revision.

Where multiple DTDs are in use, a prologue is necessary, though.
If the DTD is public (and usable) you can refer to it through a
Formal Public Identifier in the DOCTYPE declaration.  Seems to
me that the official HTML DTD should be considered the default DTD if
none is specified in the instance.
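
A rough sketch of that default rule, assuming a processor that only
needs the public identifier: look for a DOCTYPE declaration with a
PUBLIC identifier, and fall back to the HTML DTD when the instance has
no prologue.  The identifier string used as the default and the
function name are assumptions for illustration, and the pattern
ignores SYSTEM identifiers and single-quoted literals.

```python
import re

# Assumed public identifier for the official HTML DTD (illustrative only).
DEFAULT_HTML_FPI = "-//IETF//DTD HTML//EN"

# Matches e.g. <!DOCTYPE HTML PUBLIC "-//Some Owner//DTD Name//EN">
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]+)"',
    re.IGNORECASE,
)


def dtd_for(instance):
    """Return the declared public identifier, or the default if none is declared."""
    match = DOCTYPE_RE.search(instance)
    return match.group(1) if match else DEFAULT_HTML_FPI


with_prologue = '<!DOCTYPE DOC PUBLIC "-//Example//DTD Local Variant//EN">\n<DOC>...</DOC>'
without_prologue = "<TITLE>No prologue here</TITLE>"

print(dtd_for(with_prologue))
print(dtd_for(without_prologue))
```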
		
| HTML serves the needs of simple situations like campus-wide
| information systems pretty well. But imagine preparing a hypertext
| legal briefing: you'd want to be SURE that the documents you link to
| don't change out from under you (or at least that you can tell if they
| do...). You might be willing to pay to get access to documents... you
| might pay more for better indexing... you might pay a hypertext
| librarian to organize the documents you have access to with respect to
| a particular vertical market...
| You might think this is far-fetched, but there are already seeds of
| electronically distributed research journals using WWW and other
| Internet tools. From there, it will bleed into entertainment, news
| media, etc.

Browsers that read arbitrary DTDs are on their way.  It seems to me
that what you are pursuing (rightly) is a well defined set of info
that should be accommodated by any DTD that claims to be useful
for hypertext, along with another well defined set of info that
is to be supplied in the MIME wrapper when the SGML instance
is served.  The first part might resolve into some set of 
"architectural forms," that is, attributes with #FIXED values
that can be used in, even retrofitted to, any DTD that actually
has the appropriate info (such as an <AUTHOR> tag).
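
As a sketch of how that retrofit could work in practice: each DTD
gives its own author-like element a #FIXED attribute naming the common
form, and a processor looks up elements by form rather than by tag
name.  The attribute name "htform", the element names, and the DTD
fragment below are all hypothetical, and the regular expression is a
stand-in for a real DTD parser.

```python
import re

# Hypothetical ATTLIST declarations from DTDs that use different tag names.
DTD_FRAGMENT = """
<!ATTLIST byline   htform NAME #FIXED "author">
<!ATTLIST writer   htform NAME #FIXED "author">
<!ATTLIST pubdate  htform NAME #FIXED "date">
"""

ATTLIST_RE = re.compile(
    r'<!ATTLIST\s+(\S+)\s+htform\s+NAME\s+#FIXED\s+"([^"]+)"\s*>',
    re.IGNORECASE,
)


def forms_by_element(dtd_text):
    """Map each element name to the architectural form it is declared to follow."""
    return {elem.lower(): form for elem, form in ATTLIST_RE.findall(dtd_text)}


def elements_for(form, mapping):
    """List the element names that carry the given form, in sorted order."""
    return sorted(elem for elem, f in mapping.items() if f == form)


mapping = forms_by_element(DTD_FRAGMENT)
print(elements_for("author", mapping))
```

The point of the design is that the browser only needs to know the
form names; the tag names stay a local matter for each DTD.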

Do I read you correctly?

Regards,

-- 
Terry Allen  (terry@ora.com)
Editor, Digital Media Group
O'Reilly & Associates, Inc.
Sebastopol, Calif., 95472