Re: Embedding into HTML [was signature/encryption tags]

Dianne Hackborn (hackbod@mail.CS.ORST.EDU)
Thu, 13 Apr 95 04:09:14 EDT

On Wed, 12 Apr 1995, Craig Hubley wrote:

> Yes. As is directly embedding any kind of information which is not intended
> for human consumption but for machine use (a filename is about the only real
> crossover I can think of). All of this, to quote myself:

[deleted.]

Okay, this is the impression I've been getting.

> My intent was to invoke the (implied) SGML principle of human-readable and
> human-editable information being the only legal content of an SGML document.
> By having no mandatory binary or otherwise non-ASCII-encoded information it
> has been possible for ASCII editors to generate SGML documents with only a
> template. A major factor in its success.

Hmmm... though in this particular case, I could see a script as being human
readable and editable. OTOH, my implementation theoretically allows for
other languages; I could well see a binary form being used.

> We should discuss what the SGML rules are first... can we find SGML DTDs
> (other than HTML) that include tags that have:

Unfortunately, way out of my knowledge base, which is why I am asking. :)

> > So, it contains zero or more <MODULE> tags, which declare all the external
> > modules needed by the program, and <SOURCE>, whose content is the actual
> > script which will be executed. [A bit more detailed description of this is
>
> An SGML-based 'make' replacement ? Intriguing... that makes it possible to
> have SGML-based 'rcs' replacements too... very intriguing.

Hmm... actually, I haven't at all been thinking about it in that context.
First, to provide some background, Python [the language I am currently
using as the interpreter] allows external code packages to be imported as a
module. The intent is to allow the document to import modules from
external sites by URL, so that someone can provide a library of routines at
some URL which other people can link to and transparently use. For example,
someone may make a module with an interactive mapping class available. A
document [or another module at some URL] could then reference that module
and create a subclass based on its functionality.
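
A minimal sketch of that idea in present-day Python, assuming the module's source is plain Python text served at the URL. The names `import_from_url` and `MapView` and the example.org URL are hypothetical illustrations, not part of the design described here; only standard calls (`urllib.request.urlopen`, `types.ModuleType`, `compile`, `exec`) are used.

```python
import types
import urllib.request


def fetch_source(url):
    """Download a module's source text (assumed to be UTF-8 Python)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def load_module_from_source(name, source, origin="<string>"):
    """Create a fresh module object and run `source` in its namespace."""
    module = types.ModuleType(name)
    module.__file__ = origin
    exec(compile(source, origin, "exec"), module.__dict__)
    return module


def import_from_url(name, url, fetch=fetch_source):
    """Import a module whose source lives at `url`.

    `fetch` is injectable so the loader can be exercised without a network.
    """
    return load_module_from_source(name, fetch(url), origin=url)


# Hypothetical library someone has published at a URL.
MAPPING_SRC = '''
class MapView:
    def __init__(self, width, height):
        self.width, self.height = width, height

    def describe(self):
        return f"map {self.width}x{self.height}"
'''

mapping = import_from_url("mapping", "http://example.org/mapping.py",
                          fetch=lambda url: MAPPING_SRC)


# A document's own script subclasses the remote class.
class MarkedMap(mapping.MapView):
    def describe(self):
        return super().describe() + " with markers"


print(MarkedMap(4, 3).describe())  # map 4x3 with markers
```

Passing `fetch` explicitly keeps the example offline. Code loaded this way runs with full interpreter access, so a browser would need some restricted execution environment on top of it.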

The idea is to provide a somewhat organic way for new capabilities to be
added to browsers. Ideally, you would have a four-step process:

1. Someone writes a class for their personal use.
2. At some point, this person may want to clean up the class and
make it publicly available to others from some URL.
3. As this class comes into more widespread use, people want to have
it available locally. It is possibly cleaned up again, given an
official name by some standards body, and included in a browser's
set of standard scripts. This script is also available at a public
URL for those sites which don't yet, or don't want to, store it locally.
4. Finally, this script may at some point be downcoded into C if it
needs the speed and/or is becoming an integral part of documents.
At this point a browser may either provide it as a compiled module,
load a local script, or retrieve it from a supplied URL when some
document needs it.
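
The lookup order implied by step 4 could be sketched as follows: prefer a compiled module, then a locally stored script, and fetch from the URL only as a last resort. The registries (`compiled_mods`, `local_scripts`) and the `resolve_module` helper are hypothetical; this is one plausible ordering, not a description of any existing browser.

```python
import types


def resolve_module(name, url, compiled_mods, local_scripts, fetch):
    """Return (module, how) for `name`, preferring the cheapest source.

    compiled_mods: dict of name -> already-built module objects
    local_scripts: dict of name -> locally stored script source
    fetch:         callable(url) -> source text, used as a last resort
    """
    if name in compiled_mods:
        return compiled_mods[name], "compiled"
    if name in local_scripts:
        source, how = local_scripts[name], "local"
    else:
        source, how = fetch(url), "url"
    module = types.ModuleType(name)
    exec(compile(source, f"<{how}:{name}>", "exec"), module.__dict__)
    return module, how


compiled = types.ModuleType("mapping")
compiled.VERSION = "C"
src = "VERSION = 'script'\n"

cases = [
    ({"mapping": compiled}, {}),   # shipped with the browser as compiled code
    ({}, {"mapping": src}),        # stored locally as a standard script
    ({}, {}),                      # not installed: retrieve from the URL
]

results = []
for compiled_mods, local in cases:
    mod, how = resolve_module("mapping", "http://example.org/mapping.py",
                              compiled_mods, local, fetch=lambda url: src)
    results.append((how, mod.VERSION))

print(results)  # [('compiled', 'C'), ('local', 'script'), ('url', 'script')]
```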

One reason that I am currently working with Python is that it seems to have
a fairly good chance of meeting that last step -- it's relatively easy to
write extensions in C or C++, and from the point of view of a Python script,
these extensions look almost identical to any normal module written in Python.
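
This is still easy to see in present-day CPython: `math` is implemented in C, while `fractions` is written in pure Python, yet a script imports and calls both in exactly the same way.

```python
import fractions  # pure Python module in the standard library
import math       # implemented in C in CPython
import types

for module in (math, fractions):
    # Both are ordinary module objects; callers just use attribute access.
    print(module.__name__, isinstance(module, types.ModuleType))

print(math.gcd(12, 18), fractions.Fraction(12, 18))  # 6 2/3
```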

So, this is probably more like a dynamic linker than a 'make' replacement.
There is some notion of versioning, but only so that the browser knows to
retrieve some standard module from its URL if the locally available
version is older than what the document needs.
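
A sketch of that check, assuming versions are dotted strings of integers such as "1.2.10"; the actual version format is not specified here, and `needs_fetch` is a hypothetical helper.

```python
def parse_version(text):
    """Turn a dotted version string like '1.2.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def needs_fetch(local_version, required_version):
    """True if the module must be fetched from its URL.

    local_version is None when the browser has no local copy at all.
    """
    if local_version is None:
        return True
    have = parse_version(local_version)
    want = parse_version(required_version)
    # Pad with zeros so that "1.2" and "1.2.0" compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have < want


print(needs_fetch("1.2", "1.10"))  # True: 2 < 10 numerically, not as text
print(needs_fetch("2.0", "1.9"))   # False: the local copy is new enough
```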

I am really curious about what exactly you mean when you refer to a 'make'
replacement... sounds interesting. :)

> > at <http://www.cs.orst.edu/~hackbod/exechtml/>, which is an extremely
> > preliminary design, but it's something. :)]
>
> Does it refer to other work in this area ?

Not yet; it's actually primarily there at this point to give some people
who we've done presentations with somewhere to get a bit more detailed
information.

> I think embedded programs of a few lines make sense in many cases, if they
> are human-editable I don't see any reason why they couldn't be SGML tags.
> An HTML anchor is one such. SGML HyTime may specify some guidelines here
> as that committee was pretty thorough about active and time-based data...

Is any of their information available? I would be very interested in
looking at their thoughts.

> It is only of interest if browsers *must* execute the code to get the
> intent of the page across... for instance if it specifies transformations
> on the rest of the page that make them sensible (e.g. decrypting graphics).

Actually, yes, this is one of the important things it does. Scripts are
able to create new HTML trees and display them -- e.g.,

myTree = html.NewHTMLTree("<H1>Hi mom!</H1>I am so bored...")
_document_.CurHTMLTree = myTree # set tree to display
_document_.Reformat() # refresh display

The current implementation also allows HTML trees -- including the one
initially parsed from the document's body -- to be manipulated in various
ways. This allows the script to construct trees with information which it
computed on the client machine. For example, it may want to insert
different <IMG> and <FIG> elements depending on the kind of display the
browser is running on, or massage the document in some way if it finds
that the browser doesn't support some new HTML feature.
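
As a rough illustration, here is how a client-side script might tailor a parsed tree, using Python's `xml.etree.ElementTree` as a stand-in for the browser's HTML tree (real HTML of the time is not guaranteed to be well-formed XML). The `has_graphics` and `width` parameters and the image file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def tailor_for_display(body, has_graphics, width):
    """Insert an image suited to the display, or a text fallback.

    `has_graphics` and `width` stand in for whatever the script learns
    about the display it is running on.
    """
    if has_graphics:
        src = "map-large.gif" if width >= 800 else "map-small.gif"
        element = ET.Element("IMG", {"SRC": src, "ALT": "Campus map"})
    else:
        element = ET.Element("P")
        element.text = "Campus map: see the text directions below."
    body.insert(0, element)  # place it ahead of the existing content
    return body


page = ET.fromstring("<BODY><H1>Directions</H1></BODY>")
tailor_for_display(page, has_graphics=True, width=1024)
print([child.tag for child in page])  # ['IMG', 'H1']
```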

In fact, when the current implementation finds that a document contains a
script, it doesn't display the document; the script must at least call
Reformat() to get the document displayed. This allows the script to do any
setup it needs before anything is displayed.

And actually, this is probably an even bigger subject that I should ask
about on this list... how reasonable is this kind of stuff? I realize that
there is a need to be careful with manipulating HTML trees, because different
browsers may construct slightly different trees from the same document.
But this seems like something that is -very- useful; a script running on a
server has so little information about the browser it is serving a
document to that it can only do a very limited amount of manipulation to
tailor it to the browser. And in many ways, it just seems much cleaner to
do a lot of these things on the browser side.

> If all the program does is wave 'hi Mom!' then who cares, it's dispensable
> as graphics often are. But there is no standard tag to tell the browser
> that the information in the graphics (or other embedded information) is or
> is not duplicated by the text-based tags... this might be useful, to know
> if the availability of graphics/signature/program processing is or is not
> critical to the understanding of the page. I see no way for programs to
> guess. Some authors will abuse it and 'insist' that no one see their page
> unless they are running the required MPEG viewer so they can wave 'hi Mom'
> but that is their problem...

Yep, I agree that this kind of thing should not be embedded into an HTML
document at all. As it stands, the only thing that I am allowing to be embedded
is that main setup program, because I see it as in charge of global control
of how a document is to be used by the browser.

Thanks much for all your comments!

-----------------------------------------------------------------------------
Dianne Kyra Hackborn "All governments perpetuate themselves through
hackbod@mail.cs.orst.edu the daily commission of acts which a rational
Oregon State University person might find to be stupid or dangerous (or
//www.cs.orst.edu/~hackbod/ both)." -- Frank Zappa