keyword tag?

Ofer Inbar (cos@leftbank.com)
Mon, 1 May 95 12:14:37 EDT

Next message: Gavin Nicol: "Re: ISO/IEC 10646 as Document Character Set"
Previous message: Dave Raggett: "Re: Planning new RFCs! [was: REL and REV attributes -- a new RFC?]"
Next in thread: weibel@oclc.org: "Re: keyword tag?"
Maybe reply: weibel@oclc.org: "Re: keyword tag?"
Maybe reply: Peter Flynn: "Re: keyword tag?"
Maybe reply: Marc Salomon: "Re: keyword tag?"
Maybe reply: Paul Burchard: "Re: keyword tag?"
Maybe reply: Dave Raggett: "Re: keyword tag?"
Maybe reply: Dirk Herr-Hoyman: "Re: keyword tag?"
Maybe reply: Dave Raggett: "Re: keyword tag?"
Maybe reply: Dan Connolly: "Re: keyword tag?"
Maybe reply: Dave Raggett: "Re: keyword tag?"
Maybe reply: Steven Fought: "Re: keyword tag?"
Maybe reply: Dave Raggett: "Re: keyword tag?"

I'd like to make a suggestion for a new HTML feature. I'm not on the
mailing list, but I did attend the first Danvers session and search
the mailing list hyperarchive, and haven't seen/heard about anything
like this yet. Since I'm not on this list, I would appreciate it if
any direct replies to this message are Cc'ed to me. Thanks.

I would like for HTML documents to have some facility to communicate
to an indexing tool (a web worm/spider/etc) what the author believes
is significant. Currently, most webworms simply index the entire text
of the HTML document, throwing out excessively common words ("the",
"web", ...) and 'words' which have numbers or special characters in
them. Some webworms pay special attention to what's inside of <title>
or <h1> tags, as a way of trying to figure out what the author of the
document thinks is significant about the document.

Both of these methods have serious flaws. The first tends to index
both too much and too little. For example, while the word "web" may
be so prevalent in web documents that indexing tools are forced to
drop it, it should be retained for the W3 Consortium site, or for a
nature site with a special page about spiders. And
webworms that drop words with numbers and special characters in them
will drop the word "3Com" from their index of 3Com's web site.
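
To make the first flaw concrete, here is a minimal sketch of such a
naive indexer. The stop list and the letters-only rule are assumptions
chosen for illustration, not the behavior of any particular webworm.

```python
# Hypothetical stop list; a real webworm would use its own, longer list.
STOP_WORDS = {"the", "a", "of", "and", "web"}


def naive_index_terms(text):
    """Return the terms a naive full-text indexer would keep."""
    terms = []
    for raw in text.split():
        # Trim surrounding punctuation and fold case before testing the word.
        word = raw.strip('.,;:!?"\'()').lower()
        if not word:
            continue
        if word in STOP_WORDS:
            continue  # dropped as excessively common
        if not word.isalpha():
            continue  # dropped: contains digits or special characters
        terms.append(word)
    return terms


print(naive_index_terms("The 3Com web site"))  # → ['site']
```

Both "3Com" and "web" are lost here, even though they may be exactly
the words the page's author would want indexed.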

At least one web index that I have looked at does specify a way for
authors to build the index. Since there is no direct support in HTML,
the tool requires web server maintainers to build their own, separate,
index file, in a specific format, and leave it in an HTTP-accessible
document with a specific filename so the indexing tool can find it.

What we really should have is some sort of markup that goes into the
<head> portion of an HTML document (since it should not be displayed)
that specifies the author's intention of what keywords should be used
to index this document. For example, maybe <kl>..</kl> to mark a key
list, with <ki> to denote each individual key item.
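
As a rough sketch of how an indexing tool might read such markup, the
following uses Python's standard html.parser module. The <kl> and <ki>
names come from the suggestion above; the assumption that <ki> needs no
closing tag (like <li>), the class and function names, and the sample
keywords are all hypothetical.

```python
from html.parser import HTMLParser


class KeyListParser(HTMLParser):
    """Collect the text of each <ki> item found inside a <kl> key list."""

    def __init__(self):
        super().__init__()
        self.in_key_list = False
        self.in_item = False
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag == "kl":
            self.in_key_list = True
        elif tag == "ki" and self.in_key_list:
            # Assumption: a new <ki> implicitly ends the previous item.
            self.in_item = True
            self.keywords.append("")

    def handle_endtag(self, tag):
        if tag == "kl":
            self.in_key_list = False
            self.in_item = False
        elif tag == "ki":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.keywords[-1] += data


def extract_keywords(html):
    """Return the author-supplied keywords in document order."""
    parser = KeyListParser()
    parser.feed(html)
    parser.close()
    return [k.strip() for k in parser.keywords if k.strip()]


# Illustrative document; the keyword values are made up for this example.
SAMPLE = """<head>
<title>3Com Home Page</title>
<kl>
<ki>3Com
<ki>networking
<ki>Ethernet adapters
</kl>
</head>"""

print(extract_keywords(SAMPLE))  # → ['3Com', 'networking', 'Ethernet adapters']
```

Because the keywords are taken verbatim from the key list, a term like
"3Com" survives even if the same tool would discard it from body text.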

-- Cos (Ofer Inbar) -- cos@leftbank.com cos@cs.brandeis.edu
-- The Left Bank Operation -- lbo@leftbank.com http://www.leftbank.com
"We all misuse the net for personal gain, one way or another."
-- Larry Wall <lwall@netlabs.com>
