RE: keyword tag?

Steven Fought (keeper@cs.wisc.edu)
Tue, 2 May 95 07:05:05 EDT

On May 2, 6:40am, Fisher Mark wrote:
} Subject: RE: keyword tag?

} Metadata could be very useful, especially for data not easily discerned from
} a reading of the document text. However, it was shown in 1971 (!) that when
} space is available for a full-text index, full-text searches with
} appropriate pre-processing and a semi-automatically generated thesaurus
} outperform the best manually-indexed systems (cf.
} <URL:http://cs-tr.cs.cornell.edu:80/TR/CORNELLCS:TR71-115>, "A New
} Comparison Between Conventional Indexing (MEDLARS) and Automatic Text
} Processing (SMART)", Gerard Salton). I recommend that metadata be reserved
} for data that can't otherwise be found in the document itself.

This is getting off topic, but I fail to see how metadata could be used
for data that _is_ in the document itself, unless an author just copies
text from the document into a META section, which seems silly.

Personally, I would like to be able to "type" parts of my documents. A
trivial example would be:

<TYPE name="location/city">Hope</TYPE>

If a list of semi-standardized types (similar to MIME types) were agreed on,
search engines could then allow searches like "Where has Dr. Cromwell
mentioned cities?" Many words (like "Philadelphia") would have an
obvious primary type that could be listed in a table available to the
indexing software, but a method of disambiguating words like "Hope" in the
example above is necessary (unless AI context-based discovery systems are
much better now than those I've seen).
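
A minimal sketch of how an indexer might combine the two sources of type
information, assuming the <TYPE> syntax shown above. The table
DEFAULT_TYPES and the functions index_types and terms_of_type are
hypothetical names made up for illustration; no existing indexer or agreed
type list is implied.

```python
import re

# Hypothetical table of "obvious" primary types for well-known words.
DEFAULT_TYPES = {"Philadelphia": "location/city"}

# Matches the proposed <TYPE name="...">...</TYPE> markup (not a real HTML tag).
TYPE_TAG = re.compile(r'<TYPE name="([^"]+)">(.*?)</TYPE>', re.DOTALL)
WORD = re.compile(r"[A-Za-z]+")


def index_types(text):
    """Return the set of (type, term) pairs found in text."""
    found = set()
    # Explicit markup first: the author has disambiguated these terms.
    for type_name, term in TYPE_TAG.findall(text):
        found.add((type_name, term))
    # Remove the tagged spans, then fall back to the default table.
    untagged = TYPE_TAG.sub(" ", text)
    for word in WORD.findall(untagged):
        if word in DEFAULT_TYPES:
            found.add((DEFAULT_TYPES[word], word))
    return found


def terms_of_type(text, type_name):
    """Answer a query such as 'which cities does this document mention?'"""
    return sorted(term for t, term in index_types(text) if t == type_name)


doc = ('We left Philadelphia with hope, bound for '
       '<TYPE name="location/city">Hope</TYPE>.')
print(terms_of_type(doc, "location/city"))
```

The untagged "hope" is not indexed as a city because it is not in the
table; only the author's explicit markup identifies "Hope" as one.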

Unfortunately, I don't know enough about the area to speak with confidence,
but something like the above mechanism should be available to authors or
organizations that want to allow context-rich searches on documents.

It also seems desirable from a structural markup perspective to be able to
mark up by type and leave presentation to a style sheet. Rather than saying
"this is emphasized text" and elsewhere saying "I want to embolden
emphasized text", you could have real content markup by saying "this is a
city" and elsewhere saying "cities are italicized".
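
As a rough illustration of "cities are italicized", here is a sketch that
treats a style sheet as a simple mapping from content type to a
presentational element. The STYLE_SHEET mapping and the render function are
assumptions for the example, not part of any existing style sheet proposal.

```python
import re

# Hypothetical style sheet: content type -> presentational HTML element.
STYLE_SHEET = {"location/city": "I"}

# Matches the proposed <TYPE name="...">...</TYPE> markup (not a real HTML tag).
TYPE_TAG = re.compile(r'<TYPE name="([^"]+)">(.*?)</TYPE>', re.DOTALL)


def render(text, style_sheet):
    """Replace each typed span with the element the style sheet assigns."""
    def apply_style(match):
        type_name, content = match.group(1), match.group(2)
        element = style_sheet.get(type_name)
        if element is None:
            # No rule for this type: emit the content unstyled.
            return content
        return "<{0}>{1}</{0}>".format(element, content)
    return TYPE_TAG.sub(apply_style, text)


print(render('I grew up in <TYPE name="location/city">Hope</TYPE>.',
             STYLE_SHEET))
```

The document itself only says "this is a city"; changing the mapping
changes how every city is displayed without touching the text.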

Steven

-- 
Steven Fought
UW Madison Computer Science Webmaster
Computer Systems Lab