Semantic Tagging in Web Objects (was: Comments on HTML+ Request For Comments)

waterbug@epims1.gsfc.nasa.gov (Steve Waterbury)
Date: Tue, 1 Feb 1994 06:02:04 +0500
From: waterbug@epims1.gsfc.nasa.gov (Steve Waterbury)
Message-id: <9402011102.AA13639@epims1>
To: www-talk@www0.cern.ch
Subject: Semantic Tagging in Web Objects (was: Comments on HTML+ Request For Comments)
Cc: waterbug@epims1.gsfc.nasa.gov, mclay@eeel.nist.gov, shab@trek.eeel.gov,
        b_yencha%spc.dnet@gpo.nsc.com, yuri@sq.com, grout@pdesds1.scra.org,
        wiedmer@iwf.mabp.ethz.ch, bugow@imw.tu-clausthal.de
X-Sun-Charset: US-ASCII
Content-Length: 3638


Jonathan Abbey wrote:
 
> Definite agreement on the semantic markings.. this is one of the single
> most important things that we should be attending to now.. devising ways
> to support things like the Interpedia project from within the WWW framework.
> 
> I would actually hope to see a richer set of semantic tags.. 
> DOCUMENT_TYPE is essential (and it's good to see it here), but I tend
> to think that KEYWORDS is inadequate.  What about some kind of
> hierarchical categorization coding, like dewey decimal or library of
> congress numbers?

I don't think it's a good idea to burden HTML+ with semantic tagging. 
Categorizations, and their cousins, attributes, in general need 
separate support.  I have been in this discussion before (on the other 
side, in fact!) and, IMHO, orthogonality is needed here.  

I think it would be cleaner and more flexible, and would preserve the 
focus of HTML+, to do semantic tagging with SGML tags from outside the 
HTML+ tag set.  These semantic tags would be invisible to an HTML+ 
browser, but would be known to a set of specialized indexing engines, 
SGML editor/parsers, knowbots, etc., whose purpose in life would be to 
record:

1.  the locations (URLs/URNs) of all "objects" that contain certain tags 
2.  the tag contents for those tags in those objects

and to maintain indexes of them on semantic data servers specialized 
to the various semantic domains (granularity TBD).   
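
To make the indexing step a bit more concrete, here is a rough sketch in 
Python. It is only an illustration under stated assumptions, not a 
proposed implementation: the tag names PART-NUMBER and MANUFACTURER, the 
example URLs, and the helper names are all hypothetical. It uses Python's 
standard html.parser module merely as a tolerant tag scanner (a real 
engine would use an SGML parser driven by the DTD), and it records items 
1 and 2 above as an index from each tag to (URL, content) pairs.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical semantic tags drawn from outside the HTML+ tag set.
# An HTML+ browser would skip them; this indexer looks only for them.
SEMANTIC_TAGS = {"part-number", "manufacturer"}


class SemanticTagExtractor(HTMLParser):
    """Record the text content of selected tags in one document."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.pending = []  # stack of [tag, text chunks] for wanted tags now open
        self.found = []    # (tag, content) pairs, in the order the tags close

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag in self.wanted:
            self.pending.append([tag, []])

    def handle_endtag(self, tag):
        if self.pending and self.pending[-1][0] == tag:
            name, chunks = self.pending.pop()
            self.found.append((name, "".join(chunks).strip()))

    def handle_data(self, data):
        # Text belongs to every wanted tag that is currently open.
        for _, chunks in self.pending:
            chunks.append(data)


def build_index(documents):
    """Map each semantic tag to a list of (url, content) pairs."""
    index = defaultdict(list)
    for url, text in documents.items():
        parser = SemanticTagExtractor(SEMANTIC_TAGS)
        parser.feed(text)
        parser.close()
        for tag, content in parser.found:
            index[tag].append((url, content))
    return index


# Hypothetical documents keyed by URL.
DOCS = {
    "http://parts.example/a.html":
        "<H1>Op amp</H1><P>Data sheet for "
        "<PART-NUMBER>XYZ-101</PART-NUMBER> made by "
        "<MANUFACTURER>Acme Semiconductor</MANUFACTURER>.",
    "http://parts.example/b.html":
        "<P>No semantic tags here.",
}

index = build_index(DOCS)
print(dict(index))
```

Documents without any of the semantic tags simply contribute nothing to 
the index; the browser-visible HTML+ markup is ignored.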

This would enable direct querying to find the set of objects on the 
net with a specified tag and with the contents of that tag containing 
a certain string or a value within a certain range of values, etc.  
Of course the objects retrieved can be arbitrary:  documents, binaries, 
images, product catalogs or specific "data sheets", organizational 
directories, technical standards, specifications, whatever.  The 
specialized semantic data servers would be the grandchildren of whois++, 
X.500, WAIS, and SQL servers.  
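
A query against such an index might look roughly like the sketch below. 
Again this is hypothetical: the index contents, the tag names, and the 
function names are invented for illustration, and a real semantic data 
server would use proper indexes rather than these linear scans.

```python
# Hypothetical index: tag -> list of (url, content), as an indexing
# engine might have recorded it.
INDEX = {
    "manufacturer": [
        ("http://parts.example/a.html", "Acme Semiconductor"),
        ("http://parts.example/b.html", "Widget Corp"),
    ],
    "temp-range-max": [
        ("http://parts.example/a.html", "125"),
        ("http://parts.example/b.html", "85"),
    ],
}


def query_contains(index, tag, needle):
    """URLs whose `tag` content contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return [url for url, content in index.get(tag, [])
            if needle in content.lower()]


def query_range(index, tag, low, high):
    """URLs whose `tag` content is a number within [low, high]."""
    hits = []
    for url, content in index.get(tag, []):
        try:
            value = float(content)
        except ValueError:
            continue  # non-numeric content: skip it rather than fail the query
        if low <= value <= high:
            hits.append(url)
    return hits


print(query_contains(INDEX, "manufacturer", "acme"))   # ['http://parts.example/a.html']
print(query_range(INDEX, "temp-range-max", 100, 150))  # ['http://parts.example/a.html']
```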

The BIG project, and there is lots of work being done on this as we 
speak, is to achieve consensus on technically sound information models 
for the various semantic domains.  Of course, the mapping of the 
information models of various sorts into DTDs is non-trivial, but I 
believe it is technically feasible, and I would rate it much easier 
than the original modeling task itself.  

As for categorizations, I think it is a fallacy to believe that 
universal consensus on them is either possible or necessary.  
Categorization schemes are important chiefly as a means of access to 
otherwise uncharacterized objects; they matter much less when queries 
can go directly to the objects' attributes.  

Categories will always be with us, but to be created properly, they 
must derive from a consensual set of attributes (semantic tags) -- 
i.e., they should sit on top of the information models, and will 
probably come in several different flavors for each semantic domain.  
Even within a domain, different groups like to slice things a little 
differently ("around here we call that an _extra_ large!!") 
... but that's no problem -- as long as the basic information models 
and their attribute sets are agreed to, the categorization-du-jour 
can be selected by the end-user.
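
As a small sketch of categories sitting on top of agreed attributes: 
below, two hypothetical sizing schemes classify the same objects from one 
shared attribute (chest_cm, invented for illustration, as are the URLs 
and thresholds), and the end user simply chooses which scheme to apply. 
The attribute data never changes; only the categorization does.

```python
# Hypothetical objects, each described only by an agreed attribute set.
# "chest_cm" is an invented attribute for illustration.
OBJECTS = {
    "http://shirts.example/1": {"chest_cm": 106},
    "http://shirts.example/2": {"chest_cm": 98},
}


def catalog_sizes(attrs):
    """One hypothetical sizing scheme (thresholds invented)."""
    chest = attrs["chest_cm"]
    if chest < 100:
        return "M"
    return "L" if chest < 110 else "XL"


def local_sizes(attrs):
    """Another hypothetical scheme: around here we call that an extra large."""
    chest = attrs["chest_cm"]
    if chest < 96:
        return "M"
    return "L" if chest < 104 else "XL"


def categorize(objects, scheme):
    """Group object URLs by the category the chosen scheme assigns."""
    groups = {}
    for url, attrs in objects.items():
        groups.setdefault(scheme(attrs), []).append(url)
    return groups


# The end user picks the categorization-du-jour; the attributes stay the same.
print(categorize(OBJECTS, catalog_sizes))  # {'L': ['http://shirts.example/1'], 'M': ['http://shirts.example/2']}
print(categorize(OBJECTS, local_sizes))    # {'XL': ['http://shirts.example/1'], 'L': ['http://shirts.example/2']}
```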

Steve Waterbury

=====================================================================
Stephen C. Waterbury		Phone:	301-286-7557
NASA Parts Project Office	FAX:	301-286-1695
Code 310.A			email:	waterbug@epims1.gsfc.nasa.gov
NASA/GSFC			"Sometimes you're the windshield;
Greenbelt, MD 20771			sometimes you're the bug."
=====================================================================