Uniform Resource Modifier: a meta-information encoding scheme

ccoprmm@oit.gatech.edu (Michael Mealling)
From: ccoprmm@oit.gatech.edu (Michael Mealling)
Message-id: <199307072336.AA08077@oit.gatech.edu>
Subject: Uniform Resource Modifier: a meta-information encoding scheme
To: www-talk@nxoc01.cern.ch
Date: Wed, 7 Jul 93 19:36:20 EDT
X-Mailer: ELM [version 2.3 PL11]
Status: RO

							Michael Mealling
							Georgia Tech
							July, 1993

Uniform Resource Modifier

In this paper, the author proposes a method for encoding information concerning
the content and/or format of a network resource in the context of the URI 
method of naming and locating resources.

1: Introduction

A Uniform Resource Modifier (URM) is a way of encoding what is called a 
resource's meta-information [1-Weider 93].  This includes information such as 
the author's name, the resource's data format, and its expiration date.

2: Motivation

This paper is needed to separate the content/format specification of a 
resource from its location and naming functions.  The naming function is 
taken care of by the Uniform Resource Name (URN) [3-Weider 93].  Its purpose 
is to uniquely name a resource.  The location function is handled by the 
Uniform Resource Locator (URL) [Berners-Lee 1993].  Its purpose is to actually 
gain access to the resource.  Neither of these items gives the user any clues 
as to the size, format, content, etc. of the resource.  These are vital pieces 
of information that are contained outside of the resource itself.

The Uniform Resource Modifier (URM) is a method for encoding this 
meta-information in a way that will work together with the URL/URN encoding 
schemes.  It is designed to be extensible and flexible since many methods 
could be developed for representing meta-information.

3: The Uniform Resource Modifier (URM)

3.1 Functionality

The URM is designed to provide for a non-persistent meta-information encoding 
scheme.  It is meant to be used in conjunction with items called 
transponders [1-Weider 93] and other network resources that need typing 
information.  URMs are meant to be used specifically in conjunction with URLs 
as a locally cached entity that is used to give typing information to network 
clients.  URMs are NOT persistent and may change.  They are meant to be human 
readable as well as machine readable.  This means that certain fields can have 
specific internal syntax but that this internal syntax is not to be defined 
here and is not required.  This allows for machine readable data to co-exist 
beside human readable data.

3.2 URM Sections Explanation

A URM (like URLs and URNs) has distinct sections: the wrapper, the 
encoding format scheme, and the list of encoded items.  The syntax is:

URM:Format_Scheme::"Data_Item"::"Data_Item"::...::"Data_Item":::

3.2.1 The wrapper

The wrapper consists of the 4 character header "URM:" and the 3 trailing colons.
These give separation from other Resource Identifiers and follow the convention
of URLs and URNs.  This also allows all three identifiers to be encoded into an
easily manipulated template.  This template is a subject for further 
investigation.

3.2.2 The Format_Scheme

The Format_Scheme is made up of three fields: the Format, language, and 
character set specifiers.  Their format is:

Format:language.character_set

Format is a single identifier that names one of the allowed meta-information 
encoding schemes.  Recognizing that other encoding schemes exist, but that too 
many encoding schemes render a URM useless, it is suggested that a very 
limited number of encoding schemes be allowed and that those allowed be 
registered with the IANA.  This is for discussion among the IIIR working 
groups [2-Weider 93].  This paper puts forth one as a good solution to most 
encoding problems.

The IAFA working group of the IETF has developed a very large list of field 
names and allowed data elements that are used to describe the various 
attributes of an item on an FTP site.  This list is comprehensive enough to be 
used as a URM encoding scheme.  This paper suggests that the identifier 
string 'IAFA' be used as a Format_Scheme.

Language specifies which language the resulting encoded information is in.  
This is specified in the standard format using ISO 639 language and ISO 3166 
country codes.  The format is:

languagecode_countrycode 

For example, British English would be represented as: 

en_GB

Character_set should be the ISO name for each allowed character set.

An example would be:

IAFA:en_US.iso88591
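
As an illustration only, the following Python sketch splits a Format_Scheme
of the form shown above into its fields.  The names FormatScheme and
parse_format_scheme are hypothetical and are not part of this proposal; the
sketch checks only the separators and does not validate the codes against
any ISO registry.

```python
from typing import NamedTuple


class FormatScheme(NamedTuple):
    """Fields of a Format_Scheme (hypothetical helper type)."""
    format_name: str    # e.g. "IAFA"
    language_code: str  # e.g. "en"
    country_code: str   # e.g. "US"
    character_set: str  # e.g. "iso88591"


def parse_format_scheme(text: str) -> FormatScheme:
    """Split 'Format:languagecode_countrycode.character_set' into fields."""
    format_name, sep, rest = text.partition(":")
    if not sep or not format_name:
        raise ValueError("missing Format field: %r" % text)
    language, sep, character_set = rest.partition(".")
    if not sep or not character_set:
        raise ValueError("missing character set: %r" % text)
    language_code, sep, country_code = language.partition("_")
    if not sep or not language_code or not country_code:
        raise ValueError("language must be languagecode_countrycode: %r" % text)
    return FormatScheme(format_name, language_code, country_code, character_set)


if __name__ == "__main__":
    print(parse_format_scheme("IAFA:en_US.iso88591"))
```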

3.2.3 The list of encoded items

The list consists of one or more data items surrounded by quotation marks and 
separated by double colons.  This is the section where the actual data is 
encoded.  White space of any type is allowed here.  If quotation marks are 
needed within these items then they should be quoted with a '\' in the C style 
of special character quoting.  It should be noted that some transport
protocols put restrictions on white space and non-printable characters.  These 
should be taken into account when transporting URMs around the net.

An example follows:

URM:IAFA:en_US.iso88591::" Author: John Doe "::"
Title: \"My Book\"
"::"
Format: PostScript
":::

(Note the Carriage Return at the beginning and end of some fields.  This is 
simply an illustration of the inclusion of non-printable characters.)
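
The encoding step described in this section can be sketched in Python as
follows.  This is a minimal sketch, not a reference implementation: the
function names quote_item and build_urm are hypothetical, and only quotation
marks are escaped, as the text above describes.  How a literal backslash in
a data item should be treated is not specified here, so the sketch leaves
backslashes untouched.

```python
from typing import Sequence


def quote_item(item: str) -> str:
    """Wrap one data item in quotation marks, escaping embedded quotes."""
    return '"' + item.replace('"', '\\"') + '"'


def build_urm(format_scheme: str, items: Sequence[str]) -> str:
    """Assemble wrapper, Format_Scheme, and quoted items into one URM."""
    if not items:
        raise ValueError("a URM needs at least one data item")
    quoted = "::".join(quote_item(item) for item in items)
    # "URM:" header, "::" before the item list, ":::" trailer.
    return "URM:" + format_scheme + "::" + quoted + ":::"


if __name__ == "__main__":
    # Rebuilds the example above, including the embedded line breaks.
    print(build_urm(
        "IAFA:en_US.iso88591",
        [" Author: John Doe ",
         '\nTitle: "My Book"\n',
         "\nFormat: PostScript\n"],
    ))
```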


4: Syntax Specifics

Below is a BNF-like syntax for a URM.  Where spaces are allowed they are listed 
in addition to other characters.  Square brackets '[' and ']' are used to 
indicate optional parts.  Single letters and digits stand for themselves.  All 
words of more than one letter are either expanded further in the syntax or 
represent themselves.

urm             URM:Format_Scheme::Items:::
Format_Scheme   Format:Language.Character_Set
Format          alphas
Language        isoLanguageCode_isoCountryCode
isoLanguageCode alphas
isoCountryCode  alphas
Character_Set   alphas
Items           Item[::Items]
Item            "xalphas"
alphas          alpha[alphas]
xalphas         xalpha[xalphas]
xalpha          alpha[:]
alpha           any character defined in any iso recognized character set
                except for a ':'
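
To show how a client might read a URM back, here is a hedged Python sketch
of a decoder.  The name parse_urm is hypothetical.  The sketch checks the
wrapper, takes everything up to the first double colon as the Format_Scheme,
and then scans the quoted data items, treating a backslash followed by a
quotation mark as an escaped quote.  It is more lenient than the syntax
above in places (for instance, it accepts an empty data item).

```python
from typing import List, Tuple


def parse_urm(text: str) -> Tuple[str, List[str]]:
    """Return (format_scheme, items) for a URM string, or raise ValueError."""
    if not text.startswith("URM:") or not text.endswith(":::"):
        raise ValueError("missing URM wrapper")
    body = text[len("URM:"):-len(":::")]
    scheme, sep, rest = body.partition("::")
    if not sep or not scheme:
        raise ValueError("missing Format_Scheme")
    items: List[str] = []
    pos = 0
    while True:
        if pos >= len(rest) or rest[pos] != '"':
            raise ValueError("expected opening quotation mark at %d" % pos)
        pos += 1
        chars: List[str] = []
        while pos < len(rest) and rest[pos] != '"':
            # A backslash followed by a quotation mark is an escaped quote.
            if rest[pos] == "\\" and rest[pos + 1:pos + 2] == '"':
                chars.append('"')
                pos += 2
            else:
                chars.append(rest[pos])
                pos += 1
        if pos >= len(rest):
            raise ValueError("unterminated data item")
        pos += 1  # skip the closing quotation mark
        items.append("".join(chars))
        if pos == len(rest):
            return scheme, items
        if rest[pos:pos + 2] != "::":
            raise ValueError("expected '::' between data items")
        pos += 2


if __name__ == "__main__":
    scheme, items = parse_urm(
        'URM:IAFA:en_US.iso88591::" Author: John Doe "::"\nFormat: PostScript\n":::'
    )
    print(scheme)
    print(items)
```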

5: References

[1-Weider 93]
   Weider, Chris.  Resource Transponders, March 1993.  Available as
   ftp://nic.merit.edu/documents/internet-drafts/draft-ietf-iiir-transponders-00.txt
[2-Weider 93]
   Weider, Chris and Deutsch, Peter.  A Vision of an Integrated Internet
   Information Service, March 1993.  Available as
   ftp://nic.merit.edu/documents/internet-drafts/draft-ietf-iiir-vision-00.txt
[3-Weider 93]
   Weider, Chris.  Uniform Resource Names, May 1993.  Available as
   ftp://nic.merit.edu/documents/internet-drafts/draft-ietf-uri-resource-names-00.txt
[Berners-Lee 1993]
   Berners-Lee, Tim.  Uniform Resource Locators, March 1993.  Available as
   ftp://nic.merit.edu/documents/internet-drafts/draft-ietf-uri-url-00.txt
-- 
------------------------------------------------------------------------------
Michael Mealling                     ! Hypermedia WWW, WAIS, and gopher will be
Georgia Institute of Technology      ! here soon via MIME. Your view of the 
Michael.Mealling@oit.gatech.edu      ! internet is about to change completely!