Uniform Resource Modifier: a meta-information encoding scheme

ccoprmm@oit.gatech.edu (Michael Mealling)

Mail folder: WWW Talk Jul-Oct 1993

From: ccoprmm@oit.gatech.edu (Michael Mealling)
Message-id: <199307072336.AA08077@oit.gatech.edu>
Subject: Uniform Resource Modifier: a meta-information encoding scheme
To: www-talk@nxoc01.cern.ch
Date: Wed, 7 Jul 93 19:36:20 EDT
X-Mailer: ELM [version 2.3 PL11]
Status: RO

Michael Mealling
Georgia Tech
July, 1993

Uniform Resource Modifier

In this paper, the author proposes a method for encoding information concerning
the content and/or format of a network resource in the context of the URI
method of naming and locating resources.

1: Introduction

A Uniform Resource Modifier (URM) is a way of encoding what is called a
resource's meta-information [1-Weider 93]. This includes information such as
the author's name, the resource's data format, and its expiration date.

2: Motivation

This paper is needed to separate the content/format specification of a
resource from its location and naming functions. The naming function is
taken care of by the Uniform Resource Name (URN) [3-Weider 93]. Its purpose
is to uniquely name a resource. The location function is handled by the
Uniform Resource Locator (URL) [Berners-Lee 1993]. Its purpose is to actually
gain access to the resource. Neither of these items gives the user any clues
as to the size, format, content, etc. of the resource. These are vital pieces
of information that are held outside of the resource itself.

The Uniform Resource Modifier (URM) is a method for encoding this
meta-information in a way that will work together with the URL/URN encoding
schemes. It is designed to be extensible and flexible since many methods
could be developed for representing meta-information.

3: The Uniform Resource Modifier (URM)

3.1 Functionality

The URM is designed to provide for a non-persistent meta-information encoding
scheme. It is meant to be used in conjunction with items called
transponders [1-Weider 93] and other network resources that need typing
information. URMs are meant to be used specifically in conjunction with URLs
as a locally cached entity that is used to give typing information to network
clients. URMs are NOT persistent and may change. They are meant to be human
readable as well as machine readable. This means that certain fields can have
specific internal syntax but that this internal syntax is not to be defined
here and is not required. This allows for machine readable data to co-exist
beside human readable data.

3.2 URM Sections Explanation

A URM (like a URL or a URN) has three distinct sections: the wrapper, the
encoding format scheme, and the list of encoded items. The syntax is:

URM:Format_Scheme::"Data_Item"::"Data_Item"::...::"Data_Item":::

3.2.1 The wrapper

The wrapper consists of the four-character header "URM:" and the three
trailing colons. These separate the URM from other Resource Identifiers and
follow the convention of URLs and URNs. This also allows all three identifiers
to be encoded into an easily manipulated template. This template is a subject
for further investigation.

3.2.2 The Format_Scheme

The Format_Scheme is made up of three fields: the Format, language, and
character set specifiers. Their format is:

Format:language.character_set

Format is a single identifier drawn from the set of allowed meta-information
encoding schemes. Recognizing that other encoding schemes exist, but that too
many encoding schemes would render a URM useless, it is suggested that a very
limited number of encoding schemes be allowed and that those allowed be
registered with the IANA. This is for discussion among the IIIR working
groups [2-Weider 93]. This paper puts forth one as a good solution to most
encoding problems.

The IAFA working group of the IETF has developed a very large list of field
names and allowed data elements that are used to describe the various
attributes of an item on an FTP site. This list is comprehensive enough to be
used as a URM encoding scheme. This paper suggests that the identifier
string 'IAFA' be used as a Format_Scheme.

Language specifies which language the resulting encoded information is in.
This is specified in the standard format using ISO 639 language and ISO 3166
country codes. The format is:

languagecode_countrycode

For example, British English would be represented as:

en_GB

Character_set should be the ISO name for each allowed character set.

An example would be:

IAFA:en_US.iso88591
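
As an illustration only, a Format_Scheme such as the one above could be split
into its fields with a short Python sketch. The names FormatScheme and
parse_format_scheme are hypothetical and not part of this proposal; the sketch
assumes the Format:languagecode_countrycode.character_set layout used in the
example and does not check the codes against any ISO registry.

```python
from typing import NamedTuple


class FormatScheme(NamedTuple):
    """Hypothetical container for the three Format_Scheme fields."""
    fmt: str            # encoding scheme identifier, e.g. "IAFA"
    language: str       # language code, e.g. "en"
    country: str        # country code, e.g. "US"
    character_set: str  # character set name, e.g. "iso88591"


def parse_format_scheme(text: str) -> FormatScheme:
    """Split 'Format:language_country.character_set' into its fields."""
    fmt, sep, rest = text.partition(":")
    if not sep or not fmt:
        raise ValueError("missing Format field")
    locale, sep, character_set = rest.partition(".")
    if not sep or not character_set:
        raise ValueError("missing character set field")
    language, sep, country = locale.partition("_")
    if not sep or not language or not country:
        raise ValueError("language must look like languagecode_countrycode")
    return FormatScheme(fmt, language, country, character_set)


scheme = parse_format_scheme("IAFA:en_US.iso88591")
print(scheme.fmt, scheme.language, scheme.country, scheme.character_set)
```

A real client would presumably also check Format against the registered list
of encoding schemes, which this sketch leaves out.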

3.2.3 The list of encoded items

The list consists of one or more data items surrounded by quotation marks and
separated by double colons. This is the section where the actual data is
encoded. White space of any type is allowed here. If quotation marks are
needed within these items then they should be quoted with a '\' in the C style
of special character quoting. It should be noted that some transport
protocols put restrictions on white space and non-printable characters. These
should be taken into account when transporting URMs around the net.

An example follows:

URM:IAFA:en_US.iso88591::" Author: John Doe "::"
Title: \"My Book\"
"::"
Format: PostScript
":::

(Note the Carriage Return at the beginning and end of some fields. This is
simply an illustration of the inclusion of non-printable characters.)
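
The quoting rule above can be sketched in Python as follows. This is only a
minimal illustration: quote_item and build_urm are hypothetical names, and the
sketch escapes quotation marks only, since this paper does not say how a
literal backslash inside an item should be treated.

```python
from typing import Sequence


def quote_item(item: str) -> str:
    """Surround one data item with quotation marks, escaping inner quotes.

    Assumption: only '"' is escaped; backslash handling is unspecified.
    """
    return '"' + item.replace('"', '\\"') + '"'


def build_urm(format_scheme: str, items: Sequence[str]) -> str:
    """Assemble wrapper, Format_Scheme and quoted items into one URM string."""
    if not items:
        raise ValueError("a URM needs at least one data item")
    quoted = "::".join(quote_item(item) for item in items)
    return "URM:" + format_scheme + "::" + quoted + ":::"


urm = build_urm(
    "IAFA:en_US.iso88591",
    [" Author: John Doe ", '\nTitle: "My Book"\n', "\nFormat: PostScript\n"],
)
print(urm)
```

Called with the three items of the example, the sketch is intended to
reproduce the URM shown above, including the embedded line breaks.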

4: Syntax Specifics

Below is a BNF-like syntax for a URM. Where spaces are allowed, they are listed
in addition to other characters. Square brackets '[' and ']' are used to
indicate optional parts. Single letters and digits stand for themselves. All
words of more than one letter are either expanded further in the syntax or
represent themselves.

urm              URM:Format_Scheme::Item[::Items]:::
Format_Scheme    Format:Language.Character_Set
Format           alphas
Language         isoLanguageCode_isoCountryCode
isoLanguageCode  alphas
isoCountryCode   alphas
Character_Set    alphas
Items            Item[::Items]
Item             "xalphas"
alphas           alpha[alphas]
xalphas          xalpha[xalphas]
xalpha           alpha[:]
alpha            any character defined in any ISO-recognized character set
                 except for a ':'
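
To make the syntax more concrete, here is a minimal hand-written parser in
Python. It is a sketch under stated assumptions rather than a reference
implementation: parse_urm is a hypothetical name, the Format_Scheme is
returned unsplit, and a backslash is treated as an escape only when it
directly precedes a quotation mark (the rule from section 3.2.3).

```python
from typing import List, Tuple


def parse_urm(text: str) -> Tuple[str, List[str]]:
    """Return (Format_Scheme, [data items]) for a URM string."""
    if not text.startswith("URM:") or not text.endswith(":::"):
        raise ValueError("missing URM wrapper")
    body = text[4:-3]  # strip the "URM:" header and the three trailing colons
    scheme, sep, rest = body.partition("::")
    if not sep or not scheme:
        raise ValueError("missing Format_Scheme or data items")

    items: List[str] = []
    pos = 0
    while True:
        if pos >= len(rest) or rest[pos] != '"':
            raise ValueError("expected opening quotation mark")
        pos += 1
        chars: List[str] = []
        while pos < len(rest) and rest[pos] != '"':
            # Assumption: a backslash only escapes a following quotation mark.
            if rest[pos] == "\\" and rest[pos + 1:pos + 2] == '"':
                pos += 1  # drop the backslash, keep the quotation mark
            chars.append(rest[pos])
            pos += 1
        if pos >= len(rest):
            raise ValueError("unterminated data item")
        pos += 1  # step over the closing quotation mark
        items.append("".join(chars))
        if pos == len(rest):
            return scheme, items
        if rest[pos:pos + 2] != "::":
            raise ValueError("expected '::' between data items")
        pos += 2


scheme, items = parse_urm(
    'URM:IAFA:en_US.iso88591::" Author: John Doe "::"\nTitle: \\"My Book\\"\n":::'
)
print(scheme)
print(items)
```

A character-by-character scan is used instead of splitting on "::" because
colons, including double colons, may legitimately appear inside a quoted item.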

5: References

[1-Weider 93]
Weider, Chris. Resource Transponders, March 1993. Available as
ftp://nic.merit.edu/documents/internet-drafts/draft-ietf-iiir-transponders-00.txt
[2-Weider 93]
Weider, Chris and Deutsch, Peter. A Vision of an Integrated Internet
Information Service, March 1993. Available as
ftp://nic.merit.edu/documents/internet-drafts/draft-ietf-iiir-vision-00.txt
[3-Weider 93]
Weider, Chris. Uniform Resource Names, May 1993. Available as
ftp://nic.merit.edu/documents/internet-drafts/draft-ietf-uri-resource-names-00.txt
[Berners-Lee 1993]
Berners-Lee, Tim. Uniform Resource Locators, March 1993. Available as
ftp://nic.merit.edu/documents/internet-drafts/draft-ietf-uri-url-00.txt
--
------------------------------------------------------------------------------
Michael Mealling ! Hypermedia WWW, WAIS, and gopher will be
Georgia Institute of Technology ! here soon via MIME. Your view of the
Michael.Mealling@oit.gatech.edu ! internet is about to change completely!