Re: Globalizing URIs

Terry Allen
Wed, 2 Aug 95 19:50:54 EDT

| In message <>, Glenn Adams writes:
| >
| >It is my current understanding that arbitrary bytes can be encoded in URLs.
| Well... that's stretching it. Arbitrary bytes can be encoded in
| morse code too. A URL is a sequence of US-ASCII characters. Check
| RFC1738:
| 2.2. URL Character Encoding Issues
| URLs are sequences of characters, i.e., letters, digits, and special
| characters. URLs may be represented in a variety of ways: e.g., ink
| on paper, or a sequence of octets in a coded character set. The
| interpretation of a URL depends only on the identity of the
| characters used.
| In most URL schemes, the sequences of characters in different parts
| of a URL are used to represent sequences of octets used in Internet
| protocols. For example, in the ftp scheme, the host name, directory
| name and file names are such sequences of octets, represented by
| parts of the URL. Within those parts, an octet may be represented by
| the character which has that octet as its code within the US-ASCII
| [20] coded character set.
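The octet-to-character rule quoted above can be sketched as follows. This is a minimal illustration, not code from RFC 1738: it assumes a deliberately small "safe" set (letters, digits, "-", "_", ".") and represents every other octet with the "%" plus two hex digits triplet that section 2.2 goes on to define. The function names are hypothetical. Note that the URL characters stand only for octets; nothing here says which character set those octets belong to.

```python
import string

# Assumption: a conservative "safe" set (ASCII letters, digits, "-_.");
# RFC 1738 allows a few more unreserved characters than this.
SAFE = set(string.ascii_letters + string.digits + "-_.")


def encode_octets(data: bytes) -> str:
    """Represent each octet by its US-ASCII character when that character
    is in SAFE, otherwise by a "%XX" triplet (two uppercase hex digits)."""
    out = []
    for octet in data:
        ch = chr(octet)
        if ch in SAFE:
            out.append(ch)
        else:
            out.append("%{:02X}".format(octet))
    return "".join(out)


def decode_octets(text: str) -> bytes:
    """Invert encode_octets: turn "%XX" triplets back into single octets."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


print(encode_octets(b"read me.txt"))        # read%20me.txt
print(encode_octets(bytes([0x00, 0xFF])))   # %00%FF
```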

Yet Timbl's "Universal Resource Identifiers in WWW," RFC 1630 (which
RFC 1738 does not claim to update or obsolete), says:

The completeness requirement is easily met by allowing
particularly strange or plain binary names to be encoded in base
16 or 64 using the acceptable characters.
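The base 16 / base 64 approach RFC 1630 mentions can be sketched with Python's standard base64 module. RFC 1630 does not fix an alphabet; this sketch assumes the URL-safe base 64 alphabet (which swaps "+" and "/" for "-" and "_", since "/" separates path segments in a URL) and drops the "=" padding. The sample name and function names are hypothetical.

```python
import base64

# Hypothetical binary name; 0x2F happens to be "/" in US-ASCII.
name = bytes([0x00, 0x9C, 0xFF, 0x2F])


def to_base16(data: bytes) -> str:
    # Hex digits only: always safe in a URL, but doubles the length.
    return base64.b16encode(data).decode("ascii")


def from_base16(text: str) -> bytes:
    return base64.b16decode(text)


def to_base64(data: bytes) -> str:
    # Assumed URL-safe alphabet, with "=" padding stripped.
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def from_base64(text: str) -> bytes:
    # Restore the padding so the length is a multiple of 4.
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)


print(to_base16(name))          # 009CFF2F
print(to_base64(name))          # AJz_Lw
print(base64.b64encode(name))   # b'AJz/Lw==' (standard alphabet emits "/")
```

The last line shows why the alphabet matters: the standard base 64 alphabet would put a "/" into the middle of the name.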

which clearly envisions the use of an underlying coded charset other
than ASCII, not to mention "binary names." Completeness is
defined thusly:

Complete: It is possible to encode any naming scheme.

And I would think that would include filenames, etc., in Chinese.
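To illustrate why the underlying charset matters for such filenames: the same (hypothetical) Chinese filename yields different octets, and therefore different %XX escapes, depending on whether it is encoded as UTF-8 or GB2312, and nothing in the resulting URL records which was used. A minimal sketch with Python's urllib.parse.quote:

```python
from urllib.parse import quote, unquote

# Hypothetical filename "中文.txt", written with escapes (U+4E2D, U+6587).
name = "\u4e2d\u6587.txt"

as_utf8 = quote(name, encoding="utf-8")   # %E4%B8%AD%E6%96%87.txt
as_gb = quote(name, encoding="gb2312")    # %D6%D0%CE%C4.txt

print(as_utf8)
print(as_gb)

# Decoding only round-trips if the reader guesses the same charset.
print(unquote(as_gb, encoding="gb2312") == name)   # True
```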

Is RFC 1630 out of date? Am I missing something between 1630
and 1738? Is the underlying charset to be defined in the RFCs
standardizing particular URL schemes, and thus not handled in


Terry Allen  (   O'Reilly & Associates, Inc.
Editor, Digital Media Group    101 Morris St.
			       Sebastopol, Calif., 95472

A Davenport Group sponsor. For information on the Davenport Group see or

Current HTML 2.0 spec: