Re: Globalizing URIs

Daniel W. Connolly
Wed, 2 Aug 95 20:47:06 EDT

In message <>, "Terry Allen" writes:
>Yet Timbl's "Universal Resource Identifiers in WWW," RFC 1630 (not said
>in RFC 1738 to have been updated or made obsolete by RFC 1738), says:
> The completeness requirement is easily met by allowing
> particularly strange or plain binary names to be encoded in base
> 16 or 64 using the acceptable characters.
>which clearly envisions the use of an underlying coded charset other
>than ASCII, not to mention "binary names."

I agree that characters beyond the ASCII repertoire can be
encoded in URLs, by the use of conventions like "ok folks:
to represent the character whose Unicode code position is XXXX,
write ,XXXX,". But this doesn't change the fact that the
coded character set is ASCII: that representation ",XXXX," is six
characters (usually represented as six octets), not six octets
that make up one character.
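
To make the counting concrete, here is a minimal sketch of such a
convention in Python. The names encode_name and decode_name are made up
for illustration, as are the choice of four uppercase hex digits and the
rule that a literal comma is itself escaped; none of this comes from
any RFC.

```python
import re

def encode_name(name: str) -> str:
    """Hypothetical ",XXXX," convention: every non-ASCII character
    (and the comma itself) becomes its 4-hex-digit code position
    wrapped in commas. Everything emitted is plain ASCII."""
    out = []
    for ch in name:
        cp = ord(ch)
        if cp < 128 and ch != ",":
            out.append(ch)                # ordinary ASCII passes through
        elif cp <= 0xFFFF:
            out.append(",%04X," % cp)     # six ASCII characters
        else:
            raise ValueError("code position does not fit in XXXX")
    return "".join(out)

def decode_name(text: str) -> str:
    """Inverse of encode_name, valid only for strings it produced."""
    return re.sub(r",([0-9A-F]{4}),",
                  lambda m: chr(int(m.group(1), 16)),
                  text)

arabic = "\u0645\u0635\u0631"             # three Arabic letters
encoded = encode_name(arabic)
print(encoded)                            # ,0645,,0635,,0631,
print(len(encoded))                       # 18: six ASCII characters per letter
print(decode_name(encoded) == arabic)     # True
```

What travels in the URL is the 18-character ASCII string. Reading it
back as three Arabic letters takes knowledge of the convention, which
is a layer on top of the coded character set, not part of it.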

In other words, if you know some resource by its Arabic name,
in order to refer your friend to that resource, you'd better
represent that arabic name with ASCII characters, unless you
and your friend have some private agreement that goes beyond
the spec.
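
A rough illustration of the "private agreement" point, using the %XX
octet escapes that URLs do provide and Python's urllib.parse. The
choice of UTF-8 for turning the name into octets is purely an
assumption of this sketch; it stands in for whatever you and your
friend have agreed on.

```python
from urllib.parse import quote, unquote

name = "\u0645\u0635\u0631"               # three Arabic letters

# Sender: pick an octet encoding (UTF-8 here, by assumption),
# then escape every octet as %XX so only ASCII remains.
escaped = quote(name, safe="", encoding="utf-8")
print(escaped)                            # %D9%85%D8%B5%D8%B1

# Receiver who shares the agreement recovers the name.
print(unquote(escaped, encoding="utf-8") == name)      # True

# Receiver who assumes a different encoding gets different text.
print(unquote(escaped, encoding="latin-1") == name)    # False
```

The escaped string is unambiguous as a sequence of ASCII characters.
Which Arabic letters it stands for depends on the encoding both ends
assume.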

> Completeness is
>defined thusly:
> Complete It is possible to encode any naming
> scheme.

That doesn't mean very much to me. Define naming scheme :-) It's like
talking about "the set of all sets." I'm sure that there are
interpretations of that definition that end up at Russell's paradox.

>Is RFC 1630 out of date?

RFC1630 is informational. It has no standards status.

RFC1630 _and_ RFC1738 have serious flaws in their discussion
of URL syntax (i.e. their BNFs). RFC1808 has a clear, correct
discussion of the syntax. In fact, it has everything you
need except how to use the parts of URLs in various internet

> is the underlying charset to be defined in the RFCs
>standardizing particular URL schemes, and thus not handled in

No. At least I don't believe so. I believe Larry M. made this
point pretty clearly somewhere in the archives.
True, Larry?