Re: Globalizing URIs

Glenn Adams (
Wed, 2 Aug 95 23:53:38 EDT

From: Larry Masinter <>
Date: Wed, 2 Aug 1995 19:37:40 PDT

'xxxx' are characters *NOT* in the native character set of the file
system, but rather in whatever transcription of that character set
is made available by the FTP server.

OK. Say the transcription system used by both the FTP client and
server says:

(1) if an octet in the local encoding of a file name is in positions
0x20 - 0x24 or 0x26 - 0x7E of US-ASCII, then use that octet;
(2) if an octet in the local encoding of a file name is in position
0x25 of US-ASCII (i.e., '%'), then use "%%"
(3) otherwise, use %XX for each octet in the encoded file name (in big
endian order), where XX is the hexidecimal value of each such octet.

Now, say I am a Chinese version of Windows/NT using Unicode and I ask you for:

RETR E-W%0B%00.%00T%00X%00T

That is, in Unicode I have the following encoding of my file name:

4E2D 570B 002E 0054 0058 0054

Say you are a Taiwanese server using the BIG5 character set for you file
names. How do you interpret my request? Do you interpret it as a BIG5
string? If so, then you think I just asked for the file "E-W??.?T?X?T"
(assuming for a moment that you don't throw up on NUL and you interpret
NUL and other C0 escapes as '?'.

Even if both systems are using the same transcription system, we are still
hosed because you have no idea of how to interpret the decoded results since
you don't know what character set encoding applies. If, on the other hand,
you know that Unicode was the source encoding, then, at least you know you
shouldn't interpret it as a BIG5 string. You may respond: sorry, I
don't grok Unicode path names. Or, if you have a Unicode -> BIG5 translation
table on hand, you could correctly translate it to the corresponding BIG5
encoding: %A4%A4%B0%EA.TXT. On the other hand, if you do misinterpret the
original file name as a BIG5 string, then you are going to say:

550 E-W??.?T?X?T: No such file OR directory.

And I'm going to be clueless about what went wrong.

If the protocol doesn't communicate this information, then who/what will?