Yes, indeed, and the second part of us.en is the locale.
This is a reason, independent of charset issues, why we need
a LANG attribute.
I'll leave it to someone else to comment on Glenn's CJK heuristics
for *sentences*, but clearly they could not be applied to a single
character <em>X</em>, where X is one of the "unified" characters.
Thus, heuristics aside, we need LANG to deal comprehensively with
10646.
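To make the single-character point concrete, here is a minimal sketch of a toy script-based heuristic. It is hypothetical and is not Glenn's actual heuristic. It assumes that kana implies Japanese and Hangul implies Korean. A run of text that contains only unified Han characters gives it no evidence at all, so only an explicit LANG value can decide. The function names `guess_cjk_language` and `effective_language` are illustrative only.

```python
def guess_cjk_language(text):
    """Toy heuristic (hypothetical): guess a language from script evidence only."""
    # Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF) suggest Japanese.
    if any('\u3040' <= ch <= '\u30ff' for ch in text):
        return "ja"
    # Precomposed Hangul syllables (U+AC00-U+D7A3) suggest Korean.
    if any('\uac00' <= ch <= '\ud7a3' for ch in text):
        return "ko"
    # Only unified Han characters (or nothing): no basis for a guess.
    return None


def effective_language(text, lang_attr=None):
    """An explicit LANG value wins; otherwise fall back to the toy heuristic."""
    return lang_attr if lang_attr is not None else guess_cjk_language(text)


# U+76F4 is one code point whether the text is Chinese or Japanese.
print(guess_cjk_language("\u76f4"))        # None: a lone unified ideograph
print(effective_language("\u76f4", "zh"))  # zh: supplied by LANG
print(guess_cjk_language("\u76f4\u3059"))  # ja: kana gives the heuristic something
```

The lone ideograph comes back as `None` however the heuristic is tuned, because the code point itself carries no language information.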
Regards,
-- 
Terry Allen  (terry@ora.com)
O'Reilly & Associates, Inc.
Editor, Digital Media Group
101 Morris St.
Sebastopol, Calif., 95472
A Davenport Group sponsor.  For information on the Davenport Group
see ftp://ftp.ora.com/pub/davenport/README.html
or http://www.ora.com/davenport/README.html