This is good to hear.
I think the encoding and labelling question has already reached
consensus: we simply use the
text/html; charset=xxxx
content type. For Unicode, xxxx could be replaced with:
ISO-10646-UCS-4
ISO-10646-UCS-2
ISO-10646-UTF-1
and UTF-7 is also defined in an RFC somewhere. I should note that the
names above come from the list used in Internet documentation, not from
the list of registered names (anyone care to list the registered names?).
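
To make the labelling concrete, here is a minimal sketch of how a client
might read the charset parameter and decode the body. It is only an
illustration: the helper names and the label-to-codec table are my own
assumptions (in particular, treating ISO-10646-UCS-2 and ISO-10646-UCS-4
as big-endian with no byte-order mark, and falling back to Latin-1 when
no charset is given), not something any spec or existing browser defines.

```python
from email.message import Message

# Hypothetical mapping from charset labels to Python codec names.
# Assumption: UCS-2/UCS-4 arrive big-endian without a byte-order mark.
# There is no standard Python codec for UTF-1, so it is left out.
ASSUMED_CODECS = {
    "iso-10646-ucs-2": "utf-16-be",
    "iso-10646-ucs-4": "utf-32-be",
    "utf-7": "utf-7",
}

def parse_content_type(header_value):
    """Split a Content-Type value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the label in lower case, or None.
    return msg.get_content_type(), msg.get_content_charset()

def decode_body(body, header_value):
    """Decode raw bytes using the charset label, assuming Latin-1 if absent."""
    _, charset = parse_content_type(header_value)
    label = charset or "iso-8859-1"
    return body.decode(ASSUMED_CODECS.get(label, label))
```

For example, decode_body(b"\x00<\x00p\x00>", "text/html; charset=ISO-10646-UCS-2")
would give the three-character string "<p>" under these assumptions.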
As such, the infrastructure is largely in place, but the real
remaining problem is making sure that HTML in encodings other than
Latin-1 is also legal according to the SGML declaration etc. that HTML uses.
A question: was it really extremely difficult to implement Unicode
support, as many naysayers claim?