Welcome to the front, Martin.
Actually, we have a solution to this problem. It was recently
proposed that documents be stored in a file format that keeps their
MIME headers intact. The MIME headers carry the character set and
encoding used in the document. MIME header information is essentially
restricted to ASCII (see the HTTP specification for the exact rules),
so it can be read before the document's own character set is known.
In addition, it looks very likely that ISO 10646 will be used as the
document character set for HTML.