Dan, we went through all this re Unicode. If it's not in the document
charset it doesn't make it through the parse. Period. This is in
the category

> * in violation of SGML
(e.g. the way most browsers handle single
quotes and >'s ala <img src='foo' alt=">gotcha!<"> )

and processing instructions ...
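For anyone who wants to see the attribute example concretely, here is a rough sketch. It assumes the browser failure mode in question is ending the tag at the first ">" regardless of quoting; that is my reading, not something measured against any particular browser. Python's html.parser stands in for a quote-aware tokenizer (it is not an SGML parser), and the names naive_tag_end and quote_aware_parse are made up for illustration.

```python
from html.parser import HTMLParser

# The tag from the example above: single-quoted src, and an alt value
# that contains ">" and "<" inside double quotes.
SAMPLE = """<img src='foo' alt=">gotcha!<">"""


class AttrCollector(HTMLParser):
    """Collects (tag, attributes) pairs for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def quote_aware_parse(markup):
    # Hypothetical helper: tokenize with a parser that respects quoted
    # attribute values, so ">" inside quotes does not end the tag.
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags


def naive_tag_end(markup):
    # Hypothetical model of the non-conforming behavior: treat the first
    # ">" as the end of the tag, ignoring quotes entirely.
    return markup[: markup.index(">") + 1]


if __name__ == "__main__":
    print(quote_aware_parse(SAMPLE))
    print(naive_tag_end(SAMPLE))
```

The quote-aware path keeps the whole alt value as one attribute; the naive path cuts the tag off in the middle of it, which leaves the rest of the value to be treated as character data.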


