Re: hyphens

Murray Maloney (murray@sco.COM)
Thu, 1 Dec 94 09:25:49 EST

>
> On the same point, Tim Pierce and I have just discussed the meaning of
> the proposed &shy;...is it
>
> a. a real hyphen which is an allowed breakpoint, as in much&shy;needed
> and as opposed to a non-breaking one like X-Windows (where you
> don't want an X- at the end of a line);
>
> b. a virtual hyphen like TeX's \- which marks a valid hyphenation
> point like hyphen&shy;ation but which disappears if the word does
> not need breaking.

From 8859-1: 6.3.3 SOFT HYPHEN (SHY)
A graphic character that is imaged by a graphic symbol
identical with, or similar to, that representing HYPHEN,
for use when a line break has been established within a word.

Note: "for use when a line break has been established within a word"

So, the way that I read it is "b."
That also coincides with my experience in typesetting.
TeX has \- and troff has \% to indicate a potential
hyphenation point.
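
A minimal sketch of reading "b.", assuming a greedy line filler of the
simplest kind. It is not taken from 8859-1, TeX, troff, or any browser;
the function name fill and its width parameter are hypothetical. It shows
a SHY vanishing when the word fits on the line and being imaged as a
hyphen only when a break is taken at that position.

```python
SHY = "\u00ad"  # SOFT HYPHEN, position 0xAD in ISO 8859-1


def fill(text, width):
    """Greedy fill (hypothetical helper): break at spaces, or at a SHY
    when the whole word does not fit in the room left on the line."""
    lines, current = [], ""
    for word in text.split():
        parts = word.split(SHY)  # fragments between potential break points
        while parts:
            prefix = current + " " if current else ""
            room = width - len(prefix)
            if len("".join(parts)) <= room:
                # The word fits: every SHY in it disappears.
                current = prefix + "".join(parts)
                break
            # Largest number of leading fragments that fit with a visible hyphen.
            k = max((i for i in range(1, len(parts))
                     if len("".join(parts[:i])) + 1 <= room), default=0)
            if k:
                # Break established within the word: image the hyphen here.
                lines.append(prefix + "".join(parts[:k]) + "-")
                current, parts = "", parts[k:]
            elif current:
                # No usable break point: start a fresh line and retry.
                lines.append(current)
                current = ""
            elif len(parts) > 1:
                # Even the first fragment overflows an empty line.
                lines.append(parts[0] + "-")
                parts = parts[1:]
            else:
                # Unbreakable and too long: accept an overfull line.
                current = parts[0]
                break
    if current:
        lines.append(current)
    return lines
```

With a width of 13, "valid hyphen" + SHY + "ation point" fills as
"valid hyphen-" followed by "ation point"; with a width of 20 the same
word comes out as plain "hyphenation". A word containing only a real
hyphen, such as X-Windows, is never split by this sketch.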

>
> It occurs to me that if we provide an "allowed breakpoint" mark, we
> ought also provide a "forbidden breakpoint". ISONUM has both "hyphen"
> and "shy" but it's not clear if "hyphen" is just a "-" or if it has
> some connotation of permission or forbidding. ISOPUB has "dash" but
> calls it a "true graphic" which doesn't tell us much (especially since
> ndash and mdash are defined separately).

ISONUM and ISOPUB are entity sets. ISO 8859-1 (aka Latin-1) is
an "8-bit single-byte coded graphic character set".

A hyphen is a hyphen is a hyphen. A soft-hyphen is a marker that
tells an H&J engine that there is a valid hyphenation location
at this position. While there are cases where the presence of
a hyphen in a word (contiguous sequence of characters) is not
an indicator that the word may be broken across lines, there
is no character in any of the 8859 sets nor in any of the
non-normative SGML entity sets that may be used to declare
that a hyphen should be imaged, but a breakpoint is not allowed.

That there ought to be such a character or entity is arguable.
But the fact is that there is not.
>
> While browsers display ragged-right setting, there's no need for
> hyphenation unless people have really really long words (quite
> possible in science work), but if anyone is going to implement
> justified setting, they're going to have to use something like Liang's
> algorithms (as in PATGEN)...or are we going to see abortions like
> variable inter-letter spacing?

While I would tend to agree that full H&J is not required in
WWW browsers, I am willing to bet money that there are plenty
of people who will not.

>
> ///Pe&shy;ter
>
>