Re: Re Dan on implementation

lenst@lysator.liu.se
Errors-To: listmaster@www0.cern.ch
Date: Fri, 25 Feb 1994 11:32:43 --100
Message-id: <199402200023.BAA05251@lysita>
Reply-To: lenst@lysator.liu.se
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: lenst@lysator.liu.se
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: Re Dan on implementation 
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 2152

In message <9402170147.AA06698@ulua.hal.com>, "Daniel W. Connolly"
<connolly@hal.com> uses C as an example of a context free language and
then goes on to write:

>If we keep HTML down to a context-free language composed of regular
>tokens, then folks can write little 20-line ditties in perl, elisp,
>lex, yacc, etc. and get real work done.

Can you write a little 20-line perl program that lists the variables
of a C program?

>If we require real-time processing of all legal SGML documents,
>we buy nothing in terms of functionality, and we render almost
>all current implementations broken.

I don't think anyone has suggested that browsers need to be able to
process *all* legal SGML documents.  HTML is, after all, defined by one
specific DTD and one specific SGML declaration.

>>| 	<!-- this: <A HREF="abc"> looks like a link too! -->
>>
>>How so?  It's in a comment, and so will be ignored by a parser.
>
>Yes, by an SGML compliant parser, but not by any parser built
>out of standard parsing tools like regular expressions, lex, and yacc.
>(well, actually, you could do it with lex, but it's a pain...)

Recognising a comment can be done with regular expressions.  If you
have trouble making lex and yacc handle this, I don't think it is
because of the limitations of lex and yacc.
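
As a rough illustration of that claim, here is a minimal sketch in Python
(not one of the tools named in this thread).  The names COMMENT, HREF and
links are made up for the example, and the pattern assumes the usual
comment declaration form: "<!", then zero or more "-- ... --" comments,
then ">".  It is a sketch under those assumptions, not a full SGML parser.

```python
import re

# A comment declaration: "<!", zero or more "-- ... --" comments, then ">".
# Inside a comment, a "-" is allowed only when it is not followed by
# another "-".
COMMENT = re.compile(r"<!(?:--(?:[^-]|-[^-])*--\s*)*>")

# Deliberately naive link matcher, used only to show the effect.
HREF = re.compile(r'<A\s+HREF="([^"]*)"', re.IGNORECASE)

def links(text):
    """Return the HREF values of anchors that are not inside comments."""
    return HREF.findall(COMMENT.sub("", text))

doc = '<!-- this: <A HREF="abc"> looks like a link too! -->\n<A HREF="real">x</A>'
print(links(doc))  # ['real']
```

The comment is removed by one regular expression before any tags are
looked for, so the anchor inside it is never seen.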

>>| 	And this: a < b > c has no markup at all, even though it
>>| 	uses the "magic" < and > chars.
>>
>>But not in the magic combinations <[A-Za-z] etc.
>
>Right. The famous "delimiter in context". Contrast this with the
>vast majority of "context free" languages in use.

I will compare this with C. In C "/" is a token used for the division
operator and "*" is a token used for the multiplication operator, but
when "/" is followed by "*" it is a comment start.  This is consistent
with a "context free" language, and so is recognising a "<" as a start
tag opener only when it is followed by a letter.
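
To make the comparison concrete, here is a small sketch of both scanners
written as ordered regular expressions in Python.  The names C_TOKEN,
HTML_TOKEN and tokens, and the token class names, are invented for this
example, and both patterns are heavily simplified (no strings, no
attribute quoting rules); the only point is that looking at the next
character is ordinary lexing in both cases.

```python
import re

# C-like: "/" immediately followed by "*" opens a comment; otherwise
# "/" and "*" are plain operator tokens.
C_TOKEN = re.compile(r"""
    (?P<comment>/\*.*?\*/)
  | (?P<op>[/*])
  | (?P<other>[^/*]+)
""", re.VERBOSE | re.DOTALL)

# HTML-like: "<" (or "</") opens a tag only when a letter follows;
# any other "<" is ordinary character data.
HTML_TOKEN = re.compile(r"""
    (?P<stag><[A-Za-z][^>]*>)
  | (?P<etag></[A-Za-z][^>]*>)
  | (?P<data>[^<]+|<)
""", re.VERBOSE)

def tokens(pattern, text):
    """Split text into (token class, lexeme) pairs."""
    return [(m.lastgroup, m.group()) for m in pattern.finditer(text)]

print(tokens(C_TOKEN, "a/b/*c*/*d"))
print(tokens(HTML_TOKEN, "a < b > c"))
print(tokens(HTML_TOKEN, '<A HREF="abc">x</A>'))
```

In the second call every token comes out as data, since neither "<" nor
">" appears in a position where the scanner treats it as a delimiter.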


>You say "crippled", I say "expedient". Remember: the documents are
>still conforming. It's just the WWW client parser that's non-standard.

It is harder to make SGML tools produce correct HTML if HTML has a lot
of arbitrary restrictions.

--
Lennart Staflin  <lenst@lysator.liu.se>