Re Re HTML and Lists
Dave_Raggett <dsr@hplb.hpl.hp.com>
From: Dave_Raggett <dsr@hplb.hpl.hp.com>
Message-id: <9301201250.AA14775@manuel.hpl.hp.com>
Subject: Re Re HTML and Lists
To: www-talk@nxoc01.cern.ch
Date: Wed, 20 Jan 93 12:50:25 GMT
Mailer: Elm [revision: 66.25]
Thanks Tim for clarifying the current ideas for DL etc.
>> I would further like to have the freedom to embed lists, e.g. OL in UL and
>> vice versa, but allowing only one level of embedding, e.g.
> I think we should stick to a non-nested structure for HTML and get it
> registered as a simple format. Nested structures can go into HTML2, which we
> can all discuss when HTML is registered with IANA. Reasonable?
Sounds fine to me. By the way, can you explain the status of the HTML
registration with IANA, e.g. does it include Dan's new emphasis tags or not?
Where will the queryform tags go - into HTML or HTML2?
By the way I am trying to cross-compile my X11/Xlib browser for Windows 3.1 on
the PC. I have found a package which gives you TCP/IP for Windows, but the
catch is they (Distinct of Saratoga CA) want royalties for run-time licenses!
Do you know of any alternative s/w for TCP/IP for Windows which has a free
run-time license, so that we could distribute the browser as freeware?
------
>> My implementation of lists is fairly relaxed whilst being able to support
>> smooth scrolling of arbitrary length HTML documents. The processing demands
>> for this require the browser to be able to parse backwards -
> This could be regarded as a weird way of doing things though it
> doesn't use any memory at all I see. Most people would I think just
> store the lot.
Some confusion seems in evidence here. My browser currently just stores the
plain text of the HTML document with no extra info. I have the following
program structure:
DisplayDoc()
    ParseHTML()
        GetToken()
            { GetLiteral() or GetWord() }
DisplayDoc()
Refreshes screen for current document type and contents
ParseHTML()
This procedure performs one of 4 functions, depending on mode parameter
EITHER work forwards until current line intersects the top of the
window and update sgml state variables in the process. (mode=FORWARDS)
OR work forwards until the current line includes the desired anchor
and update sgml state variables in the process. (mode=SEARCH)
OR work forwards until a specified part of the buffer,
but NOT updating the sgml vars (mode=HEIGHT)
OR display the formatted text in the window, but leave state vars
unchanged (mode=DISPLAY)
The sgml parser is the same for all these functions and shares the
same code to reduce problems in maintaining multiple copies of the
same basic algorithm.
It would be desirable to have a general sgml parser, supplemented by
the process rules for word wrap, and line breaks etc. For now, I have
hard-coded the HTML subset and procedural interpretation.
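
As an illustration only, the four modes of ParseHTML() can be sketched as one
loop whose stop condition and state handling depend on the mode. This is
Python rather than the browser's own code, and it is not the actual
implementation: the list of already line-broken text, the state dictionary,
the fixed LINE_HEIGHT and the window_lines parameter are all assumptions made
to keep the sketch short. Only the mode names come from the description above.

```python
from enum import Enum, auto

class Mode(Enum):
    FORWARDS = auto()  # advance to the line intersecting the window top
    SEARCH = auto()    # advance to the line containing the wanted anchor
    HEIGHT = auto()    # measure up to a buffer position, state untouched
    DISPLAY = auto()   # produce the visible lines, state untouched

LINE_HEIGHT = 16  # hypothetical fixed line height in pixels

def parse_html(lines, state, mode, target=None, window_lines=3):
    """One shared loop; `mode` picks the stop test and whether state changes.

    lines  -- text already broken into display lines (a simplification)
    state  -- {"line": index of current line, "y": its pixel offset}
    target -- pixel offset (FORWARDS), anchor text (SEARCH), line index (HEIGHT)
    """
    line, y = state["line"], state["y"]
    shown = []
    while line < len(lines):
        if mode is Mode.FORWARDS and y + LINE_HEIGHT > target:
            break  # this line intersects the top of the window
        if mode is Mode.SEARCH and target in lines[line]:
            break  # this line holds the anchor
        if mode is Mode.HEIGHT and line >= target:
            break  # reached the requested part of the buffer
        if mode is Mode.DISPLAY:
            if len(shown) == window_lines:
                break  # window is full
            shown.append(lines[line])
        line += 1
        y += LINE_HEIGHT
    if mode in (Mode.FORWARDS, Mode.SEARCH):
        state["line"], state["y"] = line, y  # only these modes update state
        return None
    if mode is Mode.HEIGHT:
        return y - state["y"]  # pixel height covered, state left alone
    return shown
```

Keeping a single loop means a change to the stepping logic reaches all four
modes at once, which is the maintenance point made above.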
DeltaHTMLPosition(long h) -- used by scrolling mechanism
Find the text line which intersects/starts from the top of the window
Note that h is the new pixel offset for the top of the window itself,
for which the current value is PixelOffset+nClipped.
This procedure is really tricky when the new position is above the
current one. The simple approach would always work forwards from
the start of the document, but this would be real slow for scrolling
around near the end of long documents. The approach taken here
attempts to work locally in the document working backwards looking for
cues to discover line break decisions and state changes, then parsing
forwards in the normal manner to get to the current point, giving you
the relative offset. You now change the current position to the earlier
point, and repeat until you have climbed back up to the desired point.
Works a treat!
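
The backwards climb can be sketched roughly as follows, again in Python and
under loud assumptions: a <P> tag is taken as the only cue at which a line is
known to start, text is wrapped at a fixed character width with the standard
textwrap module, and every line has the same height. The names BREAK, WIDTH,
LINE_HEIGHT, height() and climb_back() are hypothetical, and positions are
restricted to cue offsets. The real procedure handles more cues and state
changes than this.

```python
import textwrap

LINE_HEIGHT = 16  # hypothetical fixed line height in pixels
WIDTH = 20        # hypothetical window width in characters
BREAK = "<P>"     # assumed cue: a new line always starts at this tag

def height(text, start, end):
    """Parse forwards from a known line start; return the pixel height."""
    total = 0
    for para in text[start:end].split(BREAK):
        total += LINE_HEIGHT * len(textwrap.wrap(para, WIDTH))
    return total

def climb_back(text, pos, offset, target):
    """Step back cue by cue until the position is at or above `target`.

    pos    -- buffer offset of a cue (or 0) whose pixel offset is known
    offset -- pixel offset of the line starting at pos
    target -- the new, smaller pixel offset wanted for the window top
    """
    while offset > target and pos > 0:
        prev = max(text.rfind(BREAK, 0, pos), 0)  # scan backwards for a cue
        offset -= height(text, prev, pos)         # forward parse gives the delta
        pos = prev
    return pos, offset
```

From the returned position, an ordinary forward parse finds the exact line
that intersects the top of the window, so the cost depends on the distance
scrolled and not on the distance from the start of the document.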
GetToken()
Reads an html token including newlines and words
GetLiteral()
Used in verbatim modes (e.g. PRE), treats text lines as single words
GetWord()
Used in non-verbatim mode to recognise html tags, newlines and words,
dealing as appropriate with character references
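
A rough sketch of the three token routines, in Python and not taken from the
browser source. The regular expressions, the (kind, value, position) return
shape, the tokens() driver and the use of Python's html.unescape for character
references are all assumptions chosen for illustration. Only the split into
GetToken, GetLiteral and GetWord follows the outline above.

```python
import html
import re

WORD = re.compile(r"<[^>]*>|\n|[^\s<]+|<")      # tag, newline, word, stray "<"
LITERAL = re.compile(r"<[^>]*>|\n|[^\n<]+|<")   # same, but a text line is one word

def _kind(raw):
    if raw == "\n":
        return "newline"
    is_tag = len(raw) > 1 and raw.startswith("<") and raw.endswith(">")
    return "tag" if is_tag else "word"

def get_literal(text, pos):
    """Verbatim mode (e.g. PRE): a run of text up to a newline or tag is one word."""
    m = LITERAL.match(text, pos)
    if m is None:
        return None, None, pos
    return _kind(m.group()), m.group(), m.end()

def get_word(text, pos):
    """Normal mode: skip blanks, then read a tag, a newline or a word."""
    while pos < len(text) and text[pos].isspace() and text[pos] != "\n":
        pos += 1
    m = WORD.match(text, pos)
    if m is None:
        return None, None, pos
    kind, raw = _kind(m.group()), m.group()
    if kind == "word":
        raw = html.unescape(raw)  # resolve character references such as &amp;
    return kind, raw, m.end()

def get_token(text, pos, verbatim):
    """Dispatch to the reader that matches the current mode."""
    return get_literal(text, pos) if verbatim else get_word(text, pos)

def tokens(text):
    """Hypothetical driver: yield (kind, value), switching mode on PRE tags."""
    pos, verbatim = 0, False
    while True:
        kind, value, pos = get_token(text, pos, verbatim)
        if kind is None:
            return
        if kind == "tag" and value.upper() == "<PRE>":
            verbatim = True
        elif kind == "tag" and value.upper() == "</PRE>":
            verbatim = False
        yield kind, value
```

For example, list(tokens("a &amp; b<PRE>x  y\nz</PRE>")) keeps "x  y" as a
single word with its double space intact, while outside PRE the blanks only
separate words.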
Any comments?