Re Re HTML and Lists

From: Dave_Raggett <dsr@hplb.hpl.hp.com>
Message-id: <9301201250.AA14775@manuel.hpl.hp.com>
Subject: Re Re HTML and Lists
To: www-talk@nxoc01.cern.ch
Date: Wed, 20 Jan 93 12:50:25 GMT
Mailer: Elm [revision: 66.25]
Thanks, Tim, for clarifying the current ideas for DL etc.

>> I would further like to have the freedom to embed lists, e.g. OL in UL and
>> vice versa, but allowing only one level of embedding, e.g.

> I think we should stick to a non-nested structure for HTML and get it  
> registered as a simple format.  Nested structures can go into HTML2, which we  
> can all discuss when HTML is registered with IANA. Reasonable?

Sounds fine to me. By the way, can you explain the status of the HTML
registration with IANA, e.g. does it include Dan's new emphasis tags or not?
Where will the queryform tags go - into HTML or HTML2?

By the way I am trying to cross-compile my X11/Xlib browser for Windows 3.1 on
the PC. I have found a package which gives you TCP/IP for Windows, but the
catch is they (Distinct of Saratoga CA) want royalties for run-time licenses!

Do you know of any alternative s/w for TCP/IP for Windows which has a free
run-time license, so that we could distribute the browser as freeware?

------

>> My implementation of lists is fairly relaxed whilst being able to support
>> smooth scrolling of arbitrary length HTML documents. The processing demands
>> for this require the browser to be able to parse backwards -

> This could be regarded as a weird way of doing things though it  
> doesn't use any memory at all I see. Most people would I think just  
> store the lot.

Some confusion seems in evidence here. My browser currently just stores the
plain text of the HTML document with no extra info. I have the following
program structure:

    DisplayDoc()
        ParseHTML()
            GetToken()
                { GetLiteral() or GetWord() }

    DisplayDoc()
       Refreshes screen for current document type and contents

    ParseHTML()
      This procedure performs one of 4 functions, depending on the mode parameter:

        EITHER work forwards until current line intersects the top of the
        window and update sgml state variables in the process. (mode=FORWARDS)

        OR work forwards until the current line includes the desired anchor
        and update sgml state variables in the process. (mode=SEARCH)

        OR work forwards until a specified part of the buffer is reached,
        but NOT updating the sgml vars  (mode=HEIGHT)

        OR display the formatted text in the window, but leave state vars
        unchanged (mode=DISPLAY)

      The sgml parser is the same for all these functions and shares the
      same code to reduce problems in maintaining multiple copies of the
      same basic algorithm.

      It would be desirable to have a general sgml parser, supplemented by
      the process rules for word wrap, and line breaks etc. For now, I have
      hard-coded the HTML subset and procedural interpretation.
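
To make the four modes concrete, here is a minimal sketch in C of one forward
loop shared by all of them. It is not the browser's actual code. It assumes
plain text with one display line per newline, a fixed LINE_HEIGHT, and a
literal text match standing in for an anchor; the names Mode, State, parse and
contains are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define LINE_HEIGHT 12   /* assumed pixel height of one text line */

typedef enum { FORWARDS, SEARCH, HEIGHT, DISPLAY } Mode;

typedef struct {
    size_t pos;   /* buffer offset of the start of the current line */
    long y;       /* pixel offset of that line from the document top */
} State;

/* Returns 1 if the first n characters of line contain needle. */
static int contains(const char *line, size_t n, const char *needle)
{
    size_t m = strlen(needle), i;
    for (i = 0; m > 0 && i + m <= n; i++)
        if (memcmp(line + i, needle, m) == 0)
            return 1;
    return 0;
}

/* One forward loop for all four modes. arg means: window-top pixel offset
   (FORWARDS), buffer offset to stop at (HEIGHT), pixels to draw (DISPLAY).
   Returns the pixel offset of the line where the loop stopped. */
static long parse(const char *buf, State *st, Mode mode, long arg,
                  const char *anchor)
{
    State s;
    assert(buf != NULL && st != NULL);
    s = *st;   /* work on a copy so read-only modes cannot leak changes */
    while (buf[s.pos] != '\0') {
        size_t n = strcspn(buf + s.pos, "\n");   /* length of this line */
        if (mode == FORWARDS && s.y + LINE_HEIGHT > arg)
            break;                               /* line crosses window top */
        if (mode == SEARCH && contains(buf + s.pos, n, anchor))
            break;                               /* line holds the anchor */
        if (mode == HEIGHT && s.pos >= (size_t)arg)
            break;                               /* reached the buffer offset */
        if (mode == DISPLAY) {
            if (s.y - st->y >= arg)
                break;                           /* window is full */
            printf("%.*s\n", (int)n, buf + s.pos);
        }
        s.pos += n + (buf[s.pos + n] == '\n');   /* step over the newline */
        s.y += LINE_HEIGHT;
    }
    if (mode == FORWARDS || mode == SEARCH)
        *st = s;   /* only these two modes update the state variables */
    return s.y;
}
```

Because the stop conditions and the final commit are the only mode-specific
parts, a change to the line-advance logic is made once and applies to every
mode.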

    DeltaHTMLPosition(long h) -- used by scrolling mechanism
       Find the text line which intersects/starts from the top of the window
       Note that h is the new pixel offset for the top of the window itself,
       for which the current value is PixelOffset+nClipped.

       This procedure is really tricky when the new position is above the
       current one. The simple approach would always work forwards from
       the start of the document, but this would be real slow for scrolling
       around near the end of long documents. The approach taken here
       works locally instead: it scans backwards looking for cues that
       reveal line break decisions and state changes, then parses forwards
       in the normal manner to reach the current point, which gives the
       relative offset. You then move the current position to the earlier
       point, and repeat until you have climbed back up to the desired point.
       Works a treat!
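
The backwards step can be illustrated with a small C sketch. It is only an
approximation of the idea, under stated assumptions: the buffer is plain text,
a newline is the only cue that forces a line break (standing in for tags such
as P), lines wrap greedily at a fixed character width, and positions are
counted in lines rather than pixels. The names next_line_start, line_up and
scroll_up are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the offset at which the display line after the one starting at
   start begins, using greedy word wrap at width columns. */
static size_t next_line_start(const char *buf, size_t len, size_t start,
                              size_t width)
{
    size_t i = start, last_space = 0;
    int have_space = 0;
    assert(width > 0);
    while (i < len && buf[i] != '\n') {
        if (buf[i] == ' ') {
            last_space = i;
            have_space = 1;
        }
        if (i - start >= width)   /* line is full: wrap at the last space */
            return have_space ? last_space + 1 : i;
        i++;
    }
    return i < len ? i + 1 : len; /* step over the hard break, if any */
}

/* Moves cur (the start of a display line) up by one display line. */
static size_t line_up(const char *buf, size_t len, size_t cur, size_t width)
{
    size_t anchor, line, next;
    if (cur == 0)
        return 0;
    /* Work backwards to the previous hard break, where the layout is
       known to restart regardless of what came before it. */
    anchor = cur - 1;
    while (anchor > 0 && buf[anchor - 1] != '\n')
        anchor--;
    /* Parse forwards from there until the next line would be cur. */
    line = anchor;
    while ((next = next_line_start(buf, len, line, width)) < cur)
        line = next;
    return line;
}

/* Repeats the single step until n lines have been climbed. */
static size_t scroll_up(const char *buf, size_t len, size_t cur,
                        size_t width, int n)
{
    while (n-- > 0 && cur > 0)
        cur = line_up(buf, len, cur, width);
    return cur;
}
```

Each step only re-parses from the nearest forced break, so the cost depends on
the length of the local paragraph and not on the distance from the start of
the document.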

    GetToken()
       Reads an html token including newlines and words

    GetLiteral()
       Used in verbatim modes (e.g. PRE), treats text lines as single words

    GetWord()
       Used in non-verbatim mode to recognise html tags, newlines and words
       dealing as appropriate with character references
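
The split between the three token routines might look roughly like the C
sketch below. It is illustrative only: the Token type, the token kinds and the
lower-case function names are hypothetical, character references are not
decoded, and verbatim mode does not look for the tag that ends it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOK_END, TOK_NEWLINE, TOK_TAG, TOK_WORD } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;   /* first character of the token in the buffer */
    size_t len;          /* number of characters in the token */
} Token;

/* Verbatim mode: the rest of the current line is a single word. */
static Token get_literal(const char *p)
{
    Token t;
    t.kind = TOK_WORD;
    t.start = p;
    t.len = strcspn(p, "\n");
    return t;
}

/* Normal mode: p points at a non-blank character; return a tag or a word. */
static Token get_word(const char *p)
{
    Token t;
    t.start = p;
    if (*p == '<') {
        const char *close = strchr(p, '>');
        t.kind = TOK_TAG;
        t.len = close ? (size_t)(close - p) + 1 : strlen(p);
    } else {
        t.kind = TOK_WORD;
        t.len = strcspn(p, " \t\n<");   /* a word ends at a blank or a tag */
    }
    return t;
}

/* Returns the next token at *pos and advances *pos past it. */
static Token get_token(const char **pos, int verbatim)
{
    const char *p;
    Token t;
    assert(pos != NULL && *pos != NULL);
    p = *pos;
    if (!verbatim)                       /* blanks only matter in verbatim mode */
        while (*p == ' ' || *p == '\t')
            p++;
    if (*p == '\0') {
        t.kind = TOK_END;
        t.start = p;
        t.len = 0;
    } else if (*p == '\n') {
        t.kind = TOK_NEWLINE;
        t.start = p;
        t.len = 1;
    } else {
        t = verbatim ? get_literal(p) : get_word(p);
    }
    *pos = t.start + t.len;
    return t;
}
```

Keeping newlines as tokens in both modes lets the caller decide whether a
newline is a line break (verbatim) or just word spacing (normal text).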

Any comments?