A short paper on creating Web Space

Paul "S." Wain <Paul.Wain@brunel.ac.uk>
Errors-To: listmaster@www0.cern.ch
Date: Wed, 15 Jun 1994 14:18:18 +0200
Message-id: <8502.9406151214@cook.brunel.ac.uk>
Reply-To: Paul.Wain@brunel.ac.uk
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: Paul "S." Wain <Paul.Wain@brunel.ac.uk>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: A short paper on creating Web Space
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Type: text/plain; charset=US-ASCII
Mime-Version: 1.0
X-Mailer: ELM [version 2.4 PL21]
Hi,

I enclose the first draft (complete with all errors) of a short paper that
I knocked up last night and this morning, based upon the trials and
tribulations of creating a uniform web space in a multi-department
environment. It covers the possible areas of resistance that can be
encountered, and looks for a path through them.

Obviously it needs more work, but at this point I'm looking for
suggestions: whether people think the points made are indeed valid, what
solutions people can recommend, etc. (Typo and grammatical corrections
are also appreciated *sigh*)

The URL of the paper is:

	http://http2.brunel.ac.uk:8080/paul/papers/intro_html_3.html

But I have appended a text version. The HTML version has links to the
good and bad HTML examples.

Comments please :)

Paul

p.s. Watch out for a paper coming soon commemorating the 1st anniversary
of HTTPD served web space at Brunel.

.-------------------------------------------------------------------------.
|_______Paul_S._Wain,_(X.500_Project_Engineer_and_WWW/HTTP_chappie),______|
| Computer Centre, Brunel University, Uxbridge, Middx., UB8 3PH, ENGLAND. |
|___VOICE:_+44_895_274000_extn_2391_______EMAIL: Paul.Wain@brunel.ac.uk __|
|                 http://http2.brunel.ac.uk:8080/paul/                    |
`-------------------------------------------------------------------------'

  ABSTRACT
   
   With the formalization of the HTML standards at the May 1994 WWW
   conference, a number of changes to the way that documents are marked
   up using HTML came into effect. Some of these are minor, some major
   and some just a reiteration of good practices that people are not
   using.
   
   This paper outlines what is wrong with the way HTML is currently
   viewed at Brunel, describes some of the ongoing situations that need
   to be resolved, and suggests possible ways of resolving them.
   

  INTRODUCING HTML 2.0 AND HTML 3.0 COMPLIANT MARKUP INTO BRUNEL UNIVERSITY.
                                       
1 Introduction.
   
   The computer centre at Brunel University has recently undergone the
   HTML revolution and begun moving some of its user documentation on
   line. In addition to this a large number of users have created their
   own home pages. As a result there are a large number of HTML pages
   within the brunel.ac.uk domain.
   
   The Web structure at Brunel is probably fairly unusual. We currently
   have four httpd servers, each serving a different task. The structure
   of these looks something like:

                              The User
                                  |
                                http3 <==> 150Mb Cache
                                  |
                     .-------+----+---+-------.
                     |       |        |       |
                   http1   http1    http2   world
   port:            8080    4040     8080

   
   Basically the local servers break down as follows: 
   
   http1.brunel.ac.uk:8080
          The main Brunel service. University home page. Guide to Brunel.
          
   http1.brunel.ac.uk:4040
          The StudentSoft service. Holds information on the Studentsoft
          project.
          
   http2.brunel.ac.uk:8080
          Solaris manual pages and user home pages. Also holds some
          newsgroup home pages.
          
   In addition to this (as the above textual representation indicates) we
   also have a cache service running on http3 with a small cache of
   approximately 150Mb. This is sufficient at the moment with most users
   currently being away on industrial placements or on holiday.
   
   Currently we recommend two Web browsers at Brunel: NCSA Mosaic for
   X/OpenWindows, and Lynx for text-based systems. Both are only
   available on SunOS 4.1.3 (although a Solaris 2.3 version of Mosaic is
   under test), and only Mosaic is actively supported by the User Support
   team.
   
   From this it can be seen that Brunel is probably of average size for
   a World Wide Web site, so many of the problems found at Brunel when
   introducing new tools and requirements for HTML may apply elsewhere.
   
   So what are the problems associated with the introduction of the new
   standards?
   

2 But It Works!
   
   A common misconception among users writing HTML is the "But it works
   with the current version of Mosaic" attitude. This leads to some
   interesting bad practices, which, when combined with the new
   definition of items such as paragraphs and lists, may cause absolute
   chaos.
   
   For example, the following sort of markup is quite common:

   <HTML>
   <TITLE>A document</title>
   <H1>A test document</H1>
   <P>
   This is a paragraph with a list:
   <MENU>
   <IMG SRC="dot.gif" ALT="*">Item one<BR>
   <IMG SRC="dot.gif" ALT="*">Item two<BR>
   </MENU>
   </P>
   
   The basic structure of this document isn't too hard to derive.
   What it should read is:

   <HTML>
        <HEAD>
                <TITLE>A document</TITLE>
        </HEAD>
        <BODY>
                <H1>A test document</H1>
                <P>
                        This is a paragraph with a list
                </P>
                <DL>
                        <DT><IMG SRC="dot.gif" ALT="*">Item One</DT>
                        <DT><IMG SRC="dot.gif" ALT="*">Item two</DT>
                </DL>
        </BODY>
   </HTML>

   There is a big difference between the two. The second is correct and
   the first isn't (ignoring the fact that the <HEAD> and <BODY> tags can
   be implied in the first example). Both will render in Mosaic 2.4, but
   only the second passes a DTD compliance test.
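
   As an illustration only, a rough version of such a test can be
   approximated in a few lines of Python. This is a minimal sketch, not
   a real DTD compliance test: it relies on the standard library's
   html.parser, and the names StructureChecker, check, REQUIRED_END and
   LIST_ITEMS are invented here. All it looks for is container tags
   that are never closed and content placed directly inside a list
   without a list item.

```python
from html.parser import HTMLParser

# Elements whose end tag is treated as required in this sketch. Elements with
# optional or no end tags (P, LI, DT, IMG, BR ...) are left out on purpose.
REQUIRED_END = {"html", "head", "body", "title", "h1", "h2", "h3", "h4",
                "h5", "h6", "ul", "ol", "dl", "menu", "dir", "pre", "a"}

# For each list container, the only elements allowed as direct children.
LIST_ITEMS = {"ul": {"li"}, "ol": {"li"}, "menu": {"li"}, "dir": {"li"},
              "dl": {"dt", "dd"}}


class StructureChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []      # [tag, item_open] for each open REQUIRED_END tag
        self.problems = []

    def _report(self, message):
        if message not in self.problems:   # report each complaint once
            self.problems.append(message)

    def handle_starttag(self, tag, attrs):
        if self.stack:
            parent = self.stack[-1]
            allowed = LIST_ITEMS.get(parent[0])
            if allowed is not None:
                if tag in allowed:
                    parent[1] = True       # a list item is now open
                elif not parent[1]:
                    self._report(f"<{tag}> appears directly inside <{parent[0]}>")
        if tag in REQUIRED_END:
            self.stack.append([tag, False])

    def handle_endtag(self, tag):
        if self.stack and tag in LIST_ITEMS.get(self.stack[-1][0], ()):
            self.stack[-1][1] = False      # the current list item is closed
            return
        if tag not in REQUIRED_END:
            return
        if tag not in [open_tag for open_tag, _ in self.stack]:
            self._report(f"</{tag}> has no matching start tag")
            return
        while self.stack:
            top = self.stack.pop()[0]
            if top == tag:
                break
            self._report(f"<{top}> is not closed before </{tag}>")

    def handle_data(self, data):
        if data.strip() and self.stack:
            parent = self.stack[-1]
            if parent[0] in LIST_ITEMS and not parent[1]:
                self._report(f"text appears directly inside <{parent[0]}>")


def check(markup):
    """Return a list of complaints; an empty list means nothing was spotted."""
    checker = StructureChecker()
    checker.feed(markup)
    checker.close()
    for tag, _ in checker.stack:
        checker._report(f"<{tag}> is never closed")
    return checker.problems
```

   Fed the two examples above, check() should return an empty list for
   the second, and for the first should complain about the unclosed
   <HTML> and about the image, text and <BR> sitting directly inside
   <MENU>. An empty list only means this sketch spotted nothing; it
   does not mean the document is compliant.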
   
   With the moves to make documents compliant in the future, this will
   cause great problems. So much so that I am starting to tell people
   that if they write their documents in the way of the first example,
   I can only see those documents working correctly for about the next
   3 months.
   
   The problem, however, is that because bad practices currently work
   with existing browsers, people are unwilling to make the change. But
   things get worse.
   

3 Using Editors - Correct vs. Easiest to use

   This can probably be described as the root of the problem. With the
   current HTML DTDs only just being set in stone, many HTML editors are
   still a step or two behind the times. This creates a situation whereby
   documents are being produced by editors that claim to be compliant
   with current standards (again this means Mosaic!) but which are not
   correct.
   
   For example, these pages were produced using HoTMetaL from SoftQuad.
   As far as I can tell it uses a very rigid interpretation of the HTML
   3.0 DTD, in that it will not let certain tags be nested although,
   looking at the DTD, they can be. However, the (supposedly compliant)
   output produced by other editors will not read into HoTMetaL. The
   problem with explaining this to users is that other editors are
   easier to use!
   
   So which is the right path to take? Obviously the compliant path,
   since this guarantees that a document will display in the future. But
   if the user tools are not in place to help people do this, we are
   stuck at square one. And the problem gets worse when we consider that
   many people out there are editor-illiterate and so need things as
   simple as they can get.
   
   (Aside: Today for the 1st time I am using HoTMetaL with "show tags"
   off. It's taken me a week to learn how to use it correctly, and
   understand its warnings. So what chance the normal user? On the other
   hand, there are editors out there which our User Support people tell
   me can be learnt in a few minutes but which produce dubious output.
   What would a typical user choose?)
   

4 House/Corporate Styles.
   
   Another situation we are currently trying to resolve at Brunel is
   the introduction of a default style for markup in departmental pages.
   That is, trying to define a common layout of information for
   departmental entry pages.
   
   This creates a situation whereby we need to be able to enforce both:
     * Layout and content.
     * HTML style (i.e. version 2.0 or 3.0 compliant)
       
   While this can be considered a side issue of the two previous cases,
   it does draw the two together nicely. It is envisaged that templates
   will be provided for users to create their own pages. But the problem
   remains of ensuring that the resulting document is still DTD
   compliant, and of what to do if it isn't!
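
   To make the template idea concrete, here is one way a departmental
   skeleton might be filled in, sketched in Python with the standard
   library's string.Template. Everything specific in it is an
   assumption for illustration: the page layout, the field names
   (department, summary, contact) and the function name are invented
   here and are not Brunel's actual house style.

```python
from html import escape
from string import Template

# Hypothetical house-style skeleton; the layout and field names are
# illustrative only, not an agreed Brunel template.
DEPARTMENT_PAGE = Template("""\
<HTML>
<HEAD>
<TITLE>$department - Brunel University</TITLE>
</HEAD>
<BODY>
<H1>$department</H1>
<P>$summary</P>
<H2>Contact</H2>
<ADDRESS>$contact</ADDRESS>
</BODY>
</HTML>
""")


def render_department_page(department, summary, contact):
    """Fill the skeleton, escaping &, < and > so values cannot break the markup."""
    return DEPARTMENT_PAGE.substitute(
        department=escape(department, quote=False),
        summary=escape(summary, quote=False),
        contact=escape(contact, quote=False),
    )
```

   Because the structure lives in the skeleton and the user only
   supplies text, the layout stays uniform. It does not help once
   someone edits the generated page by hand, which is where the
   question of what to do with non-compliant documents comes back.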
   
   There is still a market out there for correction tools.
   
   (Another note: Again HoTMetaL will tell you what is wrong, but it won't
   correct it. I don't know about other editors since I never go that
   far!)
   
5 Conclusion
   
   In writing this paper I deliberately chose not to offer solutions to
   the problems being encountered, and not to discuss options that are
   being considered until this point. However, we do have some ideas
   under review and these are basically as follows:

    1. Decide upon your page style but remember what correct HTML can and
       can't do.

    2. Produce templates for your users to use if they want. These should
       include as many examples as you can provide. Describe them within
       the document if you can!

    3. Consider the imposition of a default HTML authoring tool. If
       people want to use such a tool, ensure that everyone is using the
       same one. Remember that it should be possible to use this tool on
       more than one platform (e.g. MS Windows, UNIX, Mac). Remember that
       not everyone can run eXceed for MS Windows.

    4. Enforce your decisions. If someone produces HTML that is broken,
       suggest that they should be using the default editor. If you have
       the default editor set up correctly, they won't be able to write
       bad HTML. However, you will also need to include provision for
       converting bad HTML to good HTML. Always tell people to read HTML
       primers before starting.

    5. Where possible, don't allow bad HTML to be served by your HTTP
       daemons. This of course may not always be possible (as in the
       case of Brunel).

    6. Pray to your relevant deity.
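
   For point 5, where a site does control what goes into its document
   tree, one possible shape for a pre-publication check is sketched
   below in Python. It is an assumption-laden sketch: the function
   names are invented here, the only rule applied is a deliberately
   crude "page has a closed TITLE" test standing in for a real
   compliance check, and it reports offending files rather than
   blocking the daemon from serving them.

```python
from pathlib import Path


def has_title(markup):
    """Deliberately crude stand-in check: the page must contain a closed TITLE."""
    lowered = markup.lower()
    return "<title>" in lowered and "</title>" in lowered


def find_unpublishable(root, check=has_title):
    """Return the .html files under root whose markup fails the check."""
    failures = []
    for path in sorted(Path(root).rglob("*.html")):
        # Unreadable bytes are replaced rather than raising, so one odd
        # file does not stop the whole scan.
        markup = path.read_text(encoding="ascii", errors="replace")
        if not check(markup):
            failures.append(path)
    return failures
```

   The check is passed in as a parameter so that something stricter can
   replace has_title without touching the directory walk.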
       
   
6 Footnote
   
   Finally I would just like to add a small plea to the world in general.
   I feel that the following are really needed at this point in time:

    1. True HTML 2.0 and HTML 3.0 browsers. (I know these are in the
       pipeline.) I would especially like to see a version of Mosaic that
       complains if the HTML is wrong, since this would remove the biggest
       source of resistance to the future of the Web.

    2. More HTML editors supporting WYSIWYG. I know these are starting to
       appear but they need to start producing better HTML from the DTD
       point of view. (Again, I know this is a new area, but it needs to
       be stated.)

    3. HTML fixers. Tools to take bad HTML and make it good. After all, if
       Mosaic can display it, it should be able to write it back out as
       it should be.
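
   As a very small illustration of what a fixer might do, the Python
   sketch below rewrites just one bad habit from section 2: lines made
   of a bullet image, some text and a <BR>. It turns each into an <LI>
   element, which is a different repair from the <DL>/<DT> version in
   the corrected example. The function name and the regular expression
   are invented here, and the sketch only looks at one line at a time,
   so it does not check that the line really sits inside a list.

```python
import re

# Matches one hand-made bullet line: an <IMG> used as a bullet, some text,
# then a trailing <BR>. Anything else is left alone.
BULLET_LINE = re.compile(
    r"^(\s*)<IMG\b[^>]*>\s*(.*?)\s*<BR>\s*$",
    re.IGNORECASE,
)


def fix_image_bullets(markup):
    """Rewrite image-plus-<BR> pseudo list items as <LI> elements."""
    fixed = []
    for line in markup.splitlines():
        match = BULLET_LINE.match(line)
        if match:
            indent, text = match.groups()
            fixed.append(f"{indent}<LI>{text}</LI>")
        else:
            fixed.append(line)
    return "\n".join(fixed)
```

   A real fixer would need to work from a parsed document rather than
   from line patterns, but even this much shows that some common
   mistakes are mechanical enough to repair automatically.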