Image types and related issues [was: Re: filetype extensions]

lilley@v5.cgu.mcc.ac.uk (Chris Lilley, Computer Graphics Unit)

Mail folder: WWW Talk Apr 94-present

Errors-To: listmaster@www0.cern.ch
Date: Tue, 10 May 1994 20:32:37 +0200
Message-id: <94051019265103@cguv5.cgu.mcc.ac.uk>
Reply-To: lilley@v5.cgu.mcc.ac.uk
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: lilley@v5.cgu.mcc.ac.uk (Chris Lilley, Computer Graphics Unit)
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Image types and related issues [was: Re: filetype extensions]
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas

"Daniel W. Connolly" <connolly@hal.com> said:

>	"Assume, for the sake of argument, that this caching server implements
>	100% invertible translation between gif and tiff."

I am willing to pretend that you said

	"Assume, for the sake of argument, that this caching server implements
	100% invertible translation between format A and format B."

The particular example you cited (TIFF to GIF) was problematic enough; the 
reverse process is guaranteed not to produce the same image. The information 
lost going from a 24 bit TIFF to an 8 bit (at best) GIF is in no way recoverable.

> It appears that clients should be able to express
> a tolerance (and lack thereof) for information loss in conversion.

Yes, and this then harks back to the earlier discussion about format conversion 
and negotiation on the original server.

> Something like:

>	Accept: image/gif; t=1.0

> The Accept: header already specifies things like how much it costs the client
> to deal with the given format, and tolerance on how long it is willing to wait
> for a conversion. 

It does? You mean, in theory, or do actual clients and servers generate and use 
this information?

> We just add one that says "my tolerance for information loss
> is 1.0, i.e. no information loss is tolerable." For help icons and such, you
> would set t=0.9 or so.
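To make the proposal concrete, here is a minimal Python sketch of how a server might read such a header and decide whether a conversion is acceptable. The t parameter is the hypothetical one proposed above, not part of any deployed Accept syntax. The sketch also assumes each conversion can be given a hypothetical fidelity score (1.0 for an exact copy, 0.0 for "any old image"); what the values in between should mean is still undefined.

```python
def parse_accept(header: str) -> dict[str, float]:
    """Map each media type in an Accept header to its (hypothetical) t value.

    A missing t is treated as 1.0, i.e. no loss tolerated. That default
    is an assumption, chosen because it is the conservative one.
    """
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        t = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "t":
                t = float(value)
        prefs[media_type] = t
    return prefs


def conversion_acceptable(t: float, fidelity: float) -> bool:
    """True if a conversion of the given (hypothetical) fidelity meets t.

    t=1.0 demands an exact result; t=0.0 accepts any image at all.
    """
    return fidelity >= t


prefs = parse_accept("image/gif; t=1.0, image/jpeg; t=0.4")
print(prefs)                                             # {'image/gif': 1.0, 'image/jpeg': 0.4}
print(conversion_acceptable(prefs["image/gif"], 0.95))   # False
print(conversion_acceptable(prefs["image/jpeg"], 0.95))  # True
```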

OK, but you need to specify what exactly the different levels of quality mean. 
1.0 is clear enough ;-) and so is 0.0 - convert it any old way but give me some 
sort of image.

The meaning of the intermediate values needs to be defined. How does 0.7 differ 
from 0.4, exactly?

A further point: I assumed 1.0 to mean the exact same file as on the server. 
What if, however, you ask the original server for

Accept: image/x-iris-rgb; t=1.0

(Iris RGB is a lossless 24 bit image format BTW) and the server has TIFF 
available? It can do a conversion, and it can guarantee (in most cases) that the 
RGB value of each pixel is identical. But it is not the same file.
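The distinction between "same pixels" and "same file" is easy to demonstrate. In this Python sketch the two containers are made-up stand-ins (not real TIFF or Iris RGB encoders): one stores the pixels raw, the other compresses them losslessly with zlib. The two files differ byte for byte, yet decode to identical RGB values.

```python
import zlib

# A tiny 2x2, 24 bit RGB image: red, green / blue, white.
PIXELS = bytes([255, 0, 0, 0, 255, 0,
                0, 0, 255, 255, 255, 255])


def _header(magic: bytes, width: int, height: int) -> bytes:
    # Hypothetical 8-byte header: 4-byte magic, then width and height.
    return magic + width.to_bytes(2, "big") + height.to_bytes(2, "big")


def encode_raw(pixels: bytes, width: int, height: int) -> bytes:
    """Stand-in for format A: uncompressed pixel data."""
    return _header(b"RAW0", width, height) + pixels


def encode_deflate(pixels: bytes, width: int, height: int) -> bytes:
    """Stand-in for format B: losslessly compressed pixel data."""
    return _header(b"ZIP0", width, height) + zlib.compress(pixels)


def decode(data: bytes) -> tuple[int, int, bytes]:
    """Return (width, height, raw RGB bytes) for either container."""
    magic, body = data[:4], data[8:]
    width = int.from_bytes(data[4:6], "big")
    height = int.from_bytes(data[6:8], "big")
    if magic == b"RAW0":
        return width, height, body
    if magic == b"ZIP0":
        return width, height, zlib.decompress(body)
    raise ValueError(f"unknown container {magic!r}")


file_a = encode_raw(PIXELS, 2, 2)
file_b = encode_deflate(PIXELS, 2, 2)
print(file_a == file_b)                  # False: not the same file
print(decode(file_a) == decode(file_b))  # True: every pixel identical
```

A t=1.0 that means "byte-identical file" would rule out such a conversion; one that means "identical pixels" would allow it. The proposal has to say which.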

And another thing: suppose a server is configured to convert the TIFF to an 
Iris RGB, maybe cache it for a day, then delete it to avoid wasting disk space 
as multiple formats of the same image build up. A month later, I use the same 
URL and get the same conversion done. Fine. Now put a proxy in the way that 
happens to be caching last month's Iris RGB file. Suppose I ask the proxy for 
that file, and it asks the original server for the Last-Modified and Expires 
fields for foo.rgb.

What should the server respond? The values for the original TIFF?  A 
last-modified of NOW as it is building the file transparently, on the fly, as we 
speak?

Or consider the case where the server does not throw away the conversions, but 
keeps them around (disk space is cheap at this theoretical site). Now I alter 
the original TIFF. Whose responsibility is it to expire the Iris RGB, JPEG, Utah 
RLE, PCX, etc. formats that (perhaps unbeknownst to me) the server has 
created? And later I make an even newer version of the image, but choose to save 
it as a Utah RLE. What happens to the previous TIFF? (Just in case you were 
thinking of designating one format as a master format on which the others 
depend, like some sort of revision control system or makefile.)
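One possible policy, sketched in Python under stated assumptions: the server treats the file the author saved as the master, records the master's modification time alongside each cached conversion, regenerates a conversion whenever that time changes (much as make would), and reports the master's time, not the conversion time, as Last-Modified. The `convert` callback and the cache layout are hypothetical. This does not resolve the harder case above, where the newer master is saved in a different format.

```python
from dataclasses import dataclass, field
from email.utils import formatdate
from typing import Callable


@dataclass
class Variant:
    data: bytes
    source_mtime: float  # master's mtime when this variant was generated


@dataclass
class VariantCache:
    # (master path, media type) -> cached conversion
    variants: dict[tuple[str, str], Variant] = field(default_factory=dict)

    def get(self, source_path: str, source_mtime: float, media_type: str,
            convert: Callable[[str, str], bytes]) -> tuple[bytes, dict[str, str]]:
        key = (source_path, media_type)
        variant = self.variants.get(key)
        if variant is None or variant.source_mtime != source_mtime:
            # Missing, or built from an older master: regenerate it.
            variant = Variant(convert(source_path, media_type), source_mtime)
            self.variants[key] = variant
        # Advertise when the content last changed (the master), so a proxy
        # holding last month's copy can validate it against the master.
        headers = {"Last-Modified": formatdate(source_mtime, usegmt=True)}
        return variant.data, headers


def fake_convert(path: str, media_type: str) -> bytes:
    # Hypothetical converter: a real one would read the TIFF and write RGB.
    print("converting", path, "to", media_type)
    return b"converted"


cache = VariantCache()
cache.get("foo.tiff", 1000.0, "image/x-iris-rgb", fake_convert)  # converts
cache.get("foo.tiff", 1000.0, "image/x-iris-rgb", fake_convert)  # cache hit
cache.get("foo.tiff", 2000.0, "image/x-iris-rgb", fake_convert)  # master changed
```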

[ Note for those who care; Iris RGB and Utah RLE (assumed linear encoding) and 
TIFF (assumed generic RGB coding) are all 24 bit lossless formats. Within the 
assumptions specified, these 3 can be freely and repeatedly interconverted 
without information loss, which is why I picked them as examples.]

To sum up, I am saying that there is a complex interaction between a) 
transparent format negotiation and conversion, and b) cache coherency issues 
arising from proxy caching. These raise a whole host of issues that urgently 
need an interim solution and, in the long term, need to be elegantly sorted out 
and documented.

These issues do not seem to have been raised until I started messing with 
them, but then I don't work in a computer graphics unit for nothing ;-)

I think that part of the problem comes from the general culture of early 
internet users. If images are just little gifs of hands, arrows etc that mean 
'back', or they are JPEGs of trains, landscapes and naked ladies to stick in 
your root window, the image quality considerations go out the window. But the 
internet in general and WWW in particular will not remain the preserve of the 
'casual browser at a university' for long.

Once you start getting important or even 'mission critical' images floating 
around, these issues need to be solved. That time has not yet arrived, but it 
might be next year; let's sort it out while there is time.

By important I mean publishers shipping 48 MByte TIFFs which will eventually be 
brought into a page layout system and appear in a glossy magazine. They don't 
want the image content of these to be converted or altered. They may however be 
happy if the internal (lossless) compression goes from packbits to lzw, as there 
is no information loss. They certainly don't want it converted on the fly from a 
40 K GIF that happens to have the same name.

By mission critical I mean things like medical images; if I have just been put 
in a CAT scanner and a consultant somewhere on the other side of the world is 
teleconferencing with the surgeon who is about to perform brain surgery on me, I 
want that consultant to see *exactly* the original image !!

<Side_issue>
This cultural heritage also shows in the mime types. image/tiff conveys nothing, 
really. Look at the TIFF 6.0 spec. How do I specify that I can handle packbits 
and lzw encoding, but not JPEG encoding; that I can handle palette and full 
colour generic RGB, CMYK, greyscale, and bilevel images but not YCbCr, or CIELAB 
and I would like any tiled images to be converted to strips? These are all 
"tiff".

Of course, TIFF is the most complex format being discussed here.
</Side_issue>
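To show how much image/tiff alone leaves unsaid, here is a Python sketch of the kind of capability check a server would need. The tag values are the Compression (259) and PhotometricInterpretation (262) codes from TIFF 6.0; the capability record itself, and how a client would transmit it, are hypothetical, since image/tiff has no parameters for any of this.

```python
from dataclasses import dataclass

# Compression (tag 259) values from TIFF 6.0.
COMPRESSION = {1: "none", 5: "lzw", 6: "jpeg", 32773: "packbits"}

# PhotometricInterpretation (tag 262) values from TIFF 6.0. Greyscale and
# bilevel both use 0 or 1 (they differ in BitsPerSample); 5 "separated"
# means CMYK with the default InkSet.
PHOTOMETRIC = {0: "white-is-zero", 1: "black-is-zero", 2: "rgb",
               3: "palette", 5: "separated", 6: "ycbcr", 8: "cielab"}


@dataclass
class TiffInfo:
    compression: int   # raw tag 259 value
    photometric: int   # raw tag 262 value
    tiled: bool        # True if TileWidth/TileLength are used instead of strips


@dataclass
class TiffCapabilities:
    # Hypothetical description of what a client can handle.
    compressions: set[str]
    photometrics: set[str]
    accepts_tiles: bool


def problems(info: TiffInfo, caps: TiffCapabilities) -> list[str]:
    """List the reasons this TIFF does not suit the client (empty if it does)."""
    issues = []
    comp = COMPRESSION.get(info.compression, f"unknown-{info.compression}")
    if comp not in caps.compressions:
        issues.append(f"compression {comp}")
    phot = PHOTOMETRIC.get(info.photometric, f"unknown-{info.photometric}")
    if phot not in caps.photometrics:
        issues.append(f"photometric {phot}")
    if info.tiled and not caps.accepts_tiles:
        issues.append("tiled (strips wanted)")
    return issues


# The client described above: packbits and LZW (uncompressed assumed fine
# too) but not JPEG; palette, RGB, CMYK, greyscale and bilevel but not
# YCbCr or CIELAB; strips only.
client = TiffCapabilities(
    compressions={"none", "packbits", "lzw"},
    photometrics={"white-is-zero", "black-is-zero", "rgb", "palette", "separated"},
    accepts_tiles=False,
)
print(problems(TiffInfo(compression=5, photometric=2, tiled=False), client))
# []
print(problems(TiffInfo(compression=6, photometric=6, tiled=True), client))
# ['compression jpeg', 'photometric ycbcr', 'tiled (strips wanted)']
```

Some mismatches, such as tiles versus strips or LZW versus packbits, could be repaired by a lossless conversion on the server, which ties this straight back to the tolerance question.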

> To reiterate: we need to be able to put this info _in_the_link_markup_, since
> it is not only a function of the client's capabilities, but also of the 
> author's intent. 

I agree absolutely. The tolerance info will vary from image to image so it 
cannot be set once-for-all when installing the browser, for example. The 
information must be saved in the document as it cannot in general be inferred.
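A sketch of the client side, assuming a hypothetical TOLERANCE attribute on IMG (no such attribute exists in HTML): the browser reads each reference's tolerance from the document and sends it with the request, using the t parameter proposed earlier in the thread. Python's standard html.parser does the parsing; it lower-cases tag and attribute names, so TOLERANCE arrives as "tolerance".

```python
from html.parser import HTMLParser


class ImageRefParser(HTMLParser):
    """Collect (src, t) for every IMG; TOLERANCE is a hypothetical attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.images: list[tuple[str, float]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # No TOLERANCE given: assume 1.0 (no loss), the conservative default.
        t = float(attributes.get("tolerance") or "1.0")
        self.images.append((attributes.get("src", ""), t))


def accept_header(media_types: list[str], t: float) -> str:
    """Build an Accept header carrying the per-reference tolerance."""
    return ", ".join(f"{m}; t={t}" for m in media_types)


page = ('<IMG SRC="help.gif" TOLERANCE="0.9" ALT="help">\n'
        '<IMG SRC="scan.tiff" TOLERANCE="1.0" ALT="scan">')
parser = ImageRefParser()
parser.feed(page)
for src, t in parser.images:
    print(src, "->", accept_header(["image/tiff", "image/gif"], t))
# help.gif -> image/tiff; t=0.9, image/gif; t=0.9
# scan.tiff -> image/tiff; t=1.0, image/gif; t=1.0
```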

> For example, when I create a link to a help icon, I don't care if a few
> color bits here and there get changed. But if I'm linking to a medical image,
> I certainly do care!

This agrees with my position.

> We just need to be careful! Keep
> all the issues on the table and allow references to express _exactly_
> what they refer to, and how much "slop" they'll tolerate.

OK, fine. Once the definitions have been firmed up, tested out and standardised, 
this seems to me the way to go.

<Side_issue>
One side effect of this is that document creation becomes, again, more complex. 
A consequence of greater penetration of the web is that the range of skills in 
both providers and consumers of information increases. You get more gurus, and 
more dummies, as well as more folk in the middle.

As a consequence, productivity tools and intelligent, quasi-wysiwyg editors are 
becoming more and more essential.  The markup is becoming more and more complex. 
People need to be shielded from it.

I can see an editor where the writer has just inserted a link to an external RLE 
image. Up pops a dialog box with directories and files, so the filename gets 
spelled right. Up pops another with some fields for "link text" and "brief 
description" (for the ALT attribute) and some checkboxes for image importance: just 
decoration, keep similar, keep exact (for example). It then goes and puts four 
lines of HTML++ gobbledegook into the file to express all this. And talks to the 
revision control system to notify it about the new image file. And so on. People 
are just not going to do all this by hand.
</Side_issue>
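The editor's final step might look something like this Python sketch. The checkbox labels come from the description above; the t values they map to and the TOLERANCE attribute are hypothetical, while ALT is real HTML. html.escape keeps quotes in the description from breaking the attribute.

```python
from html import escape

# Hypothetical mapping from the editor's "image importance" checkboxes
# to the proposed loss-tolerance value t (1.0 = no loss tolerated).
IMPORTANCE_TO_T = {
    "just decoration": 0.0,
    "keep similar": 0.9,
    "keep exact": 1.0,
}


def image_markup(src: str, description: str, importance: str) -> str:
    """Emit an IMG tag carrying the dialog's answers.

    TOLERANCE is a made-up attribute standing in for whatever the
    eventual markup turns out to be.
    """
    t = IMPORTANCE_TO_T[importance]
    return (f'<IMG SRC="{escape(src)}" ALT="{escape(description)}" '
            f'TOLERANCE="{t}">')


print(image_markup("brain.rle", 'Axial slice, "before" scan', "keep exact"))
# <IMG SRC="brain.rle" ALT="Axial slice, &quot;before&quot; scan" TOLERANCE="1.0">
```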

Comments from anyone about any part of this are solicited.
--
Chris Lilley
+-----------------------------------------------------------------------------+
| Technical Author, ITTI Computer Graphics and Visualisation Training Project |
+-----------------------------------------------------------------------------+
| Computer Graphics Unit,        |  Internet: C.C.Lilley@mcc.ac.uk            |
| Manchester Computing Centre,   |     Janet: C.C.Lilley@uk.ac.mcc            |
| Oxford Road,                   |     Voice: +44 61 275 6045                 |
| Manchester, UK.  M13 9PL       |       Fax: +44 61 275 6040                 |
| X400:  /I=c/S=lilley/O=manchester-computing-centre/PRMD=UK.AC/ADMD= /C=GB/  |
|  <A HREF="http://info.mcc.ac.uk/CGU/staff/lilley/lilley.html">my page</A>   | 
+-----------------------------------------------------------------------------+