Re: Suggestion: URL string-search syntax

Rick Troth <troth@rice.edu>
Errors-To: listmaster@www0.cern.ch
Date: Sun, 29 May 1994 05:21:46 +0200
Message-id: <Pine.3.89.9405271045.A2964-0100000@brazos.is.rice.edu>
Reply-To: troth@rice.edu
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: Rick Troth <troth@rice.edu>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: Suggestion: URL string-search syntax
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Type: TEXT/PLAIN; charset=US-ASCII
Mime-Version: 1.0
> The suggestion is to extend this syntax to support reference to an 
> arbitrary text string contained within the referenced document.   ... 
 
	I like this idea. 
 
> My suggestion is that #! be reserved as the header for a string to 
> be searched for in this fashion.  So <http://www.ai.mit.edu#!finance> 
> would retrieve the URL <http://www.ai.mit.edu> and then search for 
> the first occurrence of finance.   ... 
 
	I'm not comfortable with your choice of "!", though.
Maybe it's because it "looks like" a shell escape (not that I'd
ever advocate *using* such a thing). ;-) Or maybe it's that I might
want to use "!" for negation.
 
	Also, what about the Nth occurrence instead of the first?
Given that you're considering texts of which you are not the author,
this might be a really handy addition, eh?
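To make the Nth-occurrence idea concrete, here is a minimal sketch of the lookup a browser might perform once it has the document text. The function name find_nth and the 1-based counting of n are illustrative assumptions; the thread does not propose a syntax for N.

```python
def find_nth(text: str, needle: str, n: int = 1) -> int:
    """Return the offset of the nth (1-based) occurrence of needle, or -1 if absent.

    find_nth is a hypothetical helper for illustration only.
    """
    if n < 1 or not needle:
        return -1
    pos = -1
    for _ in range(n):
        # Resume searching one character past the start of the previous match.
        pos = text.find(needle, pos + 1)
        if pos == -1:
            return -1
    return pos

doc = "finance news; more finance; finance again"
print(find_nth(doc, "finance", 2))  # 19
```

With n defaulting to 1, the helper reduces to the "first occurrence" behaviour in the original suggestion.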
 
> ... display that portion of the document (and probably highlight the string,
> or otherwise indicate where it is).  If not found, I suggest just going
> to the top of the retrieved document with nothing highlighted.   ...
 
	Yes. Nice.
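A minimal sketch of the behaviour quoted above, assuming the "#!" prefix from the original suggestion: split the search string off the URL, locate its first occurrence, and report "not found" so the caller can show the top of the document with nothing highlighted. The function names are hypothetical, and the fragment is treated as plain text with no escaping or quoting.

```python
from typing import Optional, Tuple

def split_search_fragment(url: str) -> Tuple[str, Optional[str]]:
    """Split 'base#!text' into (base, text); return (url, None) if there is no '#!'."""
    base, sep, text = url.partition("#!")
    return (base, text) if sep else (url, None)

def find_search_target(document: str, text: Optional[str]) -> Optional[int]:
    """Offset of the first occurrence of text, or None if absent.

    None means: display the top of the document and highlight nothing.
    """
    if not text:
        return None
    pos = document.find(text)
    return None if pos == -1 else pos

base, text = split_search_fragment("http://www.ai.mit.edu#!finance")
print(base, text)  # http://www.ai.mit.edu finance
```

Returning None rather than 0 for "not found" keeps a match at the very start of the document distinguishable from a miss, which matters for deciding whether to highlight anything.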
 
> I also suggest
> allowing spaces in the search string; this works at present in 
> NCSA X Mosaic. Should they be escaped? 
 
	How about quoting the search string instead?
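For comparison with quoting, this is what escaping a space looks like under the %XX convention for URL characters. Python's urllib.parse.quote and unquote stand in here for whatever a browser would actually do. The phrase is made up, and the "#!" prefix is taken from the suggestion under discussion.

```python
from urllib.parse import quote, unquote

phrase = "interest rates"  # hypothetical search string

# Escaped form: the space becomes %20, so the URL itself contains no space.
escaped = "#!" + quote(phrase)
print(escaped)  # #!interest%20rates

# The browser would unescape the fragment before searching the document.
assert unquote(escaped[len("#!"):]) == phrase
```

Escaping is unambiguous but makes the URL harder for a person to read or type, which is the trade-off against a quoted form.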
 
>    ...  Eventually, I would like to see byte-offset ranges
> available as a way to refer to parts of other documents as well.
 
	Problem: byte offsets are *not* an interoperable metric.
Some filesystems don't have a notion of "this file contains n bytes".
The number of bytes PROBABLY CHANGES as the document goes from server
host storage (disk) to TCP (on the wire), for example when line-end
conventions are converted. In many cases, the "document" is the output
of a program, and you'd have to hold the whole thing, count the bytes
(or at least count up to that point), and then place the pointer.
A better metric would be a line offset or record offset, but I suspect
that even that isn't suitable in some case, somewhere.
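A small illustration of why byte offsets drift while line offsets survive. The sketch assumes the document is stored with LF line ends and transmitted with CRLF line ends. The three-line document and the helper line_number_at are hypothetical.

```python
# Hypothetical three-line document: LF line ends on disk,
# CRLF line ends on the wire (an assumed conversion).
on_disk = b"alpha\nbeta\ngamma\n"
on_wire = on_disk.replace(b"\n", b"\r\n")

def line_number_at(data: bytes, offset: int) -> int:
    """1-based number of the line containing the byte at offset."""
    return data.count(b"\n", 0, offset) + 1

disk_off = on_disk.index(b"gamma")
wire_off = on_wire.index(b"gamma")
print(len(on_disk), len(on_wire))  # 17 20
print(disk_off, wire_off)          # 11 13  (byte offsets disagree)
print(line_number_at(on_disk, disk_off), line_number_at(on_wire, wire_off))  # 3 3
```

The line number agrees in both forms here, but as noted above, even line offsets may not hold for every storage model.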
 
> ______________________________________________________________________________
> 
> Mark Torrance                                      Tel: (508) 442-0812
> Sun Microsystems Laboratories, Inc.                Fax: (508) 250-5067
> 2 Elizabeth Drive (Mailstop: UCHL03-207)           Net: torrance@east.sun.com
> Chelmsford, MA 01824-4195                          USA
> ______________________________________________________________________________
> 
 
	[disclaimer: what I'm about to suggest comes from a (perhaps AR)
point of view that quoting should be avoided unless required to resolve ambiguity]
 
	About the quoting idea: what if we chose #" for your scheme?
So a URL like mydoc#myanchor would work as you'd expect today,
and a URL like mydoc#"mytext" would look for the text as you suggest.
Maybe someone will suggest another "quoting" method (in which case
I can throw away my disclaimer above as I embrace the better idea).
Parentheses? Brackets? Braces?
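A minimal sketch of how a browser might tell the two forms apart under the #" proposal. A fragment wrapped in double quotes is a text search, and anything else is an ordinary anchor. The Fragment type, its kind labels, and the handling of an unterminated quote (treated as an anchor) are assumptions made for this illustration.

```python
from typing import NamedTuple, Optional, Tuple

class Fragment(NamedTuple):
    kind: str             # "none", "anchor" or "search" (labels chosen for this sketch)
    value: Optional[str]

def parse_fragment(url: str) -> Tuple[str, Fragment]:
    """Split a URL into its base and a classified fragment.

    mydoc#myanchor  -> anchor "myanchor" (today's behaviour)
    mydoc#"mytext"  -> search for mytext (the proposed #" form)
    """
    base, sep, frag = url.partition("#")
    if not sep:
        return url, Fragment("none", None)
    # Only a fragment that both opens and closes with a double quote is a search;
    # an unterminated quote falls through to an ordinary anchor (an assumption).
    if len(frag) >= 2 and frag.startswith('"') and frag.endswith('"'):
        return base, Fragment("search", frag[1:-1])
    return base, Fragment("anchor", frag)

print(parse_fragment('mydoc#"my text"'))
```

A side effect of the quoted form is that spaces inside the quotes need no escaping, which is the point of the quoting suggestion earlier in the thread.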
 
-- 
Rick Troth <troth@rice.edu>, Rice University, Information Systems