Re: HTML -> ASCII?

From: dale@ora.com (Dale Dougherty)
Message-id: <9311082329.ZM11417@rock.west.ora.com>
Date: Mon, 8 Nov 1993 23:29:32 -0800
In-Reply-To: Charles Henrich <henrich@crh.cl.msu.edu>
        "HTML -> ASCII?" (Nov  8, 10:46am)
References: <9311081546.AA29439@crh.cl.msu.edu>
X-Mailer: Z-Mail (2.1.0 10/27/92)
To: Charles Henrich <henrich@crh.cl.msu.edu>, www-talk@nxoc01.cern.ch
Subject: Re: HTML -> ASCII?
The simplest approach is a sed script that removes HTML tags,
that is, anything between a pair of angle brackets.

s/<.[^>]*>//g
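For comparison, here is the same substitution written as a small Python sketch. The function name strip_tags is just for illustration. There is one difference from sed. By default, sed applies the pattern one line at a time, so a tag broken across two lines survives. This version runs over the whole document at once with re.DOTALL, so it removes such tags too.

```python
import re

# Same pattern as the sed script: "<", any one character, then
# everything up to the next ">". DOTALL lets "." match a newline too.
TAG = re.compile(r"<.[^>]*>", re.DOTALL)

def strip_tags(html: str) -> str:
    """Remove anything between a pair of angle brackets (illustrative helper)."""
    return TAG.sub("", html)

page = 'See <a href="http://www.ora.com/">O\'Reilly</a>\nfor <b>more</b>.'
print(strip_tags(page))
```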


You can obviously build more complicated scripts in sed, awk, or perl.
Note that the script above also strips out the link information,
because HREF is an attribute inside the tag.
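One way to keep the links is to rewrite each anchor as its text followed by the URL in brackets, and then strip the remaining tags. Below is a rough Python sketch. The "text [URL]" output format and the name html_to_text are my own choices, not a standard. It also assumes HREF values are enclosed in double quotes, which not all HTML does.

```python
import re

# Hypothetical rule: <a ... href="URL" ...>text</a>  ->  "text [URL]".
# Only double-quoted HREF values are recognized (an assumption).
ANCHOR = re.compile(
    r'<a\s[^>]*?href\s*=\s*"([^"]*)"[^>]*>(.*?)</a\s*>',
    re.IGNORECASE | re.DOTALL,
)
# Then remove whatever tags are left, as in the sed script.
TAG = re.compile(r"<.[^>]*>", re.DOTALL)

def html_to_text(html: str) -> str:
    with_links = ANCHOR.sub(lambda m: f"{m.group(2)} [{m.group(1)}]", html)
    return TAG.sub("", with_links)

print(html_to_text('<A HREF="http://gnn.com/">GNN</A> home'))
```

Tags nested inside the anchor text, such as <b>, are still removed by the second pass.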

Such seat-of-the-pants conversions depend on how consistent the 
HTML coding is.  This is by no means a general solution.
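As an example of how inconsistent markup defeats the pattern, consider a ">" inside a quoted attribute value. The pattern stops at the first ">", so the tail of the tag is left in the text. A more general route is a real HTML parser. The sketch below swaps in Python's standard html.parser module, which is not what this post proposes. It keeps only character data, and with convert_charrefs=True it also turns entities such as &amp; into plain characters. The class and function names are illustrative.

```python
import re
from html.parser import HTMLParser

# The naive pattern from the sed script, for comparison.
TAG = re.compile(r"<.[^>]*>", re.DOTALL)

class TextExtractor(HTMLParser):
    """Collect character data only; tags and their attributes are dropped."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

def parser_to_text(html: str) -> str:
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()  # flush any buffered text
    return "".join(extractor.parts)

sample = '<img alt="a > b" src="x.gif">Caption &amp; text'
print(TAG.sub("", sample))     # regex stops at the ">" inside alt="..."
print(parser_to_text(sample))  # parser understands the quoted value
```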

-- 
Dale Dougherty (dale@ora.com) 
Publisher, Global Network Navigator, O'Reilly & Associates, Inc.
103A Morris Street, Sebastopol, California 95472 
(707) 829-3762 (home office); 1-800-998-9938