Re: HTML -> ASCII?
dale@ora.com (Dale Dougherty)
From: dale@ora.com (Dale Dougherty)
Message-id: <9311082329.ZM11417@rock.west.ora.com>
Date: Mon, 8 Nov 1993 23:29:32 -0800
In-Reply-To: Charles Henrich <henrich@crh.cl.msu.edu>
        "HTML -> ASCII?" (Nov 8, 10:46am)
References: <9311081546.AA29439@crh.cl.msu.edu>
X-Mailer: Z-Mail (2.1.0 10/27/92)
To: Charles Henrich <henrich@crh.cl.msu.edu>, www-talk@nxoc01.cern.ch
Subject: Re: HTML -> ASCII?
The simplest approach is a sed script that removes HTML tags,
that is, anything between a pair of angle brackets.
    s/<.[^>]*>//g
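As a rough illustration, the substitution can be wrapped in a small shell function and run on a couple of lines of HTML. The function name strip_tags and the sample input are assumptions made for this sketch, not part of the original suggestion.

```shell
# strip_tags: hypothetical wrapper; reads HTML on stdin, writes text on stdout.
strip_tags() {
    # Delete "<", one character, then everything up to the next ">".
    sed 's/<.[^>]*>//g'
}

# Sample input is made up for illustration.
printf '%s\n' '<H1>Welcome</H1>' '<P>Plain <B>bold</B> text.' | strip_tags
```

Run as written, this should print the heading and the paragraph text with the markup removed.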
You can obviously build more complicated scripts in sed, awk or perl.
The above script will strip out link information because HREF
is an attribute inside the tag.
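If the link targets matter, one possible refinement is to rewrite each anchor as its text followed by the URL before the remaining tags are removed. This is only a sketch: the function name strip_tags_keep_links and the "text [URL]" output format are assumptions, and it only handles anchors that sit on one line, use a double-quoted HREF, and contain no nested tags.

```shell
# strip_tags_keep_links: hypothetical name; the "text [URL]" layout is an
# assumption chosen for illustration.
strip_tags_keep_links() {
    # 1st expression: turn <A ... HREF="url">text</A> into: text [url]
    # 2nd expression: remove whatever tags are left.
    sed -e 's/<[Aa] [^>]*[Hh][Rr][Ee][Ff]="\([^"]*\)"[^>]*>\([^<]*\)<\/[Aa]>/\2 [\1]/g' \
        -e 's/<.[^>]*>//g'
}

# Sample input is made up for illustration.
printf '%s\n' 'See <A HREF="http://www.example.org/">the docs</A> now.' |
    strip_tags_keep_links
```

Anchors that do not fit the first pattern simply fall through to the second expression and lose their HREF as before.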
Such seat-of-the-pants conversions depend on how consistent the
HTML coding is. This is by no means a general solution.
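One concrete case, sketched with made-up input: sed works a line at a time, so a tag broken across two lines is never matched as a whole. A possible workaround is to read the entire input into the pattern space before substituting. The function name strip_tags_multiline is hypothetical, and the sketch assumes a sed in which "." and "[^>]" match an embedded newline (GNU sed behaves this way).

```shell
# strip_tags_multiline: hypothetical name; slurps all input, then strips tags.
strip_tags_multiline() {
    # On every line but the last, append the next line and loop back to :a,
    # so the substitution runs once over the whole document.
    sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' -e 's/<.[^>]*>//g'
}

# A tag split across two lines (made-up sample).
printf '%s\n' '<A' 'HREF="x">link</A> text' | strip_tags_multiline
```

Holding the whole file in memory is fine for short documents; it is still not a general solution, since comments or a stray ">" inside an attribute value will confuse it.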
--
Dale Dougherty (dale@ora.com)
Publisher, Global Network Navigator, O'Reilly & Associates, Inc.
103A Morris Street, Sebastopol, California 95472
(707) 829-3762 (home office); 1-800-998-9938