
C Source, Strip HTML


Materialised
06-29-2004, 02:48 AM
Hi everyone, I'm looking for some C source code that will let me strip HTML
from a local file and write the result to another file.

I figured out how to do a basic strip, i.e. simply removing the <tag>s,
but I am wondering if there is any public source that can also handle
table rendering and the '&' entities.

Thanks

--
------
Materialised

perl -e 'printf "%silto%c%sck%ccodegurus%corg%c", "ma", 58, "mi", 64,
46, 10;'

Arthur J. O'Dwyer
06-29-2004, 06:02 AM
On Tue, 29 Jun 2004, Materialised wrote:
> Hi everyone, looking for some C source code to enable me to strip html
> from a local file, and output to another file.
>
> I figured out how to do a basic strip, i.e simply removing the <tag>'s,
> but I am wondering if there is any public source, that can handle table
> generation and the '&' tags as well.

lynx -dump -nolist

Lynx Version 2.8.4rel.1 (17 Jul 2001)
libwww-FM 2.14, SSL-MM 1.4.1, OpenSSL 0.9.6k
Built on linux-gnu Nov 18 2003 16:52:57

This version of Lynx is absolutely awful at tables; maybe tables
look a *little* better in a different version? It might pay to check
out Mozilla, too --- good textmode table-rendering seems like
something that might be quite alluring to open-source hackers, so
maybe something got wedged into that browser to handle textmode.

In principle, &amp;-style entities are just a table lookup; make
a list of all the entities and their corresponding texts. Tables
are deep magic, typographically speaking, so if you want original
source for dealing with tables, you'll need to be very detailed about
your requirements.

HTH,
-Arthur

Materialised
06-29-2004, 07:35 AM
Arthur J. O'Dwyer wrote:
Thanks for your reply, Arthur.
I'm downloading the Lynx source now, and I'll look over it.
I'll post what I find.



