[PyKDE] Creating a DOM from HTML source with PyKDE

Andreas Pakulat apaku at gmx.de
Wed Jun 7 10:40:06 BST 2006


On 07.06.06 10:30:29, Robin Haswell wrote:
> I'm trying to use PyKDE to grab a DOM from HTML source for a
> privoxy-like project. I was wondering if anyone could get me started on
> this? I'm using PyKDE because I'm struggling to find an HTML parser I
> can use which won't choke on the majority of HTML out there.
> BeautifulSoup is unsuitable and I shudder at building PyXPCOM.

You could try lxml. It uses libxml2's HTML parser, which shouldn't choke
on the majority of bad HTML out there.
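
A minimal, untested sketch of what that could look like, assuming lxml is
installed and using its etree.HTMLParser class. The sample markup and the
parse_html helper are made up for illustration only:

```python
from lxml import etree

# Deliberately sloppy markup (hypothetical sample): unclosed <p>, <b> and <a>
# tags and an unquoted attribute value.
BAD_HTML = "<html><body><p>First<p>Second <b>bold<a href=x.html>link</body>"


def parse_html(source):
    """Parse possibly malformed HTML and return the root element."""
    # HTMLParser wraps libxml2's HTML parser, which tries to recover from
    # broken markup instead of raising on the first error.
    parser = etree.HTMLParser()
    return etree.fromstring(source, parser)


root = parse_html(BAD_HTML)

# Walk the recovered tree like any other element tree.
links = [a.get("href") for a in root.iter("a")]
print(root.tag, links)
```

What you get back is lxml's own element tree rather than a W3C-style DOM, so
whether it fits depends on what the proxy needs to do with the document.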

Using Qt's DOM won't help you with bad HTML, because its parsers only
accept well-formed XML.
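
To illustrate the problem, here is a small sketch that uses Python's
xml.dom.minidom as a stand-in for a strict XML DOM parser (it is not Qt's
DOM, just a parser with the same well-formedness requirement). The sample
strings and the is_well_formed_xml helper are made up for illustration:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Typical "tag soup": <p> and <br> are never closed.
BAD_HTML = "<html><body><p>First<p>Second<br></body></html>"
# The same content written as well-formed XML.
GOOD_XML = "<html><body><p>First</p><p>Second<br/></p></body></html>"


def is_well_formed_xml(source):
    """Return True if a strict XML parser accepts the source."""
    try:
        minidom.parseString(source)
    except ExpatError:
        # Mismatched or unclosed tags are fatal errors for an XML parser.
        return False
    return True


print(is_well_formed_xml(BAD_HTML), is_well_formed_xml(GOOD_XML))
```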

> Oh I take it I don't have to have an X server running to use khtml, is
> that true?

AFAIK that's not true: you need a KApplication object for the signal/slot
stuff to work, and KApplication (based on QApplication) in Qt3/KDE3 cannot
be used without a running X server.

Andreas

-- 
Afternoon very favorable for romance.  Try a single person for a change.



