Hpricot
A Fast, Enjoyable HTML Parser for Ruby
Hpricot is a very flexible HTML parser, based on Tanaka Akira's HTree and John Resig's jQuery, but with the scanner recoded in C (using Ragel for scanning). I've borrowed what I believe to be the best ideas from these wares to make Hpricot heaps of fun to use.
#!ruby
require 'hpricot'
require 'open-uri'
# load the RedHanded home page
doc = Hpricot(open("http://redhanded.hobix.com/index.html"))
# change the CSS class on links
(doc/"span.entryPermalink").set("class", "newLinks")
# remove the sidebar
(doc/"#sidebar").remove
# print the altered HTML
puts doc
A Proper Start
- InstallingHpricot, both stable and development versions.
- AnHpricotShowcase with recipes for most common things. (Also translated to Japanese.)
- The del.icio.us hpricot tag is quite active, with a wealth of tutorials and other libs.
The Tougher Things
- Stumped? Ask your question on HpricotChallenge.
- The complete documentation, generated from RDoc.
- Wonder what's happening? Check the CHANGELOG and the Timeline.
If you're on a machine with a compiler, you can give the Hpricot quickstart a try: http://balloon.hobix.com/hpricot.
Related Links
- New to Hpricot? Surf on to some enjoyable HpricotTutorials.
- WhoUsesHpricot? Do tell!
- Hpricot Goodies
- Want to parse microformats? See Mofo.
The Hpricot Mailing List
To join:
Send a message to hpricot@…
Cc: why@…
Want to follow Hpricot development? Subscribe to the RSS feed at /hpricot.xml.
