Hpricot Fixups

Part of AnHpricotShowcase.

:fixup_tags

There are many ways to clean up HTML, and you may well intend to keep the HTML as-is. So Hpricot's default behavior is to stay flexible: it makes sure every tag is opened and closed, but ignores any validation problems.

As of Hpricot 0.4, there's a new :fixup_tags option which will attempt to shift the document's tags to meet XHTML 1.0 Strict.

 #!ruby
 require 'hpricot'
 doc = open("index.html") { |f| Hpricot(f, :fixup_tags => true) }

This doesn't quite meet the XHTML 1.0 Strict standard; it just tries to follow the rules a bit better. For example, if Hpricot finds a paragraph inside a link, it moves the paragraph below the link. Likewise, paragraphs are moved up and out of other elements where they don't belong.
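
To make the paragraph-in-a-link rule concrete, here is a minimal sketch in plain Ruby. It does not use Hpricot and is not how Hpricot implements :fixup_tags; `ToyNode`, `hoist_paragraphs`, and `render_toy` are hypothetical names invented for this illustration, and the only rule modeled is "a p may not sit inside an a".

```ruby
# Toy tree node for this sketch only; Hpricot's real classes are different.
ToyNode = Struct.new(:name, :children)

# Hypothetical rule: a <p> found inside an <a> is moved to just after the <a>.
def hoist_paragraphs(nodes)
  nodes.flat_map do |node|
    next [node] if node.is_a?(String)        # text passes through untouched
    kids = hoist_paragraphs(node.children)   # fix up descendants first
    if node.name == "a"
      moved, kept = kids.partition { |k| k.is_a?(ToyNode) && k.name == "p" }
      [ToyNode.new("a", kept)] + moved       # paragraphs land below the link
    else
      [ToyNode.new(node.name, kids)]
    end
  end
end

# Serialize the toy tree back to markup so the result is easy to read.
def render_toy(nodes)
  nodes.map { |n|
    n.is_a?(String) ? n : "<#{n.name}>#{render_toy(n.children)}</#{n.name}>"
  }.join
end

tree = [ToyNode.new("a", ["home", ToyNode.new("p", ["intro"])])]
puts render_toy(hoist_paragraphs(tree))   # prints <a>home</a><p>intro</p>
```

The link keeps its text, and the paragraph ends up as its next sibling, which is the kind of shift described above.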

If an unknown element is found, it is ignored. Again, this applies to :fixup_tags.

:xhtml_strict

So, let's go beyond just trying to fix the hierarchy. The :xhtml_strict option really tries to force the document to be an XHTML 1.0 Strict document, even at the cost of removing elements that get in the way.

 #!ruby
 require 'hpricot'
 doc = open("index.html") { |f| Hpricot(f, :xhtml_strict => true) }

What measures does :xhtml_strict take?

  1. Shift elements into their proper containers just like :fixup_tags.
  2. Remove unknown elements.
  3. Remove unknown attributes.
  4. Remove illegal content.
  5. Alter the doctype to XHTML 1.0 Strict.
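
Steps 2 and 3 in the list above can be sketched in plain Ruby as a whitelist filter. This is only an illustration, not Hpricot's code: `ToyElement`, `TOY_ALLOWED`, and `strictify` are hypothetical names, the whitelist is a tiny stand-in for the real XHTML 1.0 Strict DTD, and dropping an unknown element together with its contents is an assumption made for brevity.

```ruby
# Toy element for this sketch only; Hpricot's real classes are different.
ToyElement = Struct.new(:name, :attrs, :children)

# Hypothetical, tiny whitelist: element name => attributes allowed on it.
# The real XHTML 1.0 Strict DTD is far larger.
TOY_ALLOWED = {
  "p"  => [],
  "a"  => ["href"],
  "em" => []
}

# Drops elements missing from TOY_ALLOWED (contents included, an assumption)
# and strips attributes the whitelist does not list for that element.
def strictify(nodes)
  nodes.each_with_object([]) do |node, out|
    if node.is_a?(String)
      out << node                               # text is always kept
    elsif TOY_ALLOWED.key?(node.name)
      attrs = node.attrs.select { |key, _| TOY_ALLOWED[node.name].include?(key) }
      out << ToyElement.new(node.name, attrs, strictify(node.children))
    end                                         # unknown element: skipped
  end
end

link = ToyElement.new("a", { "href" => "/", "target" => "_blank" },
                      ["home", ToyElement.new("blink", {}, ["new!"])])
cleaned = strictify([link]).first
puts cleaned.attrs.keys.join(", ")   # prints href
```

Here the `target` attribute and the `blink` element are removed, since neither appears in the toy whitelist, while the link and its `href` survive.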

Return to AnHpricotShowcase.