Hpricot Fixups
Part of AnHpricotShowcase.
:fixup_tags
Really, there are so many ways to clean up HTML, and your intention may be to keep the HTML as-is. So Hpricot's default behavior is to keep things flexible: it makes sure all the tags are opened and closed, but ignores any validation problems.
As of Hpricot 0.4, there's a new :fixup_tags option which will attempt to shift the document's tags to meet XHTML 1.0 Strict.
#!ruby
require 'hpricot'
doc = open("index.html") { |f| Hpricot f, :fixup_tags => true }
This doesn't quite meet the XHTML 1.0 Strict standard; it just tries to follow the rules a bit better. For example, if Hpricot finds a paragraph inside a link, it's going to move the paragraph below the link, or up and out of other elements where paragraphs don't belong.
If an unknown element is found, it is ignored and left in place. Again, that's just :fixup_tags.
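To make the shifting rule concrete, here is a small pure-Ruby sketch of the idea. It is a toy model, not Hpricot's internal code: the `[tag, children]` node shape, the `fixup` helper, and the two short tag lists are all assumptions made up for illustration.

```ruby
# Toy model only: a node is [tag, children]; plain strings are text.
# The tag lists are small assumed samples, not the real XHTML rules.
BLOCK_TAGS  = %w[p div ul ol table].freeze
INLINE_TAGS = %w[a span em strong].freeze

# Returns the list of nodes that should replace `node` in its parent.
def fixup(node)
  return [node] if node.is_a?(String)

  tag, children = node
  fixed = children.flat_map { |child| fixup(child) }

  # Unknown and block-level tags are left alone.
  return [[tag, fixed]] unless INLINE_TAGS.include?(tag)

  # Inline parent: keep the legal children, move block children below it.
  hoisted, kept = fixed.partition { |c| c.is_a?(Array) && BLOCK_TAGS.include?(c[0]) }
  [[tag, kept], *hoisted]
end

# <body><a>Home<p>Welcome</p></a></body>
tree = ["body", [["a", ["Home", ["p", ["Welcome"]]]]]]
fixed_tree = fixup(tree).first
p fixed_tree
# prints ["body", [["a", ["Home"]], ["p", ["Welcome"]]]]
```

Because each call hands its hoisted children back to the caller, a paragraph nested inside several inline elements keeps moving up and out until it reaches a parent that can hold it.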
:xhtml_strict
So, let's go beyond just trying to fix the hierarchy. The :xhtml_strict option really tries to force the document to be an XHTML 1.0 Strict document, even at the cost of removing elements that get in the way.
#!ruby
require 'hpricot'
doc = open("index.html") { |f| Hpricot f, :xhtml_strict => true }
What measures does :xhtml_strict take?
- Shift elements into their proper containers just like :fixup_tags.
- Remove unknown elements.
- Remove unknown attributes.
- Remove illegal content.
- Alter the doctype to XHTML 1.0 Strict.
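Two of the measures above, removing unknown elements and removing unknown attributes, can be sketched in plain Ruby. This is a toy model rather than Hpricot's implementation: the `Node` struct, the `strict` helper, and the tiny `KNOWN` whitelist are hypothetical, and dropping an unknown element together with its contents is an assumption of the sketch, not something the text above confirms.

```ruby
# Toy model only: KNOWN maps a tag to the attributes allowed on it.
# It is a tiny assumed sample, not the XHTML 1.0 Strict DTD.
KNOWN = {
  "p"  => [],
  "a"  => %w[href title],
  "em" => []
}.freeze

Node = Struct.new(:tag, :attrs, :children)

# Returns a cleaned copy of `node`, or nil if the element is unknown.
def strict(node)
  return node if node.is_a?(String)        # text passes through
  return nil unless KNOWN.key?(node.tag)   # remove unknown elements

  # Remove unknown attributes.
  attrs = node.attrs.select { |name, _value| KNOWN[node.tag].include?(name) }
  Node.new(node.tag, attrs, node.children.map { |c| strict(c) }.compact)
end

# <p align="center"><a href="/" target="_blank">Home</a><blink>Hi</blink></p>
doc = Node.new("p", { "align" => "center" }, [
  Node.new("a", { "href" => "/", "target" => "_blank" }, ["Home"]),
  Node.new("blink", {}, ["Hi"])
])

clean = strict(doc)
p clean.children.map(&:tag)        # prints ["a"]
p clean.children.first.attrs.keys  # prints ["href"]
```

The sample input uses `align` and `target` because neither attribute is part of XHTML 1.0 Strict, and `blink` because it is not in any XHTML DTD.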
