Supported XPath Expressions

Hpricot borrows its XPath support from jQuery, so much of what follows comes straight from jQuery's XPath documentation.

Here are some samples:

 #!ruby
 require 'hpricot'
 require 'open-uri'
 doc = Hpricot(URI.parse("http://google.com/").read)

 doc.search("/html/body//p")
 doc.search("//p")
 doc.search("//p/a")
 doc.search("//a[@href]")
 doc.search("//a[@href='google.com']")

Location Paths

Absolute Paths

 #!ruby
 doc.search("/html/body//p")
 doc.search("/*/body//p")
 doc.search("//p/../div")

Relative Paths

 #!ruby
 div = doc.at("//div")   # a relative path is searched from the element it is called on
 div.search("a")
 div.search("p/a")

Supported Axes

descendant

Element has a descendant element.

 #!ruby
 doc.search("//div/descendant::p")

Identical to doc.search("//div//p").

child

Element has a child element.

 #!ruby
 doc.search("//div/child::p")

Identical to doc.search("//div/p").

preceding-sibling

Element has an element before it at the same level (a sibling).

 #!ruby
 doc.search("//div/preceding-sibling::form")

parent

Selects the parent of the element.

 #!ruby
 doc.search("//div/parent::div")

Identical to doc.search("//div/../div").

self

Selects the element itself.
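The abbreviations stated above for descendant, child, and parent amount to simple string rewrites. The sketch below is plain Ruby and does not call Hpricot; `abbreviate_axes` and `AXIS_ABBREVIATIONS` are hypothetical names introduced only for illustration, and the mapping covers just the three equivalences listed in this section.

```ruby
# Hypothetical lookup: long axis form => abbreviated form, as listed above.
AXIS_ABBREVIATIONS = {
  "/descendant::" => "//",   # //div/descendant::p -> //div//p
  "/child::"      => "/",    # //div/child::p      -> //div/p
  "/parent::"     => "/../"  # //div/parent::div   -> //div/../div
}.freeze

# Hypothetical helper: applies every rewrite in the table to the given path.
def abbreviate_axes(path)
  AXIS_ABBREVIATIONS.reduce(path) do |result, (long_form, short_form)|
    result.gsub(long_form, short_form)
  end
end

puts abbreviate_axes("//div/descendant::p")  # prints //div//p
puts abbreviate_axes("//div/child::p")       # prints //div/p
puts abbreviate_axes("//div/parent::div")    # prints //div/../div
```

Either spelling of a path can be handed to doc.search; the helper only shows how the two forms line up.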

Supported Predicates

  • [@*] Has an attribute
        #!ruby
        doc.search("//div[@*]")
    
  • [@foo] Has an attribute named foo
        #!ruby
        doc.search("//input[@checked]")
    
  • [@foo='test'] Attribute foo is equal to test
        #!ruby
        doc.search("//a[@rel='nofollow']")
    
  • [Nodelist] Element contains a node list, for example:
        #!ruby
        doc.search("//div[p]")
        doc.search("//div[p/a]")
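To check what each predicate selects without fetching a live page, the same kinds of expressions can be run against a small inline document. The sketch below uses REXML, an XPath engine distributed with Ruby, as a stand-in; it is an assumption that Hpricot returns the same elements for these expressions. The sample markup, the `SAMPLE` and `DOC` constants, and the `matches` helper are all invented for this illustration.

```ruby
require "rexml/document"

# Sample markup invented for this illustration (well-formed, so REXML can parse it).
SAMPLE = <<~MARKUP
  <html><body>
    <div class="nav"><p><a href="/" rel="nofollow">home</a></p></div>
    <div><p>plain text</p></div>
    <div><input name="agree" checked="checked"/></div>
  </body></html>
MARKUP

DOC = REXML::Document.new(SAMPLE)

# Hypothetical helper: returns the elements matched by an XPath expression.
def matches(path)
  REXML::XPath.match(DOC, path)
end

puts matches("//div[@*]").size             # divs carrying at least one attribute
puts matches("//input[@checked]").size     # inputs with a checked attribute
puts matches("//a[@rel='nofollow']").size  # links whose rel equals nofollow
puts matches("//div[p]").size              # divs that have a p child
puts matches("//div[p/a]").size            # divs whose p child has an a child
```

Counting matches on a known document like this is a quick way to confirm that a predicate is no broader or narrower than intended before running it on real pages.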
    

Supported Predicates, with Different Syntax

  • [last()] or [position()=last()] becomes :last
        #!ruby
        doc.search("p:last")
    
  • [0] or [position()=0] becomes :eq(0) or :first
        #!ruby
        doc.search("p:first")
        doc.search("p:eq(0)")
    
  • [position() < 5] becomes :lt(5)
        #!ruby
        doc.search("p:lt(5)")
    
  • [position() > 2] becomes :gt(2)
        #!ruby
        doc.search("p:gt(2)")
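The substitutions in the list above can be written down as a small lookup. The sketch below is plain Ruby and does not call Hpricot; `translate_position_predicate` is a hypothetical name, and treating any bare number `[N]` as `:eq(N)` is an assumption generalised from the `[0]` case given in the list.

```ruby
# Hypothetical helper: maps an XPath position predicate to the pseudo-selector
# listed for it above. Predicates it does not recognise are returned unchanged.
def translate_position_predicate(predicate)
  case predicate.delete(" ")  # ignore spacing such as "[position() < 5]"
  when "[last()]", "[position()=last()]" then ":last"
  when /\A\[(?:position\(\)=)?(\d+)\]\z/ then ":eq(#{$1})"
  when /\A\[position\(\)<(\d+)\]\z/      then ":lt(#{$1})"
  when /\A\[position\(\)>(\d+)\]\z/      then ":gt(#{$1})"
  else predicate
  end
end

puts "p" + translate_position_predicate("[last()]")          # prints p:last
puts "p" + translate_position_predicate("[position()=0]")    # prints p:eq(0)
puts "p" + translate_position_predicate("[position() < 5]")  # prints p:lt(5)
puts "p" + translate_position_predicate("[position() > 2]")  # prints p:gt(2)
```

The resulting string, such as "p:lt(5)", is what would be passed to doc.search in place of the XPath predicate.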