Ticket #88 (new defect)
Hpricot is unable to find a div for some pages of the same site
| Reported by: | scrubber | Owned by: | why |
|---|---|---|---|
| Priority: | major | Milestone: | 0.6 |
| Component: | ext/hpricot_scan | Version: | |
| Keywords: | | Cc: | |
Description
Check out this code:
require 'rubygems'
require 'hpricot'
require 'open-uri'
#a working page
#doc = Hpricot(open('http://www.handango.com/PlatformProductDetail.jsp?siteId=1&osId=322&jid=5898CAFFB9E872CAA57847B6862AEX58&platformId=5&N=4294966622&R=121718&productId=121718'))
#a broken page
doc = Hpricot(open('http://www.handango.com/PlatformProductDetail.jsp?siteId=1&osId=322&jid=43BF1722D46DX2AFB2E166BE81ECE822&platformId=5&N=4294966622&R=171552&productId=171552'))
records = doc/"//div[@id='detailTabs']"
p records[0].inner_html
The output is
"\r\n"
even though the div is there and it contains a chunk of HTML.
Uncomment the first doc = ... line (and comment out the second :-)) and you will see how it should work. I see this behavior on about 15% of the pages of the same site.
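To confirm independently of Hpricot that the div really does contain markup in the raw response, something like the following might help. It is only a rough sketch using the standard library: `raw_div_contents` is a hypothetical helper (not part of Hpricot), it tracks nesting by counting `<div>` tags only, and it assumes the page does not have div-like text inside comments or scripts.

```ruby
require 'strscan'

# Hypothetical helper, not part of Hpricot: returns the raw markup between
# the opening <div id="..."> tag and its matching </div>, or nil if the div
# is missing or never closed. Matching is case-insensitive, and nesting is
# tracked by counting <div> and </div> tags only.
def raw_div_contents(html, id)
  open_tag = /<div\b[^>]*\sid\s*=\s*["']?#{Regexp.escape(id)}["']?(?=[\s>\/])[^>]*>/i
  scanner = StringScanner.new(html)
  return nil unless scanner.scan_until(open_tag)

  start = scanner.pos # byte offset just past the opening tag
  depth = 1
  while scanner.scan_until(/<(\/?)div\b[^>]*>/i)
    # Group 1 is "/" for a closing tag and "" for an opening tag.
    depth += scanner[1].empty? ? 1 : -1
    if depth.zero?
      close_start = scanner.pos - scanner.matched_size
      return html.byteslice(start, close_start - start)
    end
  end
  nil
end
```

Calling `raw_div_contents(open(url).read, 'detailTabs')` on the broken page and comparing the result with `records[0].inner_html` would show whether the content is present in the source but not attached to the div by the parser.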
