Ticket #77 (new defect)

Opened 16 months ago

Last modified 15 months ago

XPath finds bogusetag in case of unescaped double-quotes

Reported by: scrubber Owned by: why
Priority: major Milestone: 0.6
Component: ext/hpricot_scan Version:
Keywords: Cc:

Description

This snippet:

require 'rubygems'
require 'hpricot'
require 'open-uri'

doc = Hpricot(open('http://www.funmobile.com/catalog/ringtone/polyringtones/catId-00201340'))
records = doc/'/html/body/div/div/table/tr/td/table/tr/td[2]/a[1]'
records[14..16].each { |e| p e }

The XPath above should find 20 records, but it finds just 19. The missing one (the 15th in the records array) should be matched by the XPath too (in Firefox it is, for example), but instead it is returned as a bogusetag.

Is this a serious problem? A project depends on this, so it would be great to know the estimate for fixing it. Thanks a lot!

Change History

Changed 15 months ago by lwu

Looks like the root cause is bad HTML on their end, where a title attribute contains unescaped double quotes:

   title="Cell Phone Polyphonic Ringtones - Polyphonic Ringtones - Themes (Movies, TV) Polyphonic Ringtones - Going The Distance (From "Rocky") - Theme"

I haven't looked much at Hpricot's error-handling code, but you could probably work around this ("Rocky"!) temporarily by escaping those " characters as &quot; yourself before sending the doc to Hpricot.
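
A minimal sketch of that workaround, not a tested fix. The helper name escape_inner_quotes and the ATTR_VALUE pattern are hypothetical, not part of Hpricot. The heuristic assumes an attribute value really ends at the first " that is followed by another attribute or by the end of the tag. It can misfire on unusual markup, or on text outside tags that happens to look like an attribute.

```ruby
# Hypothetical helper, not part of Hpricot: escapes stray double quotes
# inside double-quoted attribute values before the HTML is parsed.
#
# Heuristic (an assumption): a value starts at =" and ends at the first "
# that is followed by another attribute (whitespace, name, =) or by the
# end of the tag (> or />).
ATTR_VALUE = /="(.*?)"(?=\s+[\w:-]+\s*=|\s*\/?>)/m

def escape_inner_quotes(html)
  html.gsub(ATTR_VALUE) do
    inner = Regexp.last_match(1)
    # Any " still inside the captured value is one of the unescaped quotes.
    "=\"#{inner.gsub('"', '&quot;')}\""
  end
end

# Shortened version of the title attribute quoted in this ticket.
sample = '<a href="x.html" title="Going The Distance (From "Rocky") - Theme">link</a>'
puts escape_inner_quotes(sample)

# With Hpricot loaded, the cleaned string would then be parsed as usual:
#   doc = Hpricot(escape_inner_quotes(html))
```

Well-formed attributes pass through unchanged, so the helper can be applied to the whole page before parsing.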

Changed 15 months ago by lwu

  • summary changed from XPath finds a bogusetag instead of a normal element to XPath finds bogusetag in case of unescaped double-quotes