Ticket #106 (new defect)

Opened 10 months ago

Last modified 6 months ago

Hpricot can't handle some files that are multiples of 16384 bytes

Reported by: bien
Owned by: why
Priority: major
Milestone:
Component: ext/hpricot_scan
Version:
Keywords:
Cc:

Description

I tried to parse an XML file of exactly 65535 bytes with this code:

$ cat bug.rb
require 'rubygems'
require 'hpricot'

File.open(ARGV[0]) do |f|
  Hpricot(f)
end

And I get this error:

$ ruby bug.rb text4.txt
Oh no: read returned Qnil!
/p/lib/gems/hpricot-0.6.155/lib/hpricot/parse.rb:52:in `scan': can't convert nil into String (TypeError)
        from /p/lib/gems/hpricot-0.6.155/lib/hpricot/parse.rb:52:in `make'
        from /p/lib/gems/hpricot-0.6.155/lib/hpricot/parse.rb:15:in `parse'
        from /p/lib/gems/hpricot-0.6.155/lib/hpricot/parse.rb:4:in `Hpricot'
        from bug.rb:5
        from bug.rb:4:in `open'
        from bug.rb:4

Note that the "Oh no" line is not stock Hpricot output: I hacked hpricot_scan.rl so that hpricot_scan() checks whether the funcall to s_read returns Qnil and, if it does, prints that message.

Change History

Changed 10 months ago by bien

Here's a patch that fixes the problem:

===================================================================
--- ext/hpricot_scan/hpricot_scan.rl    (revision 155)
+++ ext/hpricot_scan/hpricot_scan.rl    (working copy)
@@ -181,6 +181,11 @@
       str = rb_str_substr( port, nread, space );
     }
 
+    if (NIL_P(str)) 
+    {
+      break;
+    } 
+
     StringValue(str);
     memcpy( p, RSTRING(str)->ptr, RSTRING(str)->len );
     len = RSTRING(str)->len;

I tried to attach a test file that breaks hpricot, but the file upload mechanism seems to be broken.
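
For illustration, here is a minimal Ruby sketch of how read can hand back nil when the input length is an exact multiple of the read size. It assumes the scanner treats a short read (fewer bytes than requested) as the end of input; that loop condition is an assumption and is not shown in the patch. `read_in_chunks` is a hypothetical helper, not part of Hpricot, and the 4-byte chunk is only a stand-in for the real buffer size.

```ruby
require 'stringio'

# Hypothetical helper, not Hpricot code. Reads io in fixed-size chunks and
# assumes a short read marks the end of input (an assumption about the scanner).
def read_in_chunks(io, chunk, guard_nil)
  pieces = []
  loop do
    piece = io.read(chunk)      # read(n) returns nil once EOF has been reached
    if piece.nil?
      break if guard_nil        # same idea as the NIL_P(str) check in the patch
      raise TypeError, "can't convert nil into String"
    end
    pieces << piece
    break if piece.bytesize < chunk   # short read: treat as the last chunk
  end
  pieces
end

CHUNK = 4  # stand-in for the real buffer size

# 9 bytes: the last read is short, so the loop ends before read can return nil.
uneven = read_in_chunks(StringIO.new("abcdefghi"), CHUNK, false)

# 8 bytes: every read is full, so one more read happens and it returns nil.
even_guarded = read_in_chunks(StringIO.new("abcdefgh"), CHUNK, true)

# Same 8-byte input without the guard: the nil reaches the string conversion.
even_unguarded_error =
  begin
    read_in_chunks(StringIO.new("abcdefgh"), CHUNK, false)
    nil
  rescue TypeError => e
    e.message
  end
```

With the guard in place, the 8-byte input yields the same kind of result as the 9-byte one; without it, the extra read surfaces as the TypeError seen in the backtrace.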

Changed 6 months ago by russm

FYI, this is also occurring for me in a file of 16384 bytes (saved locally from http://www.metafilter.com/23301/index.html). If I prepend (or append) a single "\n" to the file it doesn't error.

Changed 6 months ago by russm

OK... I've seen this "in `scan': can't convert nil into String (TypeError)" on a number of files, all of which have sizes that are a multiple of 16384 bytes, but not all files that are multiples of this size trigger the error. Files that trigger the error can be "fixed" by changing their size. So far, as well as the page linked above, this has occurred on http://www.metafilter.com/26569/index.html and http://www.metafilter.com/27965/index.html.
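
Until the patch is applied, the size-changing trick described above can be sketched in Ruby as follows. This is only a stopgap built on the observation in this ticket, not a fix: `pad_off_chunk_boundary` is a hypothetical helper name, and 16384 is the size reported here rather than a value read from the Hpricot source.

```ruby
CHUNK_SIZE = 16_384  # size reported in this ticket (assumed, not read from Hpricot)

# Hypothetical helper: append a single "\n" when the data length is a
# non-zero multiple of chunk_size, so the size no longer sits on a boundary.
def pad_off_chunk_boundary(data, chunk_size = CHUNK_SIZE)
  return data if data.empty? || data.bytesize % chunk_size != 0
  data + "\n"
end

on_boundary  = pad_off_chunk_boundary("a" * 16_384)
off_boundary = pad_off_chunk_boundary("a" * 100)
```

The padded string could then be passed to Hpricot(...) in place of the open file handle. Padding every boundary-sized file is conservative, since not every such file triggers the error.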

Changed 6 months ago by russm

  • summary changed from Hpricot can't handle files of exactly 65535 bytes to Hpricot can't handle some files that are multiples of 16384 bytes