Postgres, Perl, BYTEA and a productive night

Submitted by gwolf on Thu, 11/11/2004 - 11:02
I have been working a bit on [term]Comas[/term]. Our original implementation allowed only one file of accompanying material per proposal, and I didn't think it was enough -- Yes, an author might tar his different files together... But that's a kludge. Our current implementation also stores the material as files on disk, not in the database. Now, I want _everything_ to be in the database, as it is much cleaner, much easier to back up, and requires much less hassle for the local administrator. For doing this, I decided to use Postgres' bytea data type... Now, before [term]RTFM[/term], my uploads were being truncated usually before the 10th byte. Why? Go RTFM:
When entering bytea values, octets of certain values must be escaped (but all octet values may be escaped) when used as part of a string literal in an SQL statement. In general, to escape an octet, it is converted into the three-digit octal number equivalent of its decimal octet value, and preceded by two backslashes. (...)
Ok, so I came up with this hack: [code="perl"]sub _octal_escape { my ($in, $out); $in = shift; # Split the string character by character. Each of the characters will be # checked, and if it is not a printable character (32-126 base 10), it will # be escaped. Reassemble the line, hand it over. $out = join('', map { my $ord = ord($_); ($ord >= 32 and $ord <= 126 and ord != 92) ? $_ : sprintf('\%03o', $ord); } split (//, $in)); return $out; }[/code] Works correctly... But performance hurts. It took about 90 seconds for a 1.5MB upload, which I consider a typical upload. I thought also of using uuencode/uudecode, which are very nice performance-wise (less than 13 seconds for encoding or decoding over 100MB), and very easy to code in Perl: [code="perl"]sub _uuencode { my ($in, $out); $in = shift; $out = pack('u', $in); return $out; } sub _uudecode { my ($in, $out); $in = shift; $out = unpack('u', $in); return $out; }[/code] ...The problem with that is that I lose many advantages, such as the ability to query the database for length(data) to get the filesize, or --if I _really_ wanted to do it-- perform searches on the contents of the field itself. Also, a bytea field would then become a regular text field... But what hurted most about this idea was that each time someone downloaded a file, it would have to be decoded... And I expect downloads to be _much_ more frequent than uploads. After asking to some friends and some IRC channels, I thought on diving into DBD::Pg. Then I found this:
NOTE: The undocumented (and invalid) support for the "SQL_BINARY" data type is officially deprecated. Use "PG_BYTEA" with "bind_param()" instead: $rv = $sth->bind_param($param_num, $bind_value, { pg_type => DBD::Pg::PG_BYTEA });
So... That 90-second upload is down to a 15-second upload (for a local connection), very acceptable giving that I am moving 1.5MB between my filesystem, Mozilla, Apache, Comas (which is a mod_perl module) and Postgres. And waiting 15 seconds after an upload for a DSL user will not be _that_ unacceptable. I am happy. Now... If just somebody helped me with my blog's performance, I would be ecstatic... Can't figure out why it is so goddamn slow... Well, yes - I am running it on a P120 with 48MB RAM... But anyway, it's just [term]jaws[/term] - PHP+MySQL! :-(
( categories: )
Pablo's picture

RE: Postgres, Perl, BYTEA and a productive night

Para lo del 'yaus': * Si quitas plugins va a ir un poco más rápido * Que solo genere RSS y no OPML (aunque el ahorro no lo vas a notar) * Prueba quitando el calendario Luego veo que otras cositas pueden ser :-), saludos!