Postgres, Perl, BYTEA and a productive night

I have been working a bit on Comas. Our original implementation allowed only one file of accompanying material per proposal, and I didn’t think that was enough. Yes, an author could tar his different files together… but that’s a kludge. Our current implementation also stores the material as files on disk, not in the database. Now I want everything to be in the database, as it is much cleaner, much easier to back up, and much less hassle for the local administrator. To do this, I decided to use Postgres’ bytea data type… and before I RTFMed, my uploads were usually being truncated before the tenth byte. Why? Go RTFM:

When entering bytea values, octets of certain values must be escaped (but all octet values may be escaped) when used as part of a string literal in an SQL statement. In general, to escape an octet, it is converted into the three-digit octal number equivalent of its decimal octet value, and preceded by two backslashes. (...)

Ok, so I came up with this hack:

    sub _octal_escape {
        my ($in, $out);
        $in = shift;
        # Split the string character by character. Each character is
        # checked, and if it is not a printable character (32-126 base 10)
        # or if it is the backslash (92), it is escaped. Reassemble the
        # line, hand it over.
        $out = join('', map {
            my $ord = ord($_);
            ($ord >= 32 and $ord <= 126 and $ord != 92)
                ? $_
                : sprintf('\%03o', $ord);
        } split(//, $in));
        return $out;
    }

Works correctly… But performance hurts. It took about 90 seconds for a 1.5MB upload, which I consider a typical upload size. I also thought of using uuencode/uudecode, which is very nice performance-wise (less than 13 seconds to encode or decode over 100MB) and very easy to code in Perl:

    sub _uuencode {
        my ($in, $out);
        $in = shift;
        $out = pack('u', $in);
        return $out;
    }
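Incidentally, most of the cost of the character-by-character escape above comes from splitting the string, mapping over every character and joining it back. The same rule can be applied in a single substitution pass; a sketch (the sub name is mine, and it produces the same output as the _octal_escape hack):

```perl
sub _octal_escape_fast {
    my $in = shift;
    # Same rule as _octal_escape: keep printable ASCII (32-126 decimal)
    # except the backslash (92); octal-escape everything else.
    # One s///ge pass instead of split/map/join over every character.
    $in =~ s{([^\x20-\x7e]|\\)}{sprintf('\%03o', ord($1))}ge;
    return $in;
}
```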

    sub _uudecode {
        my ($in, $out);
        $in = shift;
        $out = unpack('u', $in);
        return $out;
    }

…The problem with that is that I would lose many advantages, such as the ability to query the database for length(data) to get the file size or, if I really wanted to, perform searches on the contents of the field itself. Also, the bytea field would then become a regular text field… But what hurt most about this idea was that every file would have to be decoded each time someone downloaded it… and I expect downloads to be much more frequent than uploads. After asking some friends and some IRC channels, I decided to dive into DBD::Pg. Then I found this:

NOTE: The undocumented (and invalid) support for the "SQL_BINARY" data type is officially deprecated. Use "PG_BYTEA" with "bind_param()" instead: $rv = $sth->bind_param($param_num, $bind_value, { pg_type => DBD::Pg::PG_BYTEA });
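Wired into an insert, that looks something like the sketch below. The connection details, table and column names, and variables are hypothetical illustrations, not Comas’ actual schema:

```perl
use DBI;
use DBD::Pg;

# Hypothetical connection and schema, for illustration only.
my $dbh = DBI->connect('dbi:Pg:dbname=comas', 'someuser', 'somepass',
                       { RaiseError => 1, AutoCommit => 1 });

my $sth = $dbh->prepare(
    'INSERT INTO material (prop_id, filename, data) VALUES (?, ?, ?)');
$sth->bind_param(1, $prop_id);
$sth->bind_param(2, $filename);
# The key part: telling DBD::Pg this parameter is a bytea, so the
# driver handles the binary escaping itself and nothing is truncated
# at the first unescaped byte.
$sth->bind_param(3, $file_contents, { pg_type => DBD::Pg::PG_BYTEA });
$sth->execute;
```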

So… That 90-second upload is down to 15 seconds (over a local connection), which is very acceptable given that I am moving 1.5MB between my filesystem, Mozilla, Apache, Comas (which is a mod_perl module) and Postgres. And a 15-second wait after an upload will not be that bad for a DSL user. I am happy. Now… if only somebody helped me with my blog’s performance, I would be ecstatic… I can’t figure out why it is so goddamn slow… Well, yes: I am running it on a P120 with 48MB RAM… But anyway, it’s just jaws - PHP+MySQL! :-(


Pablo 2004-11-11 15:44:36

RE: Postgres, Perl, BYTEA and a productive night

About the ‘yaus’ thing:

I’ll look later at what the other little things might be :-),