Postgres, Perl, BYTEA and a productive night
I have been working a bit on Comas. Our original implementation allowed only one file of accompanying material per proposal, and I didn't think that was enough – yes, an author might tar his different files together… but that's a kludge. Our current implementation also stores the material as files on disk, not in the database. Now, I want everything to be in the database, as it is much cleaner, much easier to back up, and much less hassle for the local administrator. To do this, I decided to use Postgres' bytea data type… and before RTFMing, my uploads were usually being truncated before the 10th byte. Why? Go RTFM:
When entering bytea values, octets of certain values must be escaped (but all octet values may be escaped) when used as part of a string literal in an SQL statement. In general, to escape an octet, it is converted into the three-digit octal number equivalent of its decimal octet value, and preceded by two backslashes. (...)
Ok, so I came up with this hack:
sub _octal_escape {
    my ($in, $out);
    $in = shift;
    # Split the string character by character. Each character is checked:
    # anything that is not printable ASCII (32-126 base 10), plus the
    # backslash itself (92), is escaped as a three-digit octal sequence.
    # Reassemble the line, hand it over.
    $out = join('', map {
        my $ord = ord($_);
        ($ord >= 32 and $ord <= 126 and $ord != 92) ? $_ :
            sprintf('\%03o', $ord);
    } split(//, $in));
    return $out;
}
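A quick illustration of what the helper produces (the input string here is made up, not an actual upload):

# Printable ASCII passes through untouched; NULs, control characters and
# the backslash itself come out as three-digit octal escapes.
my $escaped = _octal_escape("PDF\x00\x01\\");
# $escaped now holds 'PDF\000\001\134'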
Works correctly… But performance hurts: it took about 90 seconds for a 1.5MB upload, which I consider a typical size. I also thought of using uuencode/uudecode, which is very nice performance-wise (less than 13 seconds for encoding or decoding over 100MB) and very easy to code in Perl:
sub _uuencode {
    my ($in, $out);
    $in = shift;
    # pack 'u' uuencodes the whole string in a single built-in call.
    $out = pack('u', $in);
    return $out;
}
sub _uudecode {
    my ($in, $out);
    $in = shift;
    # unpack 'u' is the exact inverse of pack 'u'.
    $out = unpack('u', $in);
    return $out;
}
…The problem with that is that I would lose many advantages, such as the ability to query the database for length(data) to get the file size, or –if I really wanted to do it– perform searches on the contents of the field itself. Also, the bytea field would then become a regular text field… But what hurt most about this idea was that each time someone downloaded a file it would have to be decoded… and I expect downloads to be much more frequent than uploads. After asking some friends and some IRC channels, I thought of diving into DBD::Pg. Then I found this:
NOTE: The undocumented (and invalid) support for the "SQL_BINARY" data type is officially deprecated. Use "PG_BYTEA" with "bind_param()" instead: $rv = $sth->bind_param($param_num, $bind_value, { pg_type => DBD::Pg::PG_BYTEA });
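In Comas terms that translates into something like the following. This is only a minimal sketch: the table and column names (prop_material, id_proposal, filename, data) and the connection details are made up for illustration, not the actual Comas schema.

use strict;
use warnings;
use DBI;
use DBD::Pg;

# Made-up database name and credentials; the point here is only the
# bind_param() call with pg_type => DBD::Pg::PG_BYTEA.
my $dbh = DBI->connect('dbi:Pg:dbname=comas', 'comas', '',
                       { RaiseError => 1, AutoCommit => 1 });

sub store_material {
    my ($id_proposal, $filename, $data) = @_;
    my $sth = $dbh->prepare(
        'INSERT INTO prop_material (id_proposal, filename, data)
         VALUES (?, ?, ?)');
    $sth->bind_param(1, $id_proposal);
    $sth->bind_param(2, $filename);
    # Tell DBD::Pg this parameter is a bytea, so it handles the escaping
    # itself instead of my character-by-character Perl loop.
    $sth->bind_param(3, $data, { pg_type => DBD::Pg::PG_BYTEA });
    $sth->execute;
}

# And the file size stays one plain SQL query away, no decoding involved.
sub material_size {
    my ($id_material) = @_;
    my ($size) = $dbh->selectrow_array(
        'SELECT length(data) FROM prop_material WHERE id = ?',
        undef, $id_material);
    return $size;
}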
So… That 90-second upload is down to a 15-second upload (for a local connection), quite acceptable given that I am moving 1.5MB between my filesystem, Mozilla, Apache, Comas (which is a mod_perl module) and Postgres. And waiting 15 seconds after an upload will not be that bad for a DSL user either. I am happy. Now… If only somebody helped me with my blog's performance, I would be ecstatic… Can't figure out why it is so goddamn slow… Well, yes - I am running it on a P120 with 48MB RAM… But anyway, it's just jaws - PHP+MySQL! :-(
Comments
Pablo 2004-11-11 15:44:36
RE: Postgres, Perl, BYTEA and a productive night
About the 'yaus' thing:
- If you remove plugins it will run a bit faster
- Have it generate only RSS and not OPML (although you won't notice the savings)
- Try removing the calendar
Later I'll see what other little things might help :-),
regards!