Debian by its numbers, as seen by keyring-maint

At keyring-maint, we got a request from our DPL asking about the evolution of the number of keys per keyring. This maps almost directly to the number of Debian Developers, Debian Maintainers, and retired and deleted accounts over time, since the keyrings are maintained under version control.

Stefano insisted this was more out of curiosity than anything else, but since the task seemed easy enough, I came up with the following dirty thingy. I’m sure there are better ways than cruising through the whole Bazaar history, but anyway. In case you want to play, you can clone an almost-up-to-date copy of the tree: bzr clone http://bzr.debian.org/keyring/debian-keyring/

#!/usr/bin/perl
use strict;

my ($lastrev, @keyrings, %revs, $fh);
open $fh, '>growth_stats.txt' or die $!;

@keyrings = sort qw(debian-keyring-gpg debian-keyring-pgp debian-maintainers-gpg
                    emeritus-keyring-gpg emeritus-keyring-pgp
                    removed-keys-gpg removed-keys-pgp);

system('bzr unbind');   # Huge speed difference :-P
$lastrev = `bzr revno`;

# Collect the revision number and timestamp of every entry in the log
for my $entry (split /^-+$/m, `bzr log`) {
    my ($rev, $stamp);
    for my $line (split(/\n/, $entry)) {
        next unless $line =~ /^(revno|timestamp): (.*)/;
        $rev   = $2 if $1 eq 'revno';
        $stamp = $2 if $1 eq 'timestamp';
    }
    next unless defined $rev;   # skip the empty chunk before the first separator
    $revs{$rev} = { stamp => $stamp };
}

spew('Revision', 'Date', @keyrings);
system('bzr bind');

# Walk the history in order, counting the key files in each keyring directory
for my $rev (sort { $a <=> $b } keys %revs) {
    system("bzr update -r $rev");
    spew($rev, $revs{$rev}{stamp}, map { my @keys = <$_/*>; scalar(@keys) } @keyrings);
}

sub spew { print $fh join('|', @_), "\n" }
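For readers who would rather not read Perl, the log-parsing step can be sketched in Python. This is only an illustration, assuming the long bzr log format in which entries are separated by a line of dashes and carry "revno:" and "timestamp:" fields. The function name parse_bzr_log and the sample text (committer included) are made up for the example and do not come from the keyring tree.

```python
import re

# Entries in the long `bzr log` format are separated by a line of dashes.
SEPARATOR = re.compile(r"^-+$", re.MULTILINE)
# Only the two fields the Perl script cares about.
FIELD = re.compile(r"^(revno|timestamp): (.*)$", re.MULTILINE)


def parse_bzr_log(text):
    """Map each revno (as a string) to its timestamp string."""
    revs = {}
    for entry in SEPARATOR.split(text):
        fields = dict(FIELD.findall(entry))
        # The chunk before the first separator is empty, so skip
        # anything that lacks either field.
        if "revno" in fields and "timestamp" in fields:
            revs[fields["revno"]] = fields["timestamp"]
    return revs


# Hypothetical sample, invented for illustration only.
sample = """\
------------------------------------------------------------
revno: 2
committer: Example Person <someone@example.org>
timestamp: Fri 2010-07-02 10:00:00 -0500
message:
  second commit
------------------------------------------------------------
revno: 1
committer: Example Person <someone@example.org>
timestamp: Thu 2010-07-01 09:00:00 -0500
message:
  first commit
"""

for revno, stamp in sorted(parse_bzr_log(sample).items(), key=lambda kv: int(kv[0])):
    print(revno, stamp)
```

The revno is kept as a string and only converted to an integer for sorting, which mirrors the numeric sort in the Perl loop.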

And as a result… Yes, I fired up OpenOffice instead of graphing from within Perl, which could even have been less painful ;-) I had intended to leave graphing the raw data (also attached here) as an exercise to the [rl]eader… But anyway, the result is here (click to view it in full resolution; I don’t want to mess up your reading experience with a >1000px wide image):

A couple of notes:

Anyway, have fun with this. Graphics are always fun!
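If you want to graph the raw data yourself, a first step could look like the following Python sketch. It assumes the file has the layout the Perl script writes: a header line "Revision|Date|" followed by one column per keyring, then one pipe-separated line per revision. The helper names load_growth_stats and series_for are hypothetical, and the sample rows hold invented numbers, not real keyring counts.

```python
import csv


def load_growth_stats(lines):
    """Parse pipe-separated stats; return (header, rows).

    Each row is (revision number, timestamp string, list of int counts).
    """
    reader = csv.reader(lines, delimiter="|")
    header = next(reader)
    rows = []
    for record in reader:
        if not record:  # tolerate blank lines
            continue
        counts = [int(c) for c in record[2:]]
        rows.append((int(record[0]), record[1], counts))
    return header, rows


def series_for(header, rows, keyring):
    """Return [(timestamp, count), ...] for one keyring column."""
    column = header.index(keyring) - 2  # the first two columns are not counts
    return [(stamp, counts[column]) for _rev, stamp, counts in rows]


# Invented sample in the assumed layout, with only two keyring columns.
SAMPLE = """\
Revision|Date|debian-keyring-gpg|debian-maintainers-gpg
1|Thu 2010-07-01 09:00:00 -0500|3|0
2|Fri 2010-07-02 10:00:00 -0500|4|1
"""

header, rows = load_growth_stats(SAMPLE.splitlines())
for stamp, count in series_for(header, rows, "debian-maintainers-gpg"):
    print(stamp, count)
```

For the real file, passing an open file handle for growth_stats.txt instead of SAMPLE.splitlines() should work the same way; each resulting series can then be handed to whatever plotting tool you prefer.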

Attachments

Debian by its numbers, as seen by keyring-maint (63 KB)

Raw data (32 KB)

Comments

gwolf 2010-07-02 14:28:00

Pretty expectable…

Yes, I was wondering whether I should look into a library to work with bzr directly. Had I thought of publishing it from the beginning, I surely would have ;-) In fact, my first stab was to go through the output of bzr diff -r $foo..$foo+1; bzr is quite slow (at least compared to Git) and I knew this would happen… And yes, we would not have to do all that FS modification.

But hey, for a one-shot job, it works ;-) And it works great, as it got me a faster Python version!


John 2010-07-02 14:12:00

Just as an interesting experiment, I rewrote the same loop using Python and bzrlib directly. It takes about 20s on my machine, and seems to generate the same graph.

import sys

keyrings = """debian-keyring-gpg debian-keyring-pgp debian-maintainers-gpg
emeritus-keyring-gpg emeritus-keyring-pgp removed-keys-gpg removed-keys-pgp""".split()


def process_branch(b, outf):
    from bzrlib import ui
    last_rev = b.last_revision()
    # Only the mainline?
    revision_ids = b.revision_history()
    revisions = b.repository.get_revisions(revision_ids)
    rev_id_to_rev = dict((r.revision_id, r) for r in revisions)
    pb = ui.ui_factory.nested_progress_bar()
    # Header line: one column per keyring
    outf.write('timestamp')
    for ring in keyrings:
        outf.write('|%s' % (ring,))
    outf.write('\n')
    for idx, rt in enumerate(b.repository.revision_trees(revision_ids)):
        pb.update('processing', idx, len(revisions))
        rev_id = rt.get_revision_id()
        rev = rev_id_to_rev[rev_id]
        outf.write('%d' % (rev.timestamp,))
        for ring in keyrings:
            file_id = rt.path2id(ring)
            if file_id is None:
                # This keyring directory does not exist in this revision
                count = 0
            else:
                ie = rt.iter_entries_by_dir([file_id]).next()[1]
                count = len(ie.children)
            outf.write('|%d' % (count,))
        outf.write('\n')
    pb.finished()


def main(args):
    import optparse
    p = optparse.OptionParser(usage='%prog [options] debhistory outfile',
                              version='%prog v0.1')
    p.add_option('--verbose', action='store_true', help='Be chatty')
    (opts, args) = p.parse_args(args)
    if len(args) != 2:
        p.print_usage()
        return 1
    from bzrlib import branch, initialize
    state = initialize(setup_ui=True)
    if state is not None:
        state = state.__enter__()
    b = branch.Branch.open(args[0])
    b.lock_read()
    outf = open(args[1], 'wb')
    try:
        process_branch(b, outf)
    finally:
        outf.close()
        b.unlock()
        if state is not None:
            state.__exit__(*sys.exc_info())


if __name__ == '__main__':
    main(sys.argv[1:])
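One difference from the Perl output is worth noting for anyone graphing the result: the script above writes each revision's timestamp with '%d', which I take to be seconds since the Unix epoch rather than a formatted date. A small Python 3 sketch for turning that first column into readable UTC dates follows. It is independent of bzrlib, and the helper names epoch_to_iso and convert_line are hypothetical.

```python
from datetime import datetime, timezone


def epoch_to_iso(seconds):
    """Format seconds since the Unix epoch as a UTC 'YYYY-MM-DD HH:MM:SS' string."""
    moment = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def convert_line(line):
    """Rewrite the first pipe-separated field of a data line as a UTC date.

    Lines whose first field is not numeric (such as the header) are
    returned unchanged, minus the trailing newline.
    """
    line = line.rstrip("\n")
    stamp, sep, rest = line.partition("|")
    if not stamp.isdigit():
        return line
    return epoch_to_iso(stamp) + sep + rest


# Invented sample lines in the assumed output layout.
for raw in ["timestamp|debian-keyring-gpg\n", "86400|10\n"]:
    print(convert_line(raw))
```

Converting to UTC drops whatever timezone the committer used, which seems acceptable for a growth graph spanning years; that is a design assumption of this sketch, not something the original script does.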
