Debian by its numbers, as seen by keyring-maint

Submitted by gwolf on Fri, 07/02/2010 - 14:24

At keyring-maint, we got a request from our DPL asking how the number of keys in each keyring has evolved over time. This maps almost directly to the number of Debian Developers, Debian Maintainers, and retired and deleted accounts, and since the keyrings are maintained under version control, the whole history is there to be mined.

Stefano insisted this was more out of curiosity than anything else, but since the task seemed easy enough, I came up with the following dirty thingy. I'm sure there are better ways than cruising through the whole Bazaar history, but anyway: in case you want to play, you can clone an almost-up-to-date copy of the tree with bzr clone http://bzr.debian.org/keyring/debian-keyring/

    #!/usr/bin/perl
    use strict;
    use warnings;
    my (@keyrings, %revs, $fh);
    open $fh, '>', 'growth_stats.txt' or die $!;

    # One directory per keyring, one file per key inside each
    @keyrings = sort qw(debian-keyring-gpg debian-keyring-pgp
        debian-maintainers-gpg
        emeritus-keyring-gpg emeritus-keyring-pgp
        removed-keys-gpg removed-keys-pgp);

    system('bzr unbind'); # Huge speed difference :-P

    # Collect the revision number and timestamp of every log entry
    for my $entry (split /^---+$/m, `bzr log`) {
        my ($rev, $stamp);
        for my $line (split(/\n/, $entry)) {
            # revno lines may carry a " [merge]" suffix; keep only the number
            $rev = $1 if $line =~ /^revno: (\d+)/;
            $stamp = $1 if $line =~ /^timestamp: (.*)/;
        }
        next unless defined $rev; # skip the empty chunk before the first separator
        $revs{$rev} = { stamp => $stamp };
    }

    spew('Revision', 'Date', @keyrings);
    system('bzr bind');

    # Walk the history in order, counting the files in each keyring directory
    for my $rev (sort { $a <=> $b } keys %revs) {
        system("bzr update -r $rev");
        spew($rev, $revs{$rev}{stamp},
             map { my @keys = <$_/*>; scalar(@keys) } @keyrings);
    }
    close $fh;

    # Write one pipe-separated line to growth_stats.txt
    sub spew {
        print $fh join('|', @_), "\n";
    }
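For anyone who would rather follow the two core steps of the script in Python, here is a rough Python 3 sketch of the log parsing and the per-keyring counting. It assumes the long `bzr log` format (entries separated by a line of dashes, with `revno:` and `timestamp:` fields at column 0) and that each keyring is a directory holding one file per key. The function names `parse_bzr_log` and `count_keys` are hypothetical, and the sketch does not run bzr itself.

```python
import os
import re

# Long-format `bzr log` separates entries with a line of dashes (assumption).
SEPARATOR = re.compile(r'^-{3,}$', re.MULTILINE)
# Mainline fields start at column 0; "revno: 12 [merge]" keeps only "12".
REVNO = re.compile(r'^revno: (\d+)')
STAMP = re.compile(r'^timestamp: (.*)$')


def parse_bzr_log(log_text):
    """Return {revno: timestamp string} for each top-level log entry."""
    revs = {}
    for entry in SEPARATOR.split(log_text):
        rev = stamp = None
        for line in entry.splitlines():
            m = REVNO.match(line)
            if m:
                rev = int(m.group(1))
            m = STAMP.match(line)
            if m:
                stamp = m.group(1)
        if rev is not None:  # the chunk before the first separator is empty
            revs[rev] = stamp
    return revs


def count_keys(tree_root, keyrings):
    """Count non-hidden files per keyring directory (0 if it does not exist yet)."""
    counts = {}
    for ring in keyrings:
        path = os.path.join(tree_root, ring)
        if os.path.isdir(path):
            # Skip dotfiles, mirroring what Perl's <dir/*> glob returns
            counts[ring] = sum(1 for name in os.listdir(path)
                               if not name.startswith('.'))
        else:
            counts[ring] = 0
    return counts
```

Between calls to `count_keys`, the working tree still has to be moved to each revision, as the Perl loop does with `bzr update -r`.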

And as a result... yes, I fired up OpenOffice instead of graphing from within Perl, which might even have been less painful ;-) I had intended to leave graphing the raw data (also attached here) as an exercise to the [rl]eader... but anyway, here is the result (click to view it in full resolution; I didn't want to mess up your reading experience with a >1000px-wide image):

A couple of notes:

  • The number of Debian Developers is close to the sum of debian-keyring-pgp and debian-keyring-gpg.
  • After a long time pestering developers (and you can see how far down the tail we were!), as of today debian-keyring-pgp will cease to exist. That means no more old, vulnerable v3 keys. Yay! All the credit goes to Jonathan. The last few DDs using it are still migrating, but hopefully we will get them soon.
  • To be fair... no, the correct number is not exactly the sum: some people had more than one key (e.g. back when we had ~200 keys in debian-keyring-pgp). The trend is stabilizing.
  • Of course, the {removed-keys,emeritus-keyring}-{pgp,gpg} keyrings will keep growing. Most removed keys are the result of lots of people migrating from 1024D to stronger 4096R keys.
  • You can easily spot the points where we removed inactive developers (e.g. the main WAT lack-of-response removal, at about ¾ of the graph).
  • keyring-maint has handled the Debian Maintainers keyring since late 2009. There is a noticeable increase (~10% in six months), although I expected that line to grow more. I think it's safe to say the rate of influx of DMs is similar to that of DDs. Of course, many DMs become DDs, so the amount of new blood might be (almost) the sum of the two positive slopes?
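To check these observations against the raw data rather than eyeballing the graph, here is a rough Python 3 sketch. It assumes the pipe-separated layout the Perl script writes (a `Revision|Date|...` header followed by one column per keyring) and bzr timestamps of the form `Fri 2010-01-01 00:00:00 +0000`; the helpers `load_stats`, `dd_estimate`, `percent_growth` and `slope_per_day` are hypothetical names, not part of any existing tool.

```python
import csv
import io
from datetime import datetime

# Assumed bzr timestamp layout, e.g. "Fri 2010-01-01 00:00:00 +0000".
STAMP_FORMAT = '%a %Y-%m-%d %H:%M:%S %z'


def load_stats(text):
    """Parse the pipe-separated table written by the Perl script."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter='|'):
        row = {'Revision': int(record.pop('Revision')),
               'Date': datetime.strptime(record.pop('Date'), STAMP_FORMAT)}
        # Every remaining column is a keyring name holding a key count.
        row.update((ring, int(count)) for ring, count in record.items())
        rows.append(row)
    return rows


def dd_estimate(row):
    # Approximate DD count; overestimates whenever someone holds several keys.
    return row['debian-keyring-gpg'] + row['debian-keyring-pgp']


def percent_growth(rows, ring, start=0, end=-1):
    """Relative change of one keyring between two rows, in percent."""
    first, last = rows[start][ring], rows[end][ring]
    return 100.0 * (last - first) / first


def slope_per_day(rows, ring):
    """Least-squares slope of a keyring's size, in keys per day."""
    t0 = rows[0]['Date']
    xs = [(r['Date'] - t0).total_seconds() / 86400 for r in rows]
    ys = [r[ring] for r in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

Adding `slope_per_day` for debian-keyring-gpg and debian-maintainers-gpg gives the rough "sum of the two positive slopes" from the last note, with the same double-counting caveat for DMs who later become DDs.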

Anyway, have fun with this. Graphics are always fun!

Attachments:
  • Debian by its numbers, as seen by keyring-maint (62.79 KB)
  • Raw data (31.78 KB)
Comment by John:

Just as an interesting experiment, I rewrote the same loop in Python, using bzrlib directly. It takes about 20s on my machine and seems to generate the same graph.

    # Python 2: bzrlib (and the iterator .next() call below) predate Python 3
    import sys

    keyrings = """debian-keyring-gpg debian-keyring-pgp
    debian-maintainers-gpg
    emeritus-keyring-gpg emeritus-keyring-pgp
    removed-keys-gpg removed-keys-pgp""".split()

    def process_branch(b, outf):
        from bzrlib import ui
        last_rev = b.last_revision()
        # Only the mainline?
        revision_ids = b.revision_history()
        revisions = b.repository.get_revisions(revision_ids)
        rev_id_to_rev = dict((r.revision_id, r) for r in revisions)
        pb = ui.ui_factory.nested_progress_bar()
        # Header row: timestamp followed by one column per keyring
        outf.write('timestamp')
        for ring in keyrings:
            outf.write('|%s' % (ring,))
        outf.write('\n')
        # Inspect each revision's tree directly, without touching the working tree
        for idx, rt in enumerate(b.repository.revision_trees(revision_ids)):
            pb.update('processing', idx, len(revisions))
            rev_id = rt.get_revision_id()
            rev = rev_id_to_rev[rev_id]
            outf.write('%d' % (rev.timestamp,))
            for ring in keyrings:
                file_id = rt.path2id(ring)
                if file_id is None:
                    # Keyring directory does not exist yet in this revision
                    count = 0
                else:
                    ie = rt.iter_entries_by_dir([file_id]).next()[1]
                    count = len(ie.children)
                outf.write('|%d' % (count,))
            outf.write('\n')
        pb.finished()

    def main(args):
        import optparse
        p = optparse.OptionParser(usage='%prog [options] debhistory outfile',
                                  version='%prog v0.1')
        p.add_option('--verbose', action='store_true', help='Be chatty')
        (opts, args) = p.parse_args(args)
        if len(args) != 2:
            p.print_usage()
            return 1
        from bzrlib import branch, initialize
        state = initialize(setup_ui=True)
        if state is not None:
            state = state.__enter__()
        b = branch.Branch.open(args[0])
        b.lock_read()
        outf = open(args[1], 'wb')
        try:
            process_branch(b, outf)
        finally:
            outf.close()
            b.unlock()
        if state is not None:
            state.__exit__(*sys.exc_info())

    if __name__ == '__main__':
        sys.exit(main(sys.argv[1:]))

Reply by gwolf: Pretty predictable...

Yes, I was wondering whether I should look into a library to work with bzr directly. Had I thought of publishing it from the beginning, I surely would have ;-) In fact, my first stab was to go through the output of bzr diff -r $foo..$foo+1, but bzr is quite slow (at least compared to Git), and I knew this would happen... And yes, that way we would not have to do all that filesystem modification.

But hey, for a one-shot job, it works ;-) And it worked out great, as it got me a faster Python version!
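The diff-based route gwolf mentions (walking bzr diff -r $foo..$foo+1 instead of updating the tree) could look roughly like this Python 3 sketch, which turns each diff into a net change per keyring and accumulates running totals from an empty tree. It assumes bzr diff marks changed paths with header lines of the form `=== added file 'dir/name'`, `=== removed file 'dir/name'` and `=== renamed file 'old' => 'new'`; collecting the diff text for each consecutive revision pair is left out, and `keyring_delta` and `running_totals` are hypothetical names.

```python
import re
from collections import Counter

# Assumed bzr diff headers; group 2 (or groups 1 and 2) is the top-level directory.
ADDED_REMOVED = re.compile(r"^=== (added|removed) file '([^/']+)/")
RENAMED = re.compile(r"^=== renamed file '([^/']+)/[^']*' => '([^/']+)/")


def keyring_delta(diff_text):
    """Net change in the number of files per top-level directory for one diff."""
    delta = Counter()
    for line in diff_text.splitlines():
        m = ADDED_REMOVED.match(line)
        if m:
            action, ring = m.groups()
            delta[ring] += 1 if action == 'added' else -1
            continue
        m = RENAMED.match(line)
        if m:
            # A key moved between keyrings leaves one and joins the other
            old_ring, new_ring = m.groups()
            delta[old_ring] -= 1
            delta[new_ring] += 1
    return delta


def running_totals(diffs):
    """Accumulate per-revision deltas (in revision order) into key counts."""
    totals = Counter()
    history = []
    for diff_text in diffs:
        totals.update(keyring_delta(diff_text))  # update() adds, negatives included
        history.append(dict(totals))
    return history
```

Since only header lines are inspected, this never touches the working tree, which is the filesystem churn the reply wanted to avoid; its speed would still depend on how fast bzr produces each diff.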