Source format 3.0 (quilt) for teh win!

Submitted by gwolf on Wed, 11/25/2009 - 14:26

My Debian QA page shows what I consider to be a huge number of packages: I am currently uploader for 207 of them. Why so many? There are several factors. The main one is group maintenance (I'm directly responsible for only 19; of course, this does not mean I disregard the rest of them); the second one is regularity. By far, most of my source packages (177) match lib.*perl, followed by lib.*ruby with 20.

Anyway, a strong factor that allows the pkg-perl group to be successful in maintaining 1411 packages is the regularity of the task: packaging Perl modules is usually as easy as running dh-make-perl on them (of course, not taking away the merit of packaging the few strange corner cases).

In Ruby-land, the landscape is quite different. The developer community is firmly anchored in agile worldviews, which go beyond coding practices and all the way to confronting the way most Free Software projects distribute their work. I have previously ranted (er, presented informed and opinionated blog posts) on this topic: Ruby culture dictates distribution via Ruby Gems, which are for many reasons not Debian-friendly. Besides Gems, most projects have adopted Git for development tracking and are hosted on GitHub. That's why I came up with Githubredir, which basically presents a uscan-friendly listing of tags for a given project.

But if you develop in Git, you might want to split a project into its constituent parts for easier organization, without meaning that each subproject should be an independent project by itself, right? After all, that's what Git submodules are for. That's what happened with a great PDF generating library for Ruby, Prawn. Thing is, all three parts of the main project are required for the project to be built.

Anyway, that was a great reason to move the package over to the new dpkg 3.0 (quilt) source format. And, yes, it is a straightforward move! If you have not yet done so, take a look at Raphael Hertzog's explanation+FAQ wiki page. It just works, and makes many things way easier.
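As a rough sketch of why this format helps with a multi-part upstream: 3.0 (quilt) is selected by a one-line debian/source/format file, and it accepts additional upstream tarballs named with an orig-<component> suffix next to the main one. The snippet below only computes those file names. The source name, version and component names are invented for illustration and are not taken from the real Prawn package, and orig_tarballs is a hypothetical helper.

```ruby
# The single line expected in debian/source/format for this source format.
SOURCE_FORMAT = '3.0 (quilt)'

# Build the upstream tarball names for a source package:
# one main tarball plus one per additional component.
# (Hypothetical helper, for illustration only.)
def orig_tarballs(source, version, components = [], ext = 'gz')
  main  = "#{source}_#{version}.orig.tar.#{ext}"
  extra = components.map { |c| "#{source}_#{version}.orig-#{c}.tar.#{ext}" }
  [main] + extra
end

# Invented values: a source package "example" with two extra components.
puts SOURCE_FORMAT
puts orig_tarballs('example', '1.0', ['part-a', 'part-b'])
```

With one tarball per component, each part can be fetched separately and still end up unpacked into a single source tree.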

There are still some wrinkles in my packaging, like where I'm getting the orig tarballs from — As the submodules are not presently tagged in any way, I was only able to download a snapshot of their respective current master branches. This is suboptimal, I know, but I have talked to the upstream author, and he confirms that for the next major version (which should not be long in coming) the tags will be synchronized, and things will be even cleaner.

PS- I love Hpricot. To get the numbers for my QA page, I just had to get three dirty but useful arrays:

  require 'hpricot'
  require 'open-uri'
  url = ''
  doc = Hpricot(open(url))
  # All packages: first link in the first cell of each row of the second table
  tot = ((doc / 'table')[1] / 'tr').map { |row|
    (row % 'td' % 'a').inner_html rescue nil }.select { |i| i }
  # Packages whose link sits inside a span.uploader
  team = ((doc / 'table')[1] / 'tr').map { |row|
    (row % 'td' % 'span.uploader' % 'a').inner_html rescue nil }.
    select { |i| i }
  # Packages whose link sits inside a span without the uploader class
  mine = ((doc / 'table')[1] / 'tr').map { |row|
    (row % 'td' % 'span:not(.uploader)' % 'a').inner_html rescue nil }.
    select { |i| i }

And work from the three very simple lists there, e.g. tot.select {|pkg| pkg =~ /lib.*perl/}.size gives me 177.
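The counting step can be tried without touching the network. This is a minimal sketch on a hand-written list: the package names are invented stand-ins for what ends up in tot, and count_matching is a helper name made up for illustration.

```ruby
# Count the package names that match a given pattern.
# (Hypothetical helper, not part of the original snippet.)
def count_matching(packages, pattern)
  packages.select { |pkg| pkg =~ pattern }.size
end

# Invented sample data standing in for the scraped list.
sample = %w[libfoo-perl libbar-perl libbaz-ruby quux]

puts count_matching(sample, /lib.*perl/)  # prints 2
puts count_matching(sample, /lib.*ruby/)  # prints 1
```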

Tobias's picture

Hpricot alternative

If you love Hpricot, you might want to try Nokogiri, a replacement based on the stable and well-tested code of libxml2 and libxslt1.1 :)

gwolf's picture

Briefly played with it...

I played briefly with Nokogiri on another spidering snippet I wrote recently... I ended up staying with Hpricot as it is... better known to me ;-) Thanks, I'll read a bit more on it.