Tue, 15 Apr 2008
Last month we had a brief discussion on debian-devel about what images would be good to have for lenny – we’re apparently up to about 30 CDs or 4 DVDs per architecture, which over 12 architectures adds to about 430GB in total. That’s a lot, given it’s only one release, and meanwhile the entire Debian archive is only 324GB.
The obvious way to avoid that is to make use of jigdo – which lets you recreate an iso from a small template and the existing Debian mirror network. I’ve personally never used jigdo much, half because I don’t usually use isos anyway, but also because the few times I have tried jigdo it always seemed really unnecessarily slow. So the other day I tried writing my own jigdo download tool focussed on making sure it was as fast as possible.
The official jigdo download tool, ttbomk, is jigdo-lite – which you give a .jigdo file, and the url of a local mirror. It then downloads the first ten files using wget, and once they’re all downloaded, it calls jigdo-file to get them merged into the output image. This gets repeated until all the files have been downloaded.
By doing the download in sequence like this, you miss out on using your full network connection in two ways: one during the connection setup latency when starting to download the next package, and also while jigdo-lite stops downloading to run jigdo-file. And if you’ve got a fast download link, but a slower CPU or disk, you can also find yourself constrained in that you’re maxing those out while running jigdo-file, but leaving them more or less idle while downloading.
To avoid this, you want to do multiple things at once: most importantly, to be writing data to the image at the same time as you’re downloading more data. With jigdodl (the name I’ve given to my little program), I went a little bit overboard, and made it not only do that, but also manage four downloads and the decompression of the raw data from the template. That’s partly due to not being entirely sure what needed to be done to get a speedy jigdo program, and partly because the communicate module I’d just written to deal with this sort of parallelism made that somewhat natural.
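To make the idea concrete, here is a minimal sketch of that overlap in Python. It is not jigdodl's actual code: fetch_part, assemble and the (offset, name, size) tuples are names made up for illustration, and the "download" is faked so the example runs without a network. Four worker threads fetch parts while the calling thread writes whatever has arrived into the image.

```python
import io
import queue
import threading

def fetch_part(name, size):
    # Hypothetical stand-in for an HTTP download: returns `size` bytes built from the name.
    return (name.encode() * size)[:size]

def assemble(parts, image, workers=4):
    """parts: list of (offset, name, size); image: a seekable binary file object."""
    todo = queue.Queue()
    done = queue.Queue(maxsize=8)  # bounded, so downloads cannot run far ahead of the writer
    for part in parts:
        todo.put(part)

    def downloader():
        while True:
            try:
                offset, name, size = todo.get_nowait()
            except queue.Empty:
                return
            done.put((offset, fetch_part(name, size)))

    threads = [threading.Thread(target=downloader) for _ in range(workers)]
    for t in threads:
        t.start()

    # The writer runs here while the downloads continue in the background.
    for _ in range(len(parts)):
        offset, data = done.get()
        image.seek(offset)
        image.write(data)

    for t in threads:
        t.join()

parts = [(0, "a", 3), (3, "b", 2), (5, "c", 4)]
image = io.BytesIO()
assemble(parts, image)
print(image.getvalue())  # b'aaabbcccc'
```

The bounded queue is the one real design choice here: if the disk is the slow side, the downloaders block instead of piling data up in memory.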
In the end, it works: from wireless over ADSL to my ISP’s Debian mirror, I get the following output:
Jigsaw download:
  Filename: debian-40r3-amd64-CD-1.iso
  Length: 675477504
  MD5sum: d3924cdaceeb6a3706a6e2136e5cfab2
Total: 679 s; d/l: 586 MB at 883 kB/s; dump: 57 MB at 57 MB/s
Finished!
which is only slightly short of maxing out my downstream bandwidth, taking a total of about 11m20s. Running jigdodl with a closer mirror works pretty well too, though evidently some of my more recent changes weren’t so great, because I’ve gone from 9153 kB/s on a 100 Mbps link down to 7131 kB/s or lower. The CPU usage also seems a bit high, hovering between five and ten percent at 900 kB/s.
For comparison, running jigdo-lite on the same file took 17m41s, which is about 566 kB/s, with the overhead being about 6m20s. What that means is if I doubled my bandwidth to about 20Mbps, jigdodl would halve its time for the download to about 5m50s, while jigdo-lite would still have about the same non-download overhead, and thus take 12m10s, which is still 69% of its original time. Going from 10Mbps ADSL speed to 100Mbps LAN gets jigdodl down to 1m31s (13% of the time, with optimal being 10%), while jigdo-lite would be expected to still be about 7m51s (43% of its original time).
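The doubled-bandwidth estimate can be checked with a few lines of Python. This is just the arithmetic from the paragraph above, using the times quoted there; the one assumption of mine is that 1 MB means 1024 kB, which is what makes the 566 kB/s figure come out.

```python
def mmss(seconds):
    # Format a duration in seconds as e.g. "12m10s".
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m{secs:02d}s"

jigdo_lite_total = 17 * 60 + 41   # measured jigdo-lite run: 17m41s
overhead = 6 * 60 + 20            # non-download overhead quoted above: 6m20s
jigdodl_at_20mbps = 5 * 60 + 50   # projected jigdodl time at doubled bandwidth: 5m50s

# jigdo-lite keeps its fixed overhead; only the download part shrinks.
jigdo_lite_at_20mbps = jigdodl_at_20mbps + overhead

print(mmss(jigdo_lite_at_20mbps))                            # 12m10s
print(round(100 * jigdo_lite_at_20mbps / jigdo_lite_total))  # 69 (percent of the original run time)
print(round(586 * 1024 / jigdo_lite_total))                  # 566 (kB/s, assuming 1 MB = 1024 kB)
```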
I suspect the next thing to do is to rewrite the downloading code to use python-curl instead of running curl, and thus download multiple files over a single connection, and to tweak the code so that it writes the file in order, rather than updating whichever parts are ready first.
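For the in-order writing, one possible approach (a sketch only, not something jigdodl does today) is a small reorder buffer: hold parts that finish early in a heap and release them once every earlier part has been seen. The in_order function and the (index, data) pairs are hypothetical names for illustration.

```python
import heapq

def in_order(completed):
    """Yield (index, data) pairs in index order, given pairs in completion order.

    Assumes indices run 0, 1, 2, ... with no gaps or repeats.
    """
    pending = []      # min-heap of parts that arrived ahead of their turn
    next_index = 0
    for index, data in completed:
        heapq.heappush(pending, (index, data))
        # Release every part that is now contiguous with what has been written.
        while pending and pending[0][0] == next_index:
            yield heapq.heappop(pending)
            next_index += 1

arrived = [(2, b"c"), (0, b"a"), (1, b"b")]
print(list(in_order(arrived)))  # [(0, b'a'), (1, b'b'), (2, b'c')]
```

The cost is memory: a part that finishes early sits in the heap until everything before it has been written.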
Anyway, debs are available for anyone who wants to try it out, along with source in the new git source package format.
In a couple of days, DPL-elect Steve McIntyre takes over as DPL, after being elected by around four hundred of his peers… Because I can’t help myself, I thought I might poke at election numbers and see if anything interesting fell out.
First the basics: I get the same results as the official ones when recounting the vote. Using first-past-the-post, Steve wins with 147 first preference votes against Raphael’s 124, Marc’s 90 and NOTA’s 19 (with votes that specify a tie for first dropped). Using instant-runoff / single transferable vote, the winner is also Steve, with NOTA eliminated first and Marc collecting 5 votes, Steve 4 and Raphael 2, followed by Marc getting eliminated with Steve collecting 50 votes, against Raphael’s 26.
So, as usual, different voting systems would have given the same result, presuming people voted in basically the same way.
NOTA really didn’t fare well at all in this election, with a majority of voters ranking it beneath all candidates (268 of 401, 53.5%). By contrast, only 18 voters ranked all candidates beneath NOTA, with 9 of those voters then ranking all candidates equally. (For comparison, in 2007, 312 of 482 voters (about 65%) ranked some candidate below NOTA, though that drops to 225 voters (47%) if you ignore voters that just left some candidates unranked. Only 98 voters (20%) voted every candidate above NOTA.)
With NOTA excluded from consideration, things simplify considerably, with only 13 possible different votes remaining. Those come in four categories: ranking everyone equal (17 votes, 9 below NOTA as mentioned above, and 8 above NOTA), ranking one candidate below the others (13 votes total, 7 ranking Raphael last, 3 each for Steve and Marc), ranking one candidate above the others (66 votes; 30 ranking Steve first, 18 each ranking Raphael and Marc first), and the remainder with full preferences between the candidates:
70 V: 213
63 V: 123
56 V: 132
52 V: 231
38 V: 312
26 V: 321
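Both the count of 13 possible votes and the category totals can be checked with a short Python sketch. The normalise helper is my own name; it relabels ranks densely so that, say, (1, 3, 3) and (1, 2, 2) count as the same vote.

```python
from itertools import product

def normalise(ranks):
    # Relabel ranks densely, e.g. (1, 3, 3) -> (1, 2, 2), so equivalent ballots compare equal.
    order = sorted(set(ranks))
    return tuple(order.index(r) + 1 for r in ranks)

# Every way of ranking three candidates when ties are allowed.
distinct = sorted({normalise(r) for r in product(range(1, 4), repeat=3)})
print(len(distinct))  # 13

# Fully ranked ballots from the table above, plus the three partial categories.
full = {"213": 70, "123": 63, "132": 56, "231": 52, "312": 38, "321": 26}
total = 17 + 13 + 66 + sum(full.values())
print(total)  # 401
```

The 13 breaks down as 1 all-equal vote, 3 with one candidate last, 3 with one candidate first, and 6 full orderings, and the four categories add up to all 401 ballots.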
The most interesting aspect of that I can see is that of the people who ranked Raphael first, there was a 1.8:1 split in preferring Steve to Marc, and for those who preferred Marc first, there was a 2:1 split preferring Steve to Raphael. For those who preferred Steve, there was only a 1.1:1 split favouring Raphael over Marc.
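Those splits can be recomputed from the table. One loud assumption: I am reading each three-digit vote as the ranks given to Steve, Raphael and Marc, in that order. The post doesn't say so, but it is the only reading that reproduces the ratios quoted. The function name is made up for illustration.

```python
full = {"213": 70, "123": 63, "132": 56, "231": 52, "312": 38, "321": 26}
names = ("Steve", "Raphael", "Marc")  # ASSUMED digit order, inferred from the quoted ratios

def second_choice_split(first):
    """For ballots ranking `first` top, count who was ranked second."""
    split = {}
    for ranks, count in full.items():
        by_rank = {int(r): name for r, name in zip(ranks, names)}
        if by_rank[1] == first:
            split[by_rank[2]] = split.get(by_rank[2], 0) + count
    return split

for name in names:
    print(name, second_choice_split(name))
# Steve {'Raphael': 63, 'Marc': 56}
# Raphael {'Steve': 70, 'Marc': 38}
# Marc {'Steve': 52, 'Raphael': 26}
```

That gives 70:38 (about 1.8:1), 52:26 (2:1) and 63:56 (about 1.1:1).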
I think it’s fair to infer from that that not only was Steve the preferred candidate overall, but that he’s considered a good compromise candidate for supporters of both the alternative candidates. (Though if all the people who ended up supporting Steve hadn’t been voting, Raphael would have won by something like 26 votes (129:103) with a 1.25:1 majority; if they had been voting, but Steve hadn’t been a candidate, Raphael’s margin would’ve increased absolutely to 33 votes (192:159) but decreased in ratio to a 1.2:1 majority.)
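The two Raphael-versus-Marc tallies can be reproduced from the counts given earlier. As before, this assumes each vote lists the ranks given to Steve, Raphael and Marc in that order, and the three-digit strings for the partial ballots (equal digits meaning a tie) are my own encoding of the categories described above, not the official tally sheet.

```python
names = ("Steve", "Raphael", "Marc")  # ASSUMED digit order
ballots = {
    "213": 70, "123": 63, "132": 56, "231": 52, "312": 38, "321": 26,  # full rankings
    "122": 30, "212": 18, "221": 18,   # one candidate ranked above the others
    "211": 3, "121": 7, "112": 3,      # one candidate ranked below the others
    "111": 17,                         # everyone ranked equal
}

def head_to_head(a, b, skip_first=None):
    """Count ballots preferring a to b and b to a, optionally dropping
    ballots that rank `skip_first` strictly above both other candidates."""
    ia, ib = names.index(a), names.index(b)
    a_votes = b_votes = 0
    for ranks, count in ballots.items():
        r = [int(d) for d in ranks]
        if skip_first is not None:
            s = names.index(skip_first)
            if all(r[s] < r[i] for i in range(3) if i != s):
                continue
        if r[ia] < r[ib]:
            a_votes += count
        elif r[ib] < r[ia]:
            b_votes += count
    return a_votes, b_votes

print(head_to_head("Raphael", "Marc"))                      # (192, 159)
print(head_to_head("Raphael", "Marc", skip_first="Steve"))  # (129, 103)
```

192:159 is a ratio of about 1.21:1, and 129:103 is about 1.25:1.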
