Sat, 26 Nov 2005
The biggest difference between the hypothetical security queues and the existing byhand and new queues is that they need to be autobuilt; it’s not really feasible to try saying whether a security update is acceptable if there’s still some possibility it’s not going to build on half our architectures. Additionally, it means that there are two queues that need to be built, rather than just one.
The way accepted autobuilding works isn’t too complicated, fortunately. There’s a directory where symlinks are made to uploaded packages and sources, and a Packages files (for all architectures) is generated, as is a Sources file, all of which are made available to buildds. They’re also added to the regular unstable Packages and Sources files and passed to quinn-diff and wanna-build. In order to avoid unnecessary breakage at transitional points (such as when a package moves from accepted to unstable proper), packages are kept in the accepted autobuild directory for a day or so afterwards. There’s also the issue of including the .orig.tar.gz from the archive along with the .dsc and .diff.gz from accepted for uploads that aren’t new upstream versions.
Security accepted autobuilding adds two extra wrinkles. The first is that packages being autobuilt can’t be generally visible to other users on the security host, which includes the web server that’s supposed to get the packages to the security buildds. So instead of symlinking, copies have to be made, and care has to be taken to ensure that doesn’t accidently leak information in the process. The second trick is that security needs autobuilding of multiple suites (oldstable and stable, and hopefully testing as well), whereas for the regular archive, it’s sufficient to autobuild unstable packages from accepted, and leave the rest until the daily pulse run. This particularly means that uploads from accepted need to be moved into separate directories depending on the target suite, so that the buildds don’t accidently build a package for woody against the libc from sarge.
At a gross level, generalising to supporting autobuilding of multiple queues is therefore straightforward: extend the “accepted_autobuild” table to include information of what queue the files are from (which implies a name change is appropriate), extend the configuration file to allow you to specify which queues should get autobuild handling for which suites, and finally add the code to actually do this when packages are moved into and out of the appropriate queues. Hurray for simplicity!
But there’s one little bit of complexity still to bother us. We know we need different directories and Packages files for different suites, but do we need it for different queues as well? Worse, we need different chroots on each autobuilder for each suite, do we need different chroots for each suite for each queue? The latter could well be a complete showstopper.
One reason why we might need to treat the queues we’re considering separately is that they have different secrecy requirements: embaroged updates can’t be visible to anyone, even people working on the unembargoed security updates in parallel, and yet the buildd has to forward its logs on to someone to be signed. If embargoed and unembargoed packages are built in the same chroot, by the same buildd, from the same queue, how do the logs magically go to the right person?
That’s not necessarily easily answered; currently the hope is that it’ll be okay to have a single shared queue and set of buildds and chroots, and have all the logs forwarded to a single address, at which point they can be redirected to the appropriate security team members by procmail. The assumption this is based on is that there’ll already need to be some magic to do that for the builds themselves once they’re uploaded and need to be moved into either the embargoed or unembargoed queue, so the same thing should be possible with the build logs.
Pending further surprises though, that’s enough to move onto the next phases, which is generalising dak to cope with arbitrary queues.
So having found time to catch up satisfactorily on the implementation, time to get back into the blogging.
After working out what you actually want to do, the next step in implementing stuff, in my book, is to make sure you fully understand the context of the stuff you’re trying to change. In this case, the “unapproved” queue concept comes from some thoughts from quite a while ago that come under the heading “NIv2” for “new incoming v2”. For reference, the original “new incoming” was the switch to separate queues for new, byhand, and accepted packages, which was followed by a notable update shortly after to make accepted autobuilding possible.
That setup works, but has a few problems. One is that packages sit in accepted for quite a while (up to a full 24h), and occassionally in that time it can become inappropriate to actually add the package to the archive – for instance if it’s a build of a package that’s then removed from the archive. This results in an UNACCEPT, and requires manual intervention, since potentially it can result in weird inconsistencies, such as bug closures. Another is that files in the various queues aren’t tracked in the database, which can result in files (particularly upstream tarballs) being lost or left behind, and also requires some unpleasant special casing in the code. A final concern is that one of the bigger time sinks in the daily pulse is the apt-ftparchive scan, which is actually just doing work that’s already been done in order to allow packages in accepted to be autobuilt.
Put all those concerns together with the ones mentioned last week, and you get the impetus for NIv2. The principle of that is to put stuff in the queue into the database properly, simplify it, extend the queuing support to deal with different queues, and move stuff from accepted directly into the pool every quarter hour, though only pushing to buildds that frequently, leaving updates of Packages files and mirror pushes to still be daily, or every few hours or whatever.
This is what that bit looked like in the dak TODO:
[hard, long term] unchecked -> accepted should go into the db, not a suite, but similar. this would allow katie to get even faster, make madison more useful, decomplexify specialacceptedautobuild and generally be more sane. may even be helpful to have e.g. new in the DB, so that we avoid corner cases like the .orig.tar.gz disappearing ‘cos the package has been entirely removed but was still on stayofexecution when it entered new.
Obviously, it needed to get defined a little more clearly than that to work out how much the changes for the security queues I’m interested in depend on the other changes, and what they actually mean in detail. After some IRC discussion we came up with more or less the following summary of what NIv2 actually means:
How NIv2 should work.
q/accepted goes away; once packages are accepted, they get added straight into the pool and the database. At this point they’re added to a micro-suite for the buildds, and so apt-ftparchive can cache the package contents; and a symlink tree is made up so incoming.d.o works. The micro-suite might or mightn’t be a fully fledged suite in the database. Presumably it wouldn’t be visible in dists/ though. The micro-suite probably should contain packages uploaded to unstable for X hours, where X is 48 or 2x pulse-interval or similar.
This means UNACCEPT is impossible and apt-ftparchive takes less time in cron.daily.
q/byhand and q/new get new friends: q/sec-embargoed, q/sec-unembargoed, q/stable-updates. Rather than checking for “byhand” entries, or a lack of overrides; uploads get diverted to those queues if they’re uploaded to security or stable, and get moved out of there when a security team member or stable RM (ie, Joey) runs an amber/lisa-esque command.
This means sec-embargoed and sec-unembargoed need to be autobuilt.
This also means that proposed-updates can be a full suite, and point releases can be a matter of (a) adding proposed-updates to stable; (b) removing old versions from stable; (c) removing packages that have to be dropped; and users can put “stable + proposed-updates” in their sources.list reasonably sanely.
This also means that REJECTs from accepted (security) or the archive (p-u) aren’t needed.
It means the queue ends up looking like:
unchecked [uploads go here, insecure] holding [secured] new [waiting NEW processing] byhand [waiting byhand processing] stable-updates [waiting stable RM approval] done [.changes files for accepted packages] reject [.changes, .reason and rejected stuff] bts_version_track [data for BTS]with q/accepted disappearing and q/proposed-updates and q/old-proposed-updates being replaced by the stable-updates / oldstable-updates queues.
One of the nice parts about this is that it breaks down fairly neatly into three sets of changes: supporting additional queues, removing the accepted queue, and tracking queue stuff in the database. It does make those changes more or less dependent though – the new queues need to be autobuilt in the way accepted is at present; so it makes sense to add the queues before removing accepted so as not to lose the queue autobuild support.
The above actually can be simplified a little further – instead of managing a microsuite for changes since the last update, it’s possible to add accepted packages directly to the real suite; and generate the Packages files needed for autobuilding by querying packages accepted to unstable in the last 24 or 48 hours, simply by adding a field to track upload time for binaries as well as sources.
In any event, this thus makes generalising the queue autobuild support the next challenge.
Thu, 17 Nov 2005
Afraid of your Neighbour's Disapproval
Continuing on from the SCC stuff, which I should probably get in the habit of calling the mirror split, then…
At the same time as I’ve been trying to work out ways to fit the mirror split stuff into the AJ market concept, I’ve been pondering on and off what can be done to improve the security situation. My thinking’s probably still biassed by my long time wish for official security updates to testing, but I’m somewhat inclined to think that if we can figure out a way of integrating the testing security team and the official security team we might end up closer to a solution – at the very least that’d provide us with a decent number of people who can actively help out with security updates that aren’t secet and get a few more people familiar with the technical aspects of how security updates are released, who’ll then have less of a learning curve if they end up being added to the security team proper. And heck, even if that doesn’t work, it at least gives people running testing an easier url to remember for security updates.
Of course, integrating the teams isn’t that easy, or it would’ve already been done. So after thinking about it, I had a chat with Joey (aka Martin Schulze, aka the guy who’s issued 317 of the 340 security advisories in the past 12 months) about the problems:
<aj> so, the other thing i wonder is what you think of the testing-security-support split?
<Joey> need more input
<aj> as in separate host, funny domain name, not available on security.debian.org/, extra effort to mirror, etc
<aj> i think it sucks :)
<Joey> there are a lot of sucking things…
<aj> it’s true
<aj> so, i figure the reasons for the split are (a) you don’t want to add a bunch of people to team@ and hence vendor-sec, and (b) there’s no way to make it so they can use dak-security without being full team/vendor-sec members; and maybe (c) you don’t want them officially associated with debian security coz they’re newbies
<Joey> a) and b) are true
<Joey> However, with security.debian.org being a push mirror, it would be possible to let it be pushed from a second source into a different directory, such as /testing-security/
<aj> hrm
<Joey> However, that would require all architectures to be supported and not just some.
<Joey> all as in “all that are in testing”
<aj> ah, that’s (c) then – they don’t do enough for the official debian stamp
<Joey> ah ok
Assuming those really are the key factors (which might or might not turn out true, but they seem likely) then that’s something that can be worked on. The first point is completely out of my domain – I’ve never been on vendor-sec, and it’s not likely something that ought to be tweaked much anyway; vendor-sec’s a very closed list by design, and for most security work it’s probably a distraction (testing security as is currently operates without concerning itself with vendor-sec, AIUI); the second points purely technical; and the third point is a combination of technical issues and an issue of training, both of which will be hopefully helped a lot by integrating testing-security and security.debian.org.
So the easiest point of attack is obviously the second one, since it’s “just” a matter of expanding dak’s featureset.
The problem here is embargoed fixes – the security team needs to be able to prepare fixes for vulnerabilities revealed on vendor-sec, but then keep them secret from everyone not on vendor-sec until the publication date. Which is fine, except that dak takes an all or nothing approach to that sort of secrecy: secuirty uploads go into accepted, get built, and then the security team run “amber” to publish the update. But what that means is the accepted directory and its contents can only be visible to people with vendor-sec access, which means only people with vendor-sec access can do security updates.
But we do actually have a different queuing mechanism that allows us to keep some things secret, while not hiding everything: thanks to the crypto-in-main transition, the NEW queue acts like that, more or less, only allowing ftpmaster to see the contents of NEW packages until they’ve passed license inspections and the appropriate export notifications have taken place. And actually, we’d already considered extending that sort of behaviour to add an “unapproved” queue that would prevent uploads from entering proposed-updates without the stable release manager’s prior approval.
The idea then, would be to have a couple of queues for the security team: an embargoed queue, accessible only to those with vendor-sec access that would store embargoed fixed ‘til they’re ready to be released, and an unembargoed queue for preparation accessible by a wider group, for preparation of fixes that don’t need to be secret. Both queues would need to be autobuilt (and the build logs for embargoed uploads would need to stay restricted, while the build logs for unembargoed uploads would not), which is an added difficulty, since only accepted can be autobuilt presently. But otherwise, it seems basically plausible, both to myself and Joey.
Conveniently, Andrew Pollock’s been interested in throwing some money through the AJ Market to see what happens, and since the initial idea we were tossing around fell through due to his move to the US, he’s put up the dosh to make the above happen. Well, strictly the dosh to make me make the above happen, or something.
Anyway, I’m already running behind both in implementation and blogging about it, so back to the grindstone…
(Okay, perhaps it’s not quite the feature Mark Twain would most like added to dak, but it’s still nifty.)
Wed, 16 Nov 2005
Who can resist a good rhyme? Or a bad one?
So this round of dak hacking turned out to make the AJ Market scheme another notch more confusing – hence the delay in blogging, and the teaser in my last post. The issue leading to the confusion is that the major item on the list to start hacking on was SCC, which unlike the projects I’ve undertaken so far, is more than a day’s hacking. In fact, due to the need to give mirrors a chance to adapt to the new system between it being implemented and actually used, it’s actually more of a multi-week task. And doing it as a one-day-a-week project would extend that into a multi-month task afaics. I guess that’s still better than never, but obviously it’s worth looking into alternatives.
Naturally, then, the first phase of a longer project like this is threefold. We’ll call it “the three P’s”.
Planning
My theory at this point was to come up with a plan for what to do, try figuring out how much work it’d take, and then see what sort of financial arrangements might be plausible – not involving me cutting a few weeks out of my life for spare change, but without making the whole thing an unleapable chasm from what the AJ Market’s currently managing either. I figured writing it up as a semi-formal proposal makes most sense:
Summary
Implementation of the outstanding mirror split proposal for the Debian archive to allow new architectures, particularly amd64, to be included in the archive.
Benefit
In spite (or perhaps because) of its simplicity, this project has been languishing for over two years, and is not currently being worked on; so at present it’s not even possible to estimate when it would otherwise be completed.
It is most notably preventing amd64 from being integrated into the normal Debian development environment, causing derived distributions to maintain amd64 specific patches themselves.
In the longer term, reducing the constraints imposed on the archive size may allow the introduction of additional suites, such as backports or volatile, as well as additional architectures; though significant further discussion on this would be needed.
Background
Since at least mid-2003 the Debian archive has been closed to new architectures due to the already large amount of space and bandwidth required to become a Debian mirror. At present, the archive uses some 158GiB of disk, and about 1GiB per day; additional architectures are expected to require approximately an additional 10GiB each, and there are likely around half a dozen architectures that will be considered for addition once the moratorium on new architectures is rescinded (incl amd64, armeb, sh variants, kfreebsd and possibly partial architectures for arch variants such as s390x and ppc64).
The primary work needed to fix this involves:
- ensuring the mirror network operates correctly when a majority of mirrors are partial; this reduces the impact on bandwidth and storage capacity
- optimising portions of the archive maintenance software, particularly apt-ftparchive; this reduces the load on the archive server
- providing appropriate guidelines on the qualification criteria new architectures need to meet in order to be added to the archive; this provides a limit on future increases, allowing growth to be appropriately controlled
Actual work
I expect there will be six phases to the project:
- cleanup of the archive as it stands, and establishing a clear categorisation of its contents to define what a partial mirror by architecture or suite should officially contain
- providing appropriate scripts to ensure mirror sites can easily comply with the previously defined expectations for partial mirroring
- devise an appropriate structure for the new mirror network, that can easily incorporate existing mirrors, and coexist with the existing structure
- provide information on the new structure to both mirror admins and users; assist with the transition, and resolve any problems found
- ensure the archive management software is appropriately optimised, and that archive inclusion criteria have been debated and established
- add new ports that have passed the qualification requirements to the archive
In theory, a couple of days for each of those sound plausible, so making that twelve days actual work (with a couple of week’s delay in between for mirrors to have some time to adapt to the new network). On the downside, twelve days at a day a week is over three months of real time, not counting the possibility of doing other things with the one day a week, or Christmas, or the aforementioned delay for mirrors. Yick.
So much for planning.
Preparation
So the next “p” is preparation. In this case that’s finally getting around to fix dak CVS, which has been slightly broken since May. The extent of the actual breakage was just the loss of the ChangeLog history, aiui (or at least, that was the unrecovered breakage), but the result of that was months of uncommitted changes on both ftp-master and security (and reportedly from Ubuntu’s dak installation too). The changelog for the first set of commits (not counting buildd changes from ftp-master, security changes or Ubuntu changes that haven’t made it to ftp-master) looks like:
* tiffani: new script to do patches to Packages, Sources and Contents
files for quicker downloads.
* ziyi: update to authenticate tiffani generated files
* dak: new script to provide a single binary with less arbitrary names
for access to dak functionality.
* cindy: script implemented
* saffron: cope with suites that don't have a Priority specified
* heidi: use get_suite_id()
* denise: don't hardcode stable and unstable, or limit udebs to unstable
* denise: remove override munging for testing (now done by cindy)
* helena: expanded help, added new, sort and age options, and fancy headers
* jennifer: require description, add a reject for missing dsc file
* jennifer: change lock file
* kelly: propogation support
* lisa: honour accepted lock, use mtime not ctime, add override type_id
* madison: don't say "dep-retry"
* melanie: bug fix in output (missing %)
* natalie: cope with maintainer_override == None; add type_id for overrides
* nina: use mtime, not ctime
* katie.py: propogation bug fixes
* logging.py: add debugging support, use | as the logfile separator
* katie.conf: updated signing key (4F368D5D)
* katie.conf: changed lockfile to dinstall.lock
* katie.conf: added Lisa::AcceptedLockFile, Dir::Lock
* katie.conf: added tiffani, cindy support
* katie.conf: updated to match 3.0r6 release
* katie.conf: updated to match sarge's release
* apt.conf: update for sarge's release
* apt.conf.stable: update for sarge's release
* apt.conf: bump daily max Contents change to 25MB from 12MB
* cron.daily: add accepted lock and invoke cindy
* cron.daily: add daily.lock
* cron.daily: invoke tiffani
* cron.daily: rebuild accepted buildd stuff
* cron.daily: save rene-daily output on the web site
* cron.daily: disable billie
* cron.daily: add stats pr0n
* cron.hourly: invoke helena
* pseudo-packages.maintainers,.descriptions: miscellaneous updates
* vars: add lockdir, add etch to copyoverrides
* Makefile: add -Ipostgresql/server to CXXFLAGS
* docs/: added README.quotes
* docs/: added manpages for alicia, catherine, charisma, cindy, heidi,
julia, katie, kelly, lisa, madison, melanie, natalie, rhona.
* TODO: correct spelling of "conflicts"
Ugh. Still, that’s enough to start work.
And the final “P”? Come on, be honest with yourself, you know what it’s going to be.
Procrastination
Okay, that’s not entirely fair; the irrelevant bit of work was actually on to TODO list before SCC (mostly because it was something that I could get done reasonably quickly) and in fact was this line of the above changelog:
* dak: new script to provide a single binary with less arbitrary names
for access to dak functionality.
All the various model/actress names have been getting more than a little confusing recently, with almost forty in the dak suite, and another two dozen or so in use elsewhere – and then there’s the fact that the whole hot babes thing is both a bit offensive, and getting a bit old. OTOH, you need something to rename them to. We ended up deciding on the “version control solution” and introducing a “dak” command that’d launch all the different little bits of functionality depending on arguments, in the same way cvs, svn, tla, bzr, darcs etc do.
The implementation’s kinda neat: we have a list of commands (like “ls”) and their description (“Show which suites packages are in”), along with the python module and function they’re in (“madison”, “main()”). That let’s us not actually have to change any of the other scripts immediately, and lets “dak ls foo” work the same as “madison foo”. It also means that down the track we don’t need to have separate modules for each subcommand, and that we can rename modules and functions without affecting the user interface.
Of course it also means that all the internal scripts haven’t changed to use the new names yet, leaving the new interface a bit underused, but hey, “dak ls” at least manages to be one character shorter than “madison”, so that’s a win!
To be continued…
Fri, 11 Nov 2005
With the recent Kansas Board of Education decision and the results in the Dover, Pennsylvania Board of Education elections, the Intelligent Design debate seems to be all the rage. It’s not really that interesting a debate, mostly being a rerun of the standard evolution versus creationism stuff with some new catchphrases. Even as a religious debate it’s not really terribly interesting from what I can see; the idea that God created the universe, then allowed it to evolve to where it is today seems entirely satisfactory, and doesn’t even undercut the Bible as much as considerations like “why does God let evil exist?”
But to my mind, the real fun in religious debates is in accepting all the premises, and seeing where that really leads.
So let’s forget the fossil record, genetic science, knowledge about breeding animals, and whatever else that might be relevant, and think instead about Platonic ideals of God. If God is all that’s good, all that’s wise, all that’s loving and caring, and Man is made in His image, what does that mean for the question of whether we evolved from bacteria and apes or appeared fully formed, ejected from Eden? If you or I were perfectly loving, perfectly wise, knew everything and could do whatever we wanted, and what we wanted was to make beings in our own likeness, what would we do?
Sun, 06 Nov 2005
One of the things about the whole money thing and free software is the question of whether it’ll take all the fun and spontaneity out of hax0ring. As it turns out, that doesn’t even work when you try; so instead of doing dak work last weekend, like I’d planned and like the market was indicating, I ended up hacking on britney until about 5am Sunday morning, leaving the dak stuff to be spread over the week, instead. (After the last post, dak might seem cursed, but trust me it’s not. More on that RSN, but not just yet)
For those playing along at home, britney is the name of the scripts used to update Debian’s testing distribution; basically it automates some of the hard problems of Debian release management. Naturally, this is a pretty tricky job – there’re over 200,000 packages in Debian unstable (spread across eleven architectures), with fairly complicated sets of interdependencies amongst them. It gets complicated enough, in fact, that the core analysis britney undertakes isn’t just too processor intensive to do in perl or python, it’s too memory intensive to use standard malloc for its data structures.
(Cynical and Coding, and the UQ CS228 class of ‘98 will probably have some idea of the fun to come at this point…)
The problem is that britney uses an absurd number of linked lists to resolve dependency issues, with each entry in the list being an independent block of a dozen or so bytes; and adding the usual malloc overhead to this can effectively double the memory usage. Worse, britney also likes to reuse memory quite a lot, does a fair bit of backtracking, and generally acts like the prima donna she is. Up until last week, britney’s memory was managed by the use of a parallel bitfield indicating whether memory was free or not: so britney would first allocate a 128kB block using standard malloc, note it’s all free, then every request for 12 or 52 bytes would result in some bits being twiddled; and every free would twiddle the bits back, so they could be reused later. This means two things: one is that the free invocation has to include the size of the memory to be freed as well as the address, and the other is that finding an appropriate sized block of freed memory can be awkward, leading to some degree of fragmentation.
This fragmentation seemed to lead to britney having a gradual memory leak, which would mean that if the release managers were doing some complicated transitions gigabytes of memory would end up being used and OOMing; or, since britney runs on ftp-master and it’s probably not a good idea to have the OOM killer start on the machine that ensures the Debian archive’s integrity, the ulimit is reached, and britney aborts unhappily. Unfortunately, though, the use of a “roll your own” malloc means you can’t use the standard memory checkers to see what’s going on, so you’re left with guessing and hoping. And as a consequence the memory leak’s been around for ages, causing fairly regular minor annoyance.
This changed last week when Micha Nelissen (neli) got into the act:
<neli> why does it need so much memory ?
<aba> neli: because it needs to check how the changes affect the installability count
<neli> aba: ok…but still does not seem like a big deal
<neli> how large trees are we talking then?
<aj> it has to do a lot of backtracking
<aj> how large is the longest dependency tree in debian?
<neli> backtracking only takes time, not space, right?
<aj> you need a list of things to backtrack over
<aba> neli: it also consumes memory, at least with the current dpkg-implem ntation
<aba> (well, there are ideas how to avoid this memory fragmentation, but that’s something else)
<aj> (it’s not actually memory usage, it’s memory fragmentation)
<neli> hmm, bad memory manager ?
<aj> imperfect, *shrug*
After poking around at getting britney to run locally, and doing some basic analysis on how it was used, neli decided to rewrite the memory manager to use chunk-size pools – so that a separate pool would be used for 4 byte allocations, for 8 byte allocations, for 12 byte allocations, etc. That’s possible due to two factors: one is that the special memory allocation is only needed for data structures which are all made from fairly small sizes, and are a multiple of the word size. It also means that fragmentation can be avoided completely – since you can use the “free” elements to store a free list taking up no extra space. Happily, that change also made the memory allocation much simpler.
(It also shows the benefits of a fresh look at code; I’d been thinking of doing something similar for ages, but always been stopped because I couldn’t figure how to make it cope with the various different sized blocks I’d need. As it turned out though, I’d solved that problem ages ago when I wrote a separate allocator to deal with strings, which would’ve required my malloc to cope with a much larger variety of sizes)
What it didn’t do was actually solve the memory leak. Which was pretty confusing, since the old memory checker had some tests to ensure that memory wasn’t really being leaked, and if the memory’s not being lost due to a missing free, or being lost to fragmentation, where is it going?
At this point, the real benefit of having a simpler malloc implementation came across; because suddenly it was plausible to add some debugging code to allocate a separate pool for every memory allocation, and to report how much memory is being used at any point, per line of code. That gave me output like:
pool 8B/1:751; 1962683 used 1965960 allocated (99.8% of 22 MiB)
So I knew that line 751 (of dpkg.c), was allocating 8B chunks, had used up to 22MiB in the past, and was currently using 99.8% of that. (Actually, I made an off by one error initially, so the “8B” above is actually 12B, as you can see if you divide 22MiB by 1965960)
That let me see two “leaks”. One turned out to be fairly easy to diagnose; as an optimisation, when I say “package foo is installable”, I note that “but we used bar to see that, so if we later update/remove bar, note that we’ll have to see if foo is still installable”. So the datastructure for bar ends up with a linked list of all the packages, foo, that relied on it. But I don’t check that list for dupes, so if I note that both bar and baz were required for foo to be installable, then update bar, when I recheck foo, I’ll add it to both bar’s and baz’s lists; which is fine for bar, whose list I cleared when I updated it; but not so fine for baz, because it’s now got two entries for foo. Having already added a check for dupes when trying to see if my guess was right, extending that check to skip the addition was pretty easy.
The other leak required a little more investigation. It was in a fairly tight loop which could reasonably be using a fair amount of memory; but by the end of the loop it should all be freed. Adding some more debug code to the allocation functions confirmed the problem:
<aj> pool 8B/11:1231; 1618157 used 1703936 allocated (95.0% of 13 MiB, 9.05% current)
<aj> so it looks like ~9% of those allocations are never freed to me
<aj> 9.05% = used/alloc
<aj> where used is incrememnted on b_m, decremented on b_f
<aj> alloc is incrememneted on b_m, never decremented
The code was meant to either add the allocated block to a larger list that’s being traversed, or to free it immediately – the problem turned out to be that there were some nested ifs, and an internal if was lacking an else that should have freed the block. Fixing that changed the results to:
pool 8B/11:1228; 160 used 131072 allocated (0.1% of 1 MiB, 0.01% current)
Which both dropped the leak – only allocating a maximum of 130,000 nodes instead of almost 2,000,000 – and resulted in almost all the allocations that had ever been made, having been freed once britney had been running for even a fairly short while.
As it turned out, though, the cure for the first leak was worse than the disease – checking for duplicates in a linked list turned out to take too long, and the memory wasted in that section wasn’t that big a deal. Using a red-black tree or similar instead would’ve been a solution, but so far doesn’t seem necessary. There haven’t been any OOMs since, happily; and even better, the reduced bloat seems to have made the overall checking a bit faster. Sweet.
