Friday, May 25, 2012

Red-Black Trees Redux: Left-Right Symmetry

I previously compared top-down versus bottom-up red-black trees and concluded that bottom-up is faster and therefore preferred.

Recently I was working with binary search trees (BSTs) and happened to observe anomalous performance between two implementations of a BST (splay tree). One implementation was taking much longer than the other. After some investigation I attributed the variance to two factors: comparison functors and use of left-right BST symmetry. Cursory examination indicated that using a functor instead of direct comparison of keys was increasing execution time around 10-20% for a particular workload. Left-right symmetry was causing about 30-40% increases in execution time!

Comparison functors are important because an implementation of a BST cannot know ahead of time how to compare generic nodes, and—for languages that lack decent templates such as C—a functor is a fact of life. But left-right symmetry is a choice. Some arguments for left-right symmetry are that it reduces the source lines of code (SLOC) and increases correctness. I think the performance problems of using left-right symmetry comes from indexing in an array on top of a pointer dereference, and doing so multiple times. By separating the left and right cases explicitly the programmer specifies the pointer precisely.

Execution time increases for code that exploits left-right symmetry at least for splay trees. How much does the use of left-right symmetry affect red-black tree performance?

Using a search benchmark that I coded from a published description, I evaluated two versions of red-black trees: one that uses left-right symmetry to compress code paths and one that explicitly represents the two sides. (The left-right symmetrical code is the version currently in use in RTEMS.) The benchmark inserts search keys generated randomly until the tree reaches a specific size, and then issues a mix of update and search operations in sets of 100. For this work I used sizes between 16 and 4096 in powers of 2, and about 1000 update/search operations, with the mix varying between pure search and pure updates. Otherwise the benchmark is as described in the paper linked above.

The following charts show representative results from running these benchmarks in Simics/GEMS. The "rbtree" results use explicit left and right child pointers and split the cases; "rbtree sym" compresses the symmetrical cases together and uses an array of two child pointers.

Cycles to insert elements into a red-black tree until reaching the max size.

Cycles when searching in a red-black tree at certain sizes. 1000 searches at each point.

Cycles when updating and searching. 500 searches, 500 extractions, and 500 insertions at each point. Note that an extraction uses a search to find the node to remove.

The percent increase in execution time when using left-right symmetry of insertion is less than that of search: not good for a structure whose purpose is efficient searching!

Ben Pfaff's libavl uses left-right symmetry (and a table-based approach instead of parent-pointers), and comparing the performance of libavl using explicit cases for handling left and right might be interesting. Using his systems benchmarks to evaluate the RTEMS red-black tree implementation would be nice, although of limited value since the benchmarks target general-purpose (*nix) systems. The Linux (lib/rbtree.c) and FreeBSD (sys/sys/tree.h) kernels both have red-black tree implementations with explicit left and right pointers.

I have not evaluated whether the symmetrical case reduces code size (generated machine instructions in the binary). How much is SLOC reduced, and does that translate to reduced program size? If code size goes down due to symmetry then, for an embedded system, the choice of which implementation to use becomes more complicated. But if code size is the same (or increases!) then quantitative reasoning tells us not to use left-right symmetry.

Friday, May 18, 2012

Bike to work day

Today was bike to work day and I normally commute on Friday anyhow so I participated albeit without registering. There were noticeably more people out, particularly in small groups, although not many more than on a nice weekend day: fewer pedestrians though. I wore my new kit, which consists of the bib shorts and jersey of the 2008 Finnish champion (from team Francaise Des Jeux), blue/white helmet, blue gloves, white shoes, and my blue/white bike.

A skinny-tire caught my wheel shortly after I hooked up to the W&OD trail and we paced each other most of the way into DC. It was nice although for the first couple miles I didn't keep his wheel because he was a bit more brazen with rolling through stops. We had a pretty nice pace for the latter half of my ride. He was commuting from Ashburn, which is a sight farther than my own ride.

The ride home was a little rough, but the worst of it came near the end. Shortly after I got back onto the streets from the trail I was passed by a black BMW and the passenger yelled, "Get off the road %$#hole!" I have read that this is rather common, and so is having things thrown at you or being buzzed. The latter two constitute assault with a deadly weapon (vehicle) and can be brought to court.

Since I was only verbally harassed I decided to give a little back. So I chased the car down to the next stop light, pulled alongside, and yelled, "I have as much a right to the road as you!" The passenger, a young man, pretended not to hear because the window was up; the young girl driving seemed either embarrassed or amused, or maybe both. I went off to another lane to resume my ride. The same car passed me again a little later. Some people suck.

Thursday, April 19, 2012

Downloading emails to apply git patches from Gmail

Now that RTEMS is using git I frequently have to apply patches received by email, but GMail lacks a web interface for downloading selected emails. The conventional wisdom for how to apply a patch from an email is to "show original", save the page, manually remove the first blank line, and then use git-am to apply. Now do that ten times for a series of patches. How tedious!

So I wrote fetch-flagged-email—a small Python application—to accomplish my goal. Now I star patches in gmail and then run fetch-flagged-email.py followed by git-am. The script does the heavy lifting of saving emails to files in sequential order to a directory of my choosing.

productivity++;

Monday, April 16, 2012

Crufty Stuff

Joel Sherrill posted back in 2010 about the lines of code in RTEMS. I was curious about something related to this so first I replicated his work using sloccount. I think he may have neglected the inline files (because I had to add the .inl extension). I also added support for m4, .ac, and .am files. The latter two are part of the autotools build system and are filed under 'makefile'. Here are the current results on the RTEMS tree
Totals grouped by language (dominant language first):
ansic:       745153 (86.51%)
asm:          38574 (4.48%)
makefile:     34934 (4.06%)
ada:          26261 (3.05%)
sh:            7086 (0.82%)
cpp:           5248 (0.61%)
m4:            2996 (0.35%)
pascal:         814 (0.09%)
perl:           279 (0.03%)

Total SLOC  861,345
Now for the interesting part: I also ran sloccount after running bootstrap, which generates a bunch of files for the build system. I modified sloccount to also count the Makefile.in files (which are generated with autotools from Makefile.am sources). Here are the results
ansic:       745153 (35.73%)
sh:          638471 (30.61%)
makefile:    557257 (26.72%)
m4:           73475 (3.52%)
asm:          38574 (1.85%)
ada:          26261 (1.26%)
cpp:           5248 (0.25%)
pascal:         814 (0.04%)
perl:           279 (0.01%)

SLOC*     2,085,532
* includes generated files

And do you know how many humans look at that generated code? Any idea how much of it is tested?

It would be interesting to include all the other code that goes into making RTEMS work such as the compiler, RPM packing scripts, and so on. The moral of this data is that the build system is complex. Very complex.

This data was generated using David A. Wheeler's 'SLOCCount'.

Monday, December 19, 2011

Using partitions as a free list

Yesterday I discussed my search for some help with implementing a free list in RTEMS. Today I'm glad to say I was able to use partitions to achieve my objective of a growable free list—something akin to a slab allocator. A fundamental limit, the maximum number of partitions, still exists but partitions can have an arbitrary size; these limits might be circumvented safely by making each partition much larger than the previous or by combining old partitions into new ones. Further improvements can be made, but for now I'm satisfied.

Allocation
Iterate over available partitions until one returns a free object. If none succeeds, create a new partition (malloc a new set of objects) and get a free object from the new partition.

I have not handled what to do if partition creation fails. Performance could be improved by tracking the non-full partitions.

Deallocation
Return the object to its partition. I made this easy by storing a handle to the owning partition as part of each object. Non-invasive techniques, such as tracking address ranges for each partition, could be used for a generic solution.

Sunday, December 18, 2011

RTEMS free list?

I implemented a free list in an RTEMS application as a pre-allocated array of objects on a linked list, but I foresee the need to have a fail-over case for growing my free list dynamically. Extending my current implementation to support dynamically changing the size of the free list is easy enough, but I'm interested to see if such support exists. I found two likely candidates: Partitions and Regions.

Partitions implement a simple free list of objects and have low overhead in both time and space. However a partition cannot change the size of its memory so an allocation will fail after the partition exhausts the preallocated set of buffers. This behavior is exactly what I already have implemented for my application.

Regions implement a more complex memory manager that allows for carving up a chunk of memory, calling it a region, and allocating segments from within a given region. However since segments can be variable-sized, the region manager uses a first-fit allocator that coalesces free blocks. Such an allocator is overkill for a free list of fixed-size objects.

Neither are quite what I want; where is the middle ground. My suspicion is that no such mechanism exists already because (hard) real-time applications usually have well-known resource requirements; the number of objects in the system are known and the partition manager would be sufficient. But when an unknown number of same-sized objects are employed, the current mechanisms appear to be insufficient.

Interestingly the Object Manager, an internal RTEMS subsystem, does seem to have capabilities to support dynamically extending the size (up to a preconfigured but possibly unlimited maximum) of a free list of fixed-size objects. Unfortunately the Object Manager is not accessible directly from the application level without violating API visibility rules.

Update: See my follow-up post regarding using partitions to implement a growable free list.

Wednesday, December 7, 2011

Science and Government

I just finished reading a brief lecture by C.P. Snow entitled "Science and Government" that, among other things, examines the role of science in decision making.  Anyone who is involved in committees, politics (governmental, office, academic, or otherwise), or decision-making should read this lecture. I found its central story both entertaining and enlightening.

One of the main conclusions of the lecture is that decision-making ought not succumb to the euphoria of secrecy, and that decision-makers ought not be obsessed with gadgets: personally favored creations that have little value and lack a broad base of support. Snow says,
The euphoria of gadgets; the euphoria of secrecy...are the origin of 90 per cent of ill-judged scientific choices. Any scientist who is prone to these euphorias ought to be kept out of government decisions or choice-making, at almost any cost.
And as argument against obsessive covertness he claims that "societies at about the same level of technology will produce similar inventions."

He also discusses the roles of committees in decision-making---regardless of political organiziation committees are almost invariably present---and gives some advice for what may constitute an effective committee, namely
  1. The committee's objective must be clear and not too vast.
  2. The committee must be placed appropriately within the hierarchy of the larger (governmental) system. 
  3. The committee must have powers of action. Advisory committees in particular have no power and therefore cannot make decisions.
I found the read quick, accessible, and relevant despite being 50 years old.