Sunday, August 21, 2011

multilib

Back when Eugen and I were porting SPARC-V9 to RTEMS, we had to consider the compiler tools for cross-compiling RTEMS to the target architecture. One of the issues was that RTEMS uses the multilib feature of GCC for building architectural variants, so we should consider the multilib options for SPARC-V9 and submit patches to add support for the desired multilibs to the rtems-sparc64-gcc.

What are multilibs?
The GCC manual doesn't say anything other than to identify the flags that use multilib. Google isn't terribly helpful here either. The automake manual has this to say:

18.3 Support for Multilibs

Automake has support for an obscure feature called multilibs. A multilib is a library that is built for multiple different ABIs at a single time; each time the library is built with a different target flag combination. This is only useful when the library is intended to be cross-compiled, and it is almost exclusively used for compiler support libraries.

The multilib support is still experimental. Only use it if you are familiar with multilibs and can debug problems you might encounter.
Hardly helpful.

So we ignored multilibs and moved on. That was fine, although to this day the sparc64-rtems-gcc does not support multilibs. If a tighter integration of the sparc64-rtems-gcc and BSPs becomes important then this issue might be worth re-visiting. But I still had little knowledge about multilibs.

multilibs are...
It turns out that multilib is common, although most end users won't notice them. multilib is the process by which a library is built multiple times, each with a different set of compiler flags. That means a library can be linked to applications built with similar flags, which may be necessary in case the compiler flags change the ABI (application binary interface). An example is compiling legacy 32-bit applications on 64-bit machines: current Linux distros ship GCC with a 32-bit multilib, so the GCC libraries are available for compiling 32-bit applications. On a 64-bit computer without a 32-bit multilib gcc cannot compile 32-bit programs.

RTEMS adds multilib flags to its rtems-flavored gcc compilers to allow RTEMS to link to GCC libraries that best support the target hardware. For users of existing board support packages (BSPs), the multilib is invisible: the BSP specifies compiler flags (in make/custom/somebsp.cfg), and rtems-gcc links the right library. However, BSP developers should ensure that critical flags specified by the BSP match an available rtems-gcc mulitlib or else a new multilib should be contributed. The gcc flags will pull in a multilib if there is a match in gcc-print-multi-lib

On the RTEMS multilib page there is also an explanation of how RTEMS can build its own multilib library, the librtemscpu.a. Building a multilib RTEMS is mostly useful to save a little time and space when compiling many variants of the same processor family, for example multiple BSPs from the same CPU. Perhaps I'll follow up sometime to discuss RTEMS' librtemscpu.a

Tuesday, August 9, 2011

Red Black Trees: Bottom-Up or Top-Down?

Conventional red-black trees are bottom-up: nodes are inserted/deleted at the bottom and coloring properties are corrected up. Last year I found a tutorial describing a top-down approach claiming it is superior in terms of simplicity, ease of understanding, code size, and theoretical efficiency. Unfortunately these claims are unsubstantiated. Being curious I investigated further. I implemented an iterative bottom-up red-black tree (with parent pointers) and compared it with the top-down approach.

Since complexity increases with code size, I used sloccount to measure the code size. For the top-down approach, the Total Physical Source Lines of Code (SLOC) = 131. For bottom-up, SLOC = 189. So far the evidence does favor a top-down approach for being less complex.

But what about efficiency? (Speed is King!) The tutorial claims that a top-down approach makes a single pass of the tree and so is theoretically more efficient.

I wrote a simple testing program that generates and inserts (pseudo)random numbers as keys in a red-black tree and then removes them. The program takes wall time measurements with gettimeofday before and after both blocks of insertion and removal. Using wall time is crude but serves for this demonstration. By dividing the total time of insertion (removal) by the number of keys I compute the average cost per insertion (removal). I ran the testing program 10 times and averaged the per operation costs for top-down and bottom-up for an increasing tree size from between 100 and one million nodes. The following two images show the results (smaller is better):

The top-down approach has a worse average time for both inserting and removing nodes. I did not bother with statistical rigor, but the gaps were larger than a few standard deviations of the measured data.

Although my experiments were far from complete I opted to use the more established, more familiar, and (apparently) more efficient bottom-up approach in an implementation of red-black trees that is now part of the RTEMS kernel.

If I had to guess why the top-down approach performs worse I would say it is pessimistic in how many tree rotations it does. The bottom-up approach seems to make about as many tree rotations as there are nodes. (I believe the number of rotations is bounded to 2 for insert and 3 for remove, but on average it appears to be 1/2 rotation per insert/remove.) The top-down approach makes about 2-4 times as many (albeit simpler) rotations, and I have no idea if the bounds apply.