A Computer Scientist's Quest

Effective Terminal Windows

2013-12-10T11:50:00.000-05:00

This one's for all the terminal junkies out there.

I have a tendency to open about 10-20 terminal windows at a time, and it gets hard to track which terminal is in use for which ongoing project. The defaults for my distro are to put a ridiculously long and useless title on the terminal, and the prompt shows the entire working directory. Titles get really long when the directory structure is nested more than 8 or so levels. I did a little searching and found the right set of configuration commands for bash to make my terminal titles and prompts more useful for me. Now my title shows the current working directory name, and the prompt shows at most 6 levels (the current directory and its 5 ancestors).

Add the following after the definition of PS1 in .bashrc to get my configuration:

# If this is an xterm set the title to the current directory name
case "$TERM" in
xterm*|rxvt*)
    PS1="\[\e]0;${debian_chroot:+($debian_chroot)}\W\a\]$PS1"
    ;;
*)
    ;;
esac

# cut off really deep prompts
PROMPT_DIRTRIM=6

Software product country of origin (COO)

2013-10-03T09:09:00.000-04:00

Late last year, US Customs (CBP) issued an advisory ruling regarding how to determine the COO for software products when software is developed partially in a country not listed in the Trade Agreements Act as a designated country. For example, China is not a designated country. A fair description has been written up here. The ruling provides a template for labeling the COO as a designated country despite using source code developed in a non-designated country.

The reason software companies would want to certify their product's COO as a designated country is so they could sell their software to the US Government. The main problem is that competitors (or whistleblowers) can sue under the False Claims Act. Significant damages can be awarded if the court finds the COO is not correct.

Now the companies that want to label their software COO as a designated country can get a better chance at either defending such claims or getting a binding ruling from CBP. These companies are scrambling to hire law firms to determine if the advisory ruling can help, to seek a binding ruling from CBP, and to otherwise gather evidence to use to back COO claims. I consulted for BlankRome LLP to help them with just such a task. In particular, I examined their client's software development processes to help determine whether and how they fit the template, and to see what evidence there is to claim that the software product is "Made in the USA" despite the fact that much of the software, in terms of source code, has been written in China. While the details of my engagement (who, what, how) are covered by an NDA, I can give some high-level intuition on the issue.

At first blush, it seems counter-intuitive that the bulk of software can be written in one country, while the end product claims to be another. The defensible stance, however, is to claim that the creativity and human knowledge required to make the software comes out of the design, requirements specifications, and testing/validation. The key seems to be that experts in the US should be involved in making decisions both about how the software is written and in selecting the code modules to use.

Congratulations RTEMS GSoC 2013 Students

2013-05-30T09:25:00.004-04:00

Congratulations to the accepted students in GSoC 2013 for the RTEMS Project! We had many fine applicants again this year, and were able to accept 9 students to do projects with us this summer. In alphabetical order by project name, we accepted these students/projects:

Shubham Somani

Application Configuration GUI for RTEMS.

Deng Hengyi

Atomic Operations and SMP lock debug tool for RTEMS

Dhananjay M Balan

Better debugging support for RTEMS in GDB.

Hesham Moustafa AL-matary

Enhance low-level API of libmm (Memory Protection & Caches)

Philipp E

Paravirtualization layer in RTEMS

Jin Yang

Porting CAN driver, LinCAN, to RTEMS

Peng Fan

RTEMS Runtime Loader

Sree Harsha Konduri

SMP Aware Scheduler

Vipul Nayyar

Unified APIs

These are all important, ambitious projects that will be of great benefit to RTEMS if successful, and will definitely teach the students a lot about project management and open source development.

Who's calling me? Visualizing function callers

2013-05-09T13:04:00.000-04:00

My problem today was to determine and visualize the set of functions (callers) that call another set of functions (callees). For this purpose, I knew the callees function names all started with the same word, say "Callee", and that none of the caller's names start with Callee. For C programs, namespaces are often formed by a coding convention that specifies the format of function names and groups related functions by a common "first name".

I found a simple tool, egypt, that relies on gcc and GraphViz to generate a visualization of a program's static call graph. A call graph is a natural way to visualize what I need, but my requirements are slightly different than the usual. Normally a call graph will include all of the directed edges from callers to callees in an entire program or subset of its functions. What I need is just the nodes of immediate callers of the Callee functions, that is, the subgraph induced by the caller and callee vertices in the call graph.

For my needs, the tool lacks the ability to specify a set of vertices and generate the subgraph they induce. The induced subgraph could be found if I could:

specify terminal nodes (callees)
filter nodes that do not have an edge to terminal nodes

Rather than implement this ability, I realized I could wrap egypt with a bit of shell code to produce the graph(s) I want.

egypt processes RTL dumps produced by gcc. It outputs a call graph for the RTL dump files passed in. The first thing to do then is to compile the project to visualize using gcc with the -fdump-rtl-expand flag. This produces the *.expand files required by egypt. My first attempt was using gcc 4.4.7, which dumps all of the .expand files in the root directory of the build. For projects with multiple source files having the same name, the .expand files overwrite each other. I switched to gcc 4.7.1 because it dumps the .expand files in the same directory as the .o files are generated. Then, I revisited some old tricks to gather the .expand files into a separate directory for analysis while keeping the project's directory structure.

Now, if I just wanted the callgraph of the Callees, I could easily grep for the .expand files containing functions that start with Callee and pass them to egypt. However, to get the subgraph of caller->Callee is trickier. What I did was to use egypt on each of the .expand files individually and filter the output for edges to a Callee. This gives exactly the set of nodes and edges I need. Then I just need to wrap the output similarly to how egypt does to produce a graph file for GraphViz. The resulting script looks something like,

#!/bin/bash
echo "digraph callgraph {"
files=`find . -name "*.expand"`
for f in $files
do
egypt --include-external $f | grep Callee_ \
| grep -v Callee_.*-\> | grep -v Callee_.*\"\;
done
echo "}"

The --include-external is necessary to force egypt to produce caller nodes for which it does not find a definition. The second line of greps are to exclude any calls originating from Callee, and to discard any uncalled Callee. Redirect the output of the script to a file, say callgraph.dot, and then it can be processed with one of the layout engines in the dot tools, like
$> dot -Teps -o callgraph-neato.eps callgraph.dot

As an example, I processed the RTEMS Supercore Scheduler package. For this package, the Callee is _Scheduler. I processed the output with the circo drawing filter.

Supercore Scheduler Callers, with Scheduling Functions filled in Grey

Some further improvements could be made. It might also be interesting to visualize the paths (open walks) that end at the Callees. Also, the egypt tool works only on the static call graph, so indirect function calls e.g. through function pointers are not captured. Making use of dynamic profiling tools that generate a call graph, such as gprof or callgrind, could improve the accuracy of the visualization.

Software Licenses with RTEMS

2013-05-07T16:55:00.000-04:00

The RTEMS license is a modified version of the GPL version 2 that includes an exception to permit including headers and linking against RTEMS object files statically. Normally, the GPL can only be linked statically with other GPL code, or rather, linking statically with GPL code would cause your code to become GPL code. The LGPL is not a suitable alternative, because it either requires use of a shared library that can be re-linked, or release of the linked (application) code. And newer versions (GPL version 3) are completely unsuitable for embedded systems due to the relinking restriction which is technically challenging.

A problem for RTEMS is there are no copyleft licenses that are compatible with the RTEMS license. Thus, RTEMS Project has to reject any code that uses the GPL or LGPL, even though RTEMS seems to use the GPL itself---this is because of the exception for static linking, and also because an upstream GPL version 2 project could at any time switch to GPL version 3 and become totally unusable. In practice, RTEMS can only accept original code contributed under the RTEMS License and code that has a permissive license.

I could not find any license that provides the copyleft protection of a software project while still allowing static linking of proprietary software. Maybe there is some subtle legal or technical issue that I do not understand, but it seems like such a license ought to exist somewhere that protects the free software while permitting applications to use it; a sort-of "Embedded GPL".

Some things that RTEMS could do better include:

Collect all of the copyright and license disclaimers for users
Collect all of the advertising restrictions, or move those encumbered files to a secondary repository
Switch from the GPL + linking exception, but to what I do not know

Update 5/8/13: Identified that RTEMS uses version 2 of the GPL, and give some background on why RTEMS has not and will never switch to version 3.

Generating interrupts with a gem5 device

2013-04-19T15:30:00.000-04:00

Today I extended my work of adding a device to gem5 by causing the device to generate an interrupt. Interrupts seem to be architecture-specific in gem5, so I needed to include some X86-specific functionality. I copied the approach taken in gem5 for the x86 keyboard interrupts, but with IRQ 3 (because it was available) for my device.

The following files should help recreate this effort:

device.diff - apply to gem5 to create simple device.
interrupt.diff - apply to gem5 with device.diff to add interrupt.
mydev.c - device driver Linux kernel module.

Adding a simple io device to gem5

2013-02-28T15:53:00.000-05:00

Last time, I added custom pseudo-instructions in gem5. Today, I add a device in gem5 and then use the device from within a simulated (linux-x86_64) system.

Adding a device to gem5 is lightly documented in the ASPLOS tutorial and gem5 wiki. I would suggest starting with the tutorial, and read about the memory system as well.

Devices are located in gem5/src/dev/ subtree, with architecture-specific files located in subdirectories. The IsaFake device, which I found before the ASPLOS tutorial, was useful for starting. To create a simple device, I copied isa_fake.[cc|hh] to mydev.[cc|hh], and copied BadDevice.py to MyDevice.py. Then I copied the parameters for IsaFake from Device.py into the parameters of MyDevice.py, and added mydev.cc and MyDevice.py to the SConscript. After renaming (search-replace IsaFake/BadDevice with MyDevice, isa_fake with mydev, etc), I needed to add the device to the system. I'm working with x86, so I attached it in the x86/Pc.py file, with:

from MyDevice import MyDevice
...
    my_device = MyDevice(pio_addr=x86IOAddress(0xe000), pio_size=8)
...
        self.fake_floppy.pio = bus.master
        self.my_device.pio = bus.master
        self.pciconfig.pio = bus.default
...

After compiling and running gem5 the device is listed in the m5out/config.ini file.

Accessing the device requires a device driver. To learn about writing drivers, read a good book. For this driver, a simple kernel module will do.

#include <linux kernel.h>
#include <linux module.h>
#include <linux errno.h>
#include <linux ioport.h>
#include <asm io.h>

#define BASE 0xe000
#define SIZE 0x08

int init_module(void)
{
  int t1;
  if ( ! request_region(BASE, SIZE, "mydev") ) {
    printk( KERN_INFO "unable to get io port at 0x%8X\n", BASE );
    return -ENODEV;
  }

  /* a little test */
  t1 = inl(BASE);
  printk( KERN_INFO "read %d\n", t1 );

  outl(0, BASE);
  t1 = inl(BASE);
  printk( KERN_INFO "read %d\n", t1 );
  return 0;
}

void cleanup_module(void)
{
  release_region(BASE, SIZE);
}

Compile the module against the Linux kernel, boot gem5, get the module into the simulated system (e.g. with m5 readfile), and insert the module. With the default parameters from the IsaFake device, the write is ignored and the device returns -1 whenever it is read.

I did not get I/O memory working, but for now I/O ports are fine for me.

Add a pseudo instruction to gem5

2013-02-12T14:48:00.001-05:00

An important aspect of many computer architecture projects is to modify an instruction set, often to extend the instructions with a new instruction that implements a proposed feature. I'm working on moving some of my research to the GEM5 open source simulator, but first I need to get an idea of the level of effort involved. My first move is to figure out how to add new instructions.

GEM5, being designed especially for computer architecture research, has a well-defined set of pseudo instructions that can be extended to serve my purposes. However, there are not really any instructions on how to extend these instructions. The few emails that I could find about pseudo instructions basically just said go look at what is implemented and extend it. So that is what I did. For posterity, I'll relay my findings here. Maybe they will be helpful to others, or to myself in the future.

The pseudo instructions are useful for implementing functional simulator features that can use a multiple-register instruction. The main drawback is that the pseudo instructions are not integrated tightly with the pipeline and are executed non-speculatively, so if the rate of your new instruction is quite high, the cost could be misleading if doing performance evaluations of the new feature. For my work, the pseudo instruction is fine; I have previously done a very similar implementation for functional simulation with Simics/GEMS.

Adding a new pseudo instruction (for X86)

I'm interested primarily in the X86 full-system simulation capabilities of GEM5 at the moment, so my effort is in that area. However, the pseudo instructions have implementations in the other architectures, and most of the following will translate directly to them.

Overwrite a reserved opcode in src/arch/x86/isa/decoder/two_byte_opcodes.isa near the other pseudo instructions (look for m5panic).
Add the instruction’s functional simulation implementation in src/sim/pseudo_inst.cc
Add the function prototype in src/sim/pseudo_inst.hh. The function prototype will define the available registers for parameters and return values based on the compiler’s calling conventions for the architecture.
Create an m5op for easily emitting the instruction in compiled code.

Add function number in util/m5/m5ops.h
Add function prototype in util/m5/m5op.h
Instantiate a TWO_BYTE_OP in m5op_x86.S

I have written a simple example that implements addition as a pseudo instruction. The patch may bit-rot, but the idea should be easy enough to follow.

To use the new pseudo instruction call the function declared in util/m5/m5op.h. Then (cross-)compile your source code with the m5 utilities like:

gcc -o foo foo.c -I ${GEM5}/util/m5 ${GEM5}/util/m5/m5op_x86.S

To get your code into the simulation, you can

add the binary to the disk image

sudo mount -o loop,offset=32256 /dist/m5/system/disks/linux-x86.img /mnt/tmp
cp foo /mnt/tmp/bin

or read it directly into the simulation

build/X86/gem5.debug configs/example/fs.py -r 1 --script=foo
m5term localhost 3456
m5 readfile > foo
chmod +x foo
./foo

Adding to the disk image requires restarting the simulation, whereas if you have a checkpoint loaded you can read the file in directly using m5 readfile.

Executing the pseudo instruction on real hardware

You can also use your new pseudo-instruction in real hardware by providing an illegal instruction handler (SIGILL handler) that emulates the functionality of the instruction. This may be useful for debugging purposes, since native hardware can run the emulation code much faster than the simulator will. I have written a simple example that shows how to handle the illegal instruction signal that gets caused when the pseudo instruction is executed. This sample example will execute in both GEM5 and natively (on a 64-bit X86).

I guess that covers it for now. Happy hacking!

RTEMS on gem5 SPARC

2013-02-11T15:09:00.001-05:00

I previously wrote about booting RTEMS on M5 (Now gem5) SPARC_FS. I am following up on this work in preparation for the release of RTEMS 4.11, which will hopefully be coming soon. I compiled RTEMS (with two small patches) and booted two samples on gem5 and on Simics Niagara.

On the simulator side, the instructions I gave previously continue to work for booting both OpenSolaris and RTEMS. I updated some of the links in the prior post to reflect the change in host to Oracle for the OpenSPARC tools. One issue I did not notice at first is that the port for the console to connect m5term to the gem5 simulator is port 3457.

I successfully built and booted hello and ticker with the niagara BSP. There are a few patches that need to be applied to RTEMS, but hopefully I can get those patches committed before releasing 4.11. Ideally, the RTEMS 4.11 will be able to compile and boot with gem5, giving the RTEMS community an open-source simulator for testing.

Interestingly, the ticker appears to count time. I'll have to take a look at how gem5 simulates the (s)tick register. Simics does not simulate the timer with any accuracy.

I think Qemu would be a nice target simulator for some future endeavors, but I do not have the time to investigate the feasibility of running RTEMS on the Qemu Sparc64.

screen

2013-02-07T11:35:00.000-05:00

I finally learned how to use screen, a terminal windowing program that can detach so its windows run as a background task. When you detach, screen returns you to your shell. Detaching allows me to start work on a server from my lab workstation in a screen session, detach screen, logout and turn off my workstation, go home, and resume the same screen session from home.

Start screen with

$> screen

Once inside screen, the windows act mostly like a terminal, but now you can issue commands to screen using Ctrl-a and a command key. The useful keys that I have been using are:

Create a window (Ctrl-a c)
Next (Ctrl-a n) or Previous (Ctrl-a p) window.
Detach (Ctrl-a d)

To start screen with a previously detached session, first find the session with:

$> screen -ls 

There are screens on:
    11999.pts-0.localhost    (Attached)
    17129.pts-3.localhost    (Detached)
$> screen -r 17129.pts-3.localhost

The -r flag resumes the detached screen session.

Detaching is useful especially for long-running programs or for conserving the state of many terminal windows.

Profiling C++ applications: class composition

2013-01-24T12:53:00.002-05:00

In the latter half of last year, my research went down an OOP path especially in the direction of C++; both the hardware containers project and my thesis touch elements of OOP concepts. With the hardware containers we were interested in how C++ developers use inheritance, composition, and encapsulation. For my thesis work, I was interested in how applications use the STL; I wrote an appendix in my thesis describing this aspect. For this post, my focus is on the hardware containers work, which is in my paper pipeline waiting for submission.

I found 21 free and commercial (oxymoron?) open-source C++ programs for this investigation: dimacs-sq, Dijkstra's algorithm; Opal and GEM5, processor simulators; Geant4, physics particle simulator; FlightGear, flight simulator; Wesnoth, video game; OpenCV, computer vision library; Boost, library for C++; MySQL, database server; LibreOffice, office productivity suite; Doxygen, documentation generation; Haiku, OS based on BeOS; ReactOS, OS based on Windows NT; Chromium, web browser; povray, soplex, dealII, namd, xalancbmk, astar, and omnetpp, C++ benchmarks from SPEC CPU 2006.

Using synthetic microbenchmarks, Eugen and I determined that the important C++ code behavior attributes that affect performance when using hardware containers are the ratio of private access specifiers to total number of class variables, depth of inheritance, number of objects a class includes by value, and number of public accessor methods that expose private variables. Nicely enough, all of these can be determined by examining source code: no need to run the C++ programs!

I used two tools to analyze the 21 programs: Understand by SciTools, and CCCC. The former is commercial and comes with an evaluation license, which was enough for me to get the data I wanted; the latter is open-source, which was good enough for me to modify to get data I wanted. I used Understand! to measure the ratio of private:class variables and the depth of inheritance, but the tool did not provide enough data for the objects included by value or for accessor methods. Ultimately I did not get any data for the latter, but I was able to hack CCCC to measure how many objects a class includes by value.

For the composition of classes I used my modifications to CCCC to count how many classes each class includes by value. On average, classes do not include very many objects of other classes by value (perhaps by reference), although large outliers exist. The chart shows the average number in orange, with one standard deviation as an error bar; the maximum among all the classes is in blue.

I used both CCCC and Understand to calculate the inheritance tree depth of these C++ programs. Using both tools showed some difference in the tool quality, since for a metric as simple as inheritance depth the two tools should get identical numbers. The following chart shows the maximum depth of the inheritance tree reported by each of the two tools with CCCC in blue and Understand in orange.

Although the tools occasionally agreed, vastly different numbers were obtained for lots of the programs. The maximum inheritance depth is less than 10 for almost all of these programs, with small (near 1) averages and standard deviations. Thus most classes are decoupled with small inheritance and composition factors, but some strong outliers do exist. For hardware containers, we can treat the outliers as a special case and make safe (enough) assumptions about an object's composition.

My new job

2013-01-22T17:31:00.000-05:00

As of last week I have started my new full-time job as a Postdoctoral Research Scientist at GWU. I'm working with my thesis co-adviser, and my job is an expanded version of what I was doing before graduating. In addition to continuing my research, I will also be overseeing/assisting the research of some graduate students and writing grant proposals.

The appointment is for the next 18 months or so, and I'm excited about carrying forward the momentum I have in my research.

Looking back to 2012, forward to 2013

2013-01-22T14:45:00.001-05:00

Of the many post ideas back-logged in my head, I wanted to recap the past year and perhaps look forward to the new year before making it out of January. For me, 2012 was a busy year in which I

Became a dad.
Defended my Ph.D. dissertation.
Dropped 20 pounds to 18% body fat, BMI 23.
Bicycled 2300 miles, ran another 160 or so.
Published 2 papers and a book chapter.
Continued RTEMS involvement as an admin and mentor for GSOC and GCI.
Went on a Caribbean cruise.
Bought my first car, a minivan.
Repaired/Improved my house: foundation crack sealed, water line repaired, water line replaced, water heater replaced, roof repairs, HVAC heat pump and blower replaced, ventilation ducts cleaned, and kitchen cabinets refaced.

For 2013 some of my goals, resolutions, and expectations are to

Turn 30. Done!
Get a job. Done, more on that soon.
Publish more.
Get more research funding.
Read more.
Learn a new (non-programming) language, probably ASL or suomi.
Continue RTEMS involvement.
Resume regular exercise. I cut way back after my daughter was born.
Finish a century ride, solo or supported. Current best: 88 miles.
Finish a half-marathon, solo or organized. Current best: 8 miles.
Take up swimming.
Go on a family vacation.
Build a "front porch."
Construct a new cat tower.
Get rid of stuff.
Be a better person.

Hood me!

2012-11-20T15:52:00.000-05:00

I successfully defended my dissertation thesis yesterday, thus overcoming the final hurdle to the Ph.D. other than some paperwork and formatting/publishing the thesis itself. The title of my thesis is "Operating System Support for Shared Hardware Data Structures" and it spans a range of computer science topics from low-level computer hardware—not quite circuits, but close—up to application software and everything in between. I'm pleased with both the scope of the work, the technical contributions, and the outlook for carrying this work forward. I chose to find and pursue a topic that interested me, and my advisors (and wife!) were flexible enough to permit me to do so. It feels great to take the seed of an idea through germination and see it begin to grow.

Critical Bugs and Quality Assurance

2012-10-16T11:07:00.001-04:00

Sebastian Huber recently posted a nasty RTEMS bug and fix. While simple, the bug manifested in their application as an increase in one task's latency from 20us to 170us! The cause of the problem was that RTEMS has two kinds of critical sections—dispatch and interrupt—and new SMP-aware code added a dispatch critical section to the thread dispatch code. The problem happens with overlapping a traditional interrupt disable critical section. The SMP dispatch code looks like:

void _Thread_Dispatch(void) {
Thread_Disable_dispatch();
SMP_Dispatch_other cores();
Interrupt_Disable();
...
... do things, including context switch if needed
...
Interrupt_Enable();
// problem here!
Thread_Unnest_dispatch(); // enable
...
}

The problem is that an interrupt can occur between Interrupt_Enable and Thread_Unnest_dispatch. Suppose a low-priority task (L) is executing _Thread_Dispatch, and the interrupt enables a high-priority task (H), but
H will not be dispatched because dispatching is disabled. Instead L enables dispatching and resumes executing, which is a priority inversion!

The fix reverts the changes made for SMP. For the SMP code, the priority inversion still exists and is unresolved. (RTEMS currently does not make real-time guarantees for the SMP support, so no one cares yet.)

In the broader picture, the bug seems like it should be easy to detect. The issue with free open-source software (FOSS) is that quality assurance (QA) is almost non-existent: the "many eyeballs" philosophy argues against QA. But what else can FOSS do? No one is going to pay for extensive testing, and if they do they have no incentive to share.

FOSS communities (and corporate developers) need better tools for software QA. This summer RTEMS had a GSOC student who was looking at testing. Testing is probably the first tool in the QA toolbox, and the only one most developers have a clue about; how about static analysis, path coverage, standards conformance, or certification? Some interesting work modeling, proving, and certifying systems is out there: Where is the undergraduate textbook and course on QA?

Web site update

2012-10-13T14:37:00.001-04:00

Last night I decided to provide an html version of my CV, and then I remodeled my website. The old version was ugly and broken; the main problem were my iframes. I simplified the design, tried to make it mobile-friendly, and reduced the content. I kept my basic design elements (boxes), and tried to eliminate cruft. I think the product is leaner and cleaner.

Version control for text/LaTeX?

2012-10-12T11:55:00.002-04:00

I often use version control (VC) for text documents—especially LaTeX. Modern VC software is good for code, and even text when working alone, but collaboratively editing text with VC is a nightmare. The biggest challenge for text VC seems to be structure/formatting. Code is well-structured within and between lines: text less so. Words/sentences can be re-arranged within sentences/paragraphs, and lines are meaningless. With VC, line wrapping propagates small changes and frustrates reviewing and merging.

Word processors track revisions, but such tools are restrictive and won't work for markup languages. Cloud tools for collaboration such as Docs exist, including some for LaTeX, but requiring connectivity while editing is a non-starter for me. I would even be happy with a custom LaTeX solution; the features I desire for text VC are similar to those for code:

Non-proprietary, application-agnostic, platform-independent FOSS
Revision history: see what has changed.
Revert: undo changes back to forever.
Lock-free: work in parallel, which requires...
Pain-free merge: help resolve conflicting commits.
Distributed and offline editing: no active (server) connections
External contributions: integrate changes made outside of VC

Most VC tracks changes either with changesets or snapshots. Changesets seem worthless because of the structural problems of text. Snapshots seem viable, but I haven't seen turn-key solutions for plain text or LaTeX, and handling external contributions seems especially troublesome.

GSOC2012: MMU project and musings

2012-10-11T17:51:00.002-04:00

I finally made a pass at merging my student's final code for his GSOC2012 project. The project was the culmination of multiple summer projects and some work I did. I'm excited to try improving it to add sparc64 support and use it in my research.

While revising the student's submission for reviewing and merging, I thought of a few ways to improve future new developer participation:

Better github integration. Staying up-to-date with rtems.git, tracking student progress, and getting code review from more developers would be helpful for quicker turnaround on submissions.
Style: documentation and code conventions. Clear, consistent guidelines and examples of proper/improper coding style would make reviewing and merging a lot easier.
Improved Git Workflow. Teaching students how to make useful branches and commits ahead of time would ease code merging, testing, and revising.
More submissions. We need to get code reviewed if not merged in smaller increments; this need is well-known and repeated.

These improvements could be addressed in part when students/developers are new to the scene. For example, instead of just making students prove they can build RTEMS and patch hello world, we could require them to document and fix the style of sample code using a branch, and submit a pull request for RTEMS github that contains their proof as separate commits on the same branch.

For getting students to submit and be reviewed more often will take more work on behalf of mentors, developers, and students. Something that may help would be requiring code review as part of the weekly status meetings we instituted this GSOC. Perhaps each student's weekly commits can be reviewed by their mentors as part of tracking progress and status.

Institutional support from RTEMS mentors and developers would help. For example, github integration requires developers and mentors to use github. Style consistency requires a style guide that we accordingly maintain and abide by. Teaching good workflow, and fixing the bad, takes effort: mentors need to (know how to) identify and correct a student who struggles. Increased submission frequency requires urging and commitment from mentors to review code. These improvements take effort, but I think they could substantially improve the participation, progress, and production of students.

My fall hiatus

2012-10-05T16:32:00.002-04:00

I've been busy lately, between child-rearing and working on a grant proposal, and I haven't had the time and energy for much else. I'll be writing my thesis soon, too, but that might encourage me to write more here. Meanwhile, check out some fake ads for programming jobs.

Making and freezing food in bulk

2012-08-07T13:42:00.001-04:00

My mom and one of my brothers visited my wife and me this past weekend in order to deliver the cradle—we're expecting a baby girl at the end of this month—made by my father-in-law. We decided to take advantage of this visit to prepare some frozen food to ease transitioning into parenthood. For about a day-and-a-half we cooked and froze one or more of the following meals to produce about 20 frozen ready-to-reheat meals:

Jambalaya
Mock Lasagna: layered cooked spaghetti with sauce and cream cheese
Penne with sausage and vegetables in spaghetti sauce
Chicken Cacciatore
Chicken Tetrazzini
Tuna Casserole
Chili
Chicken Curry with chickpeas, spinach, and rice
Rice, chicken and vegetables covered by cream of mushroom soup
Enchiladas
Meatloaf (uncooked)
Meatballs (uncooked)

We started by prepping almost everything: chopping vegetables; thawing, cutting, and seasoning proteins; buying and organizing canned goods; cooking rice and pasta. Prep work transferred most of the contents of my freezer to my refrigerator, which was full and working hard to keep food cool.

The general procedure we followed was to prepare two dishes at a time, and we tried to make double batches when possible. We would cook the meat for each dish, let the meat cool a little, line a freezer-safe container with plastic wrap, apply non-stick spray to the wrap, build the dish in the container, cover with plastic wrap, and stick in the freezer.

After a dish had thoroughly frozen (a day) we removed the dish from its container and wrapped it in aluminum foil outside the plastic wrap, and then wrote the name of the dish and the name of the container we froze it in. When we reheat/cook the dish we know what container to use for best-fit.

All-in-all the batch cooking was a success, and I expect the pre-made dishes will be appreciated by future me.

Red-Black Trees Redux: Left-Right Symmetry

2012-05-25T15:07:00.000-04:00

I previously compared top-down versus bottom-up red-black trees and concluded that bottom-up is faster and therefore preferred.

Recently I was working with binary search trees (BSTs) and happened to observe anomalous performance between two implementations of a BST (splay tree). One implementation was taking much longer than the other. After some investigation I attributed the variance to two factors: comparison functors and use of left-right BST symmetry. Cursory examination indicated that using a functor instead of direct comparison of keys was increasing execution time around 10-20% for a particular workload. Left-right symmetry was causing about 30-40% increases in execution time!

Comparison functors are important because an implementation of a BST cannot know ahead of time how to compare generic nodes, and—for languages that lack decent templates such as C—a functor is a fact of life. But left-right symmetry is a choice. Some arguments for left-right symmetry are that it reduces the source lines of code (SLOC) and increases correctness. I think the performance problems of using left-right symmetry comes from indexing in an array on top of a pointer dereference, and doing so multiple times. By separating the left and right cases explicitly the programmer specifies the pointer precisely.

Execution time increases for code that exploits left-right symmetry at least for splay trees. How much does the use of left-right symmetry affect red-black tree performance?

Using a search benchmark that I coded from a published description, I evaluated two versions of red-black trees: one that uses left-right symmetry to compress code paths and one that explicitly represents the two sides. (The left-right symmetrical code is the version currently in use in RTEMS.) The benchmark inserts search keys generated randomly until the tree reaches a specific size, and then issues a mix of update and search operations in sets of 100. For this work I used sizes between 16 and 4096 in powers of 2, and about 1000 update/search operations, with the mix varying between pure search and pure updates. Otherwise the benchmark is as described in the paper linked above.

The following charts show representative results from running these benchmarks in Simics/GEMS. The "rbtree" results use explicit left and right child pointers and split the cases; "rbtree sym" compresses the symmetrical cases together and uses an array of two child pointers.

Cycles to insert elements into a red-black tree until reaching the max size.

Cycles when searching in a red-black tree at certain sizes. 1000 searches at each point.

Cycles when updating and searching. 500 searches, 500 extractions, and 500 insertions at each point. Note that an extraction uses a search to find the node to remove.

The percent increase in execution time when using left-right symmetry of insertion is less than that of search: not good for a structure whose purpose is efficient searching!

Ben Pfaff's libavl uses left-right symmetry (and a table-based approach instead of parent-pointers), and comparing the performance of libavl using explicit cases for handling left and right might be interesting. Using his systems benchmarks to evaluate the RTEMS red-black tree implementation would be nice, although of limited value since the benchmarks target general-purpose (*nix) systems. The Linux (lib/rbtree.c) and FreeBSD (sys/sys/tree.h) kernels both have red-black tree implementations with explicit left and right pointers.

I have not evaluated whether the symmetrical case reduces code size (generated machine instructions in the binary). How much is SLOC reduced, and does that translate to reduced program size? If code size goes down due to symmetry then, for an embedded system, the choice of which implementation to use becomes more complicated. But if code size is the same (or increases!) then quantitative reasoning tells us not to use left-right symmetry.

Bike to work day

2012-05-18T16:53:00.000-04:00

Today was bike to work day and I normally commute on Friday anyhow so I participated albeit without registering. There were noticeably more people out, particularly in small groups, although not many more than on a nice weekend day: fewer pedestrians though. I wore my new kit, which consists of the bib shorts and jersey of the 2008 Finnish champion (from team Francaise Des Jeux), blue/white helmet, blue gloves, white shoes, and my blue/white bike.

A skinny-tire caught my wheel shortly after I hooked up to the W&OD trail and we paced each other most of the way into DC. It was nice although for the first couple miles I didn't keep his wheel because he was a bit more brazen with rolling through stops. We had a pretty nice pace for the latter half of my ride. He was commuting from Ashburn, which is a sight farther than my own ride.

The ride home was a little rough, but the worst of it came near the end. Shortly after I got back onto the streets from the trail I was passed by a black BMW and the passenger yelled, "Get off the road %$#hole!" I have read that this is rather common, and so is having things thrown at you or being buzzed. The latter two constitute assault with a deadly weapon (vehicle) and can be brought to court.

Since I was only verbally harassed I decided to give a little back. So I chased the car down to the next stop light, pulled alongside, and yelled, "I have as much a right to the road as you!" The passenger, a young man, pretended not to hear because the window was up; the young girl driving seemed either embarrassed or amused, or maybe both. I went off to another lane to resume my ride. The same car passed me again a little later. Some people suck.

Downloading starred emails from GMail to apply git patches

2012-04-19T08:31:00.002-04:00

Now that RTEMS is using git I frequently have to apply patches received by email, but GMail lacks a web interface for downloading selected emails. The conventional wisdom for how to apply a patch from an email is to "show original", save the page, manually remove the first blank line, and then use git-am to apply. Now do that ten times for a series of patches. How tedious!

So I wrote fetch-flagged-email—a small Python application—to accomplish my goal. Now I star patches in gmail and then run fetch-flagged-email.py followed by git-am. The script does the heavy lifting of saving emails to files in sequential order to a directory of my choosing.

productivity++;

Crufty Stuff

2012-04-16T21:12:00.000-04:00

Joel Sherrill posted back in 2010 about the lines of code in RTEMS. I was curious about something related to this so first I replicated his work using sloccount. I think he may have neglected the inline files (because I had to add the .inl extension). I also added support for m4, .ac, and .am files. The latter two are part of the autotools build system and are filed under 'makefile'. Here are the current results on the RTEMS tree

Totals grouped by language (dominant language first):
ansic:       745153 (86.51%)
asm:          38574 (4.48%)
makefile:     34934 (4.06%)
ada:          26261 (3.05%)
sh:            7086 (0.82%)
cpp:           5248 (0.61%)
m4:            2996 (0.35%)
pascal:         814 (0.09%)
perl:           279 (0.03%)

Total SLOC  861,345

Now for the interesting part: I also ran sloccount after running bootstrap, which generates a bunch of files for the build system. I modified sloccount to also count the Makefile.in files (which are generated with autotools from Makefile.am sources). Here are the results

ansic:       745153 (35.73%)
sh:          638471 (30.61%)
makefile:    557257 (26.72%)
m4:           73475 (3.52%)
asm:          38574 (1.85%)
ada:          26261 (1.26%)
cpp:           5248 (0.25%)
pascal:         814 (0.04%)
perl:           279 (0.01%)

SLOC*     2,085,532
* includes generated files

And do you know how many humans look at that generated code? Any idea how much of it is tested?

It would be interesting to include all the other code that goes into making RTEMS work such as the compiler, RPM packing scripts, and so on. The moral of this data is that the build system is complex. Very complex.

This data was generated using David A. Wheeler's 'SLOCCount'.

Using partitions as a free list

2011-12-19T16:34:00.000-05:00

Yesterday I discussed my search for some help with implementing a free list in RTEMS. Today I'm glad to say I was able to use partitions to achieve my objective of a growable free list—something akin to a slab allocator. A fundamental limit, the maximum number of partitions, still exists but partitions can have an arbitrary size; these limits might be circumvented safely by making each partition much larger than the previous or by combining old partitions into new ones. Further improvements can be made, but for now I'm satisfied.

Allocation
Iterate over available partitions until one returns a free object. If none succeeds, create a new partition (malloc a new set of objects) and get a free object from the new partition.

I have not handled what to do if partition creation fails. Performance could be improved by tracking the non-full partitions.

Deallocation
Return the object to its partition. I made this easy by storing a handle to the owning partition as part of each object. Non-invasive techniques, such as tracking address ranges for each partition, could be used for a generic solution.