In the latter half of last year, my research went down an OOP path especially in the direction of C++; both the hardware containers project and my thesis touch elements of OOP concepts. With the hardware containers we were interested in how C++ developers use inheritance, composition, and encapsulation. For my thesis work, I was interested in how applications use the STL; I wrote an appendix in my thesis describing this aspect. For this post, my focus is on the hardware containers work, which is in my paper pipeline waiting for submission.
I found 21 free and commercial (oxymoron?) open-source C++ programs for this investigation: dimacs-sq, Dijkstra's algorithm; Opal and GEM5, processor simulators; Geant4, physics particle simulator; FlightGear, flight simulator; Wesnoth, video game; OpenCV, computer vision library; Boost, library for C++; MySQL, database server; LibreOffice, office productivity suite; Doxygen, documentation generation; Haiku, OS based on BeOS; ReactOS, OS based on Windows NT; Chromium, web browser; povray, soplex, dealII, namd, xalancbmk, astar, and omnetpp, C++ benchmarks from SPEC CPU 2006.
Using synthetic microbenchmarks, Eugen and I determined that the important C++ code behavior attributes that affect performance when using hardware containers are the ratio of private access specifiers to total number of class variables, depth of inheritance, number of objects a class includes by value, and number of public accessor methods that expose private variables. Nicely enough, all of these can be determined by examining source code: no need to run the C++ programs!
I used two tools to analyze the 21 programs: Understand by SciTools, and CCCC. The former is commercial and comes with an evaluation license, which was enough for me to get the data I wanted; the latter is open-source, which was good enough for me to modify to get data I wanted. I used Understand! to measure the ratio of private:class variables and the depth of inheritance, but the tool did not provide enough data for the objects included by value or for accessor methods. Ultimately I did not get any data for the latter, but I was able to hack CCCC to measure how many objects a class includes by value.
For the composition of classes I used my modifications to CCCC to count how many classes each class includes by value. On average, classes do not include very many objects of other classes by value (perhaps by reference), although large outliers exist. The chart shows the average number in orange, with one standard deviation as an error bar; the maximum among all the classes is in blue.
I used both CCCC and Understand to calculate the inheritance tree depth of these C++ programs. Using both tools showed some difference in the tool quality, since for a metric as simple as inheritance depth the two tools should get identical numbers. The following chart shows the maximum depth of the inheritance tree reported by each of the two tools with CCCC in blue and Understand in orange.
Although the tools occasionally agreed, vastly different numbers were obtained for lots of the programs. The maximum inheritance depth is less than 10 for almost all of these programs, with small (near 1) averages and standard deviations. Thus most classes are decoupled with small inheritance and composition factors, but some strong outliers do exist. For hardware containers, we can treat the outliers as a special case and make safe (enough) assumptions about an object's composition.