Monday, November 22, 2010

Interrupt Handling with MMU Hardware on an MMU-less OS

This post describes some work that I have just recently gotten to a point that it seems to be correct.  It was a long-running project that started around May 2010.

The way I provided sparc64 RTEMS with interrupt support is to register ISR handlers in the SPARC trap table. In sun4v/niagara, I use the open firmware (OpenSparc's OpenBoot) trap tables and can modify the entries directly during simulation. However, the sun4u/usiii firmware trap table appears to be "locked" -- when RTEMS attempts to write to the trap table, it gets an MMU data protection error. I do not think it is possible to change the protection bit, at least I haven't seen any way to do it, so the sun4u requires a different solution.  It would also be more appropriate to implement a general solution for the sun4v, since modifying the firmware trap tables is probably not a usable solution in many cases.

RTEMS is a real-time operating system that lacks support for Virtual memory, which is sensible for embedded devices.  However, the SPARC V9 (sparc64) architecture assumes a virtual memory system and includes the appropriate MMU hardware.  To further complicate things, the more recent variants of the sparc64 prevent system software from disabling the MMU easily, because the hardware is virtualized.  At any rate, dealing with the MMU can be troublesome, and is made even more complicated when the OS does not expect or support the MMU hardware.

From a high level view, I had two general thoughts for supporting sparc64 interrupts:
  1. Disable the MMUs
  2. Install a new trap table.
The problem with (1) disabling the MMUs is that the RTEMS application is not resident in physical memory. The hardware does demand paging to load the application data and instructions when they are used. Early attempts at turning off the instruction MMU caused the next instruction to be not found in physical memory and the CPU to crash.

The problem with (2) installing a new trap table is that when the processor switches to the other trap table, it is not present in the I-TLB. So the first trap to hit will cause an I-TLB miss, and the I-TLB miss handler, also being part of the new trap table, will also miss, which will lead the processor to crash. This is something of a chicken-and-egg problem.

After some failed attempts to resolve this problem, I decided to bite the bullet and integrate some MMU handling code with the sparc64 RTEMS.  Since I used some of the HelenOS architecture-specific code in the development of the original RTEMS niagara board specific package,  I went back to that code-base to see if I could get some useful MMU support code.

I managed to mangle the HelenOS bootloader and kernel startup routines so that only the trap table install and MMU take-over were used.  The next step was to get an appropriate trap table installed.  After some wasted effort trying to use the HelenOS trap table, I realized I could do something a little cheaper: copy the Open Firmware trap table, install a valid TLB mapping for it, and then switch trap tables.

Success!  Sort of.

The MMU trap handler copied from the firmware attempts a branch-always (PC relative) to a region of code that exists in the firmware but is not part of the trap table and therefore is not copied to the relative address of the branch target.  I patched around this problem by copying extra memory from the firmware; an experimentally derived value of doubling the copied space (from 32K to 64K) works on the Simics simulator. This solution is inelegant, hackish, and brittle, but it worked so I let it be for awhile.

I was recently able to revisit this work and implement a better fix. Now a
trap table is generated that will by default jump to an address+offset that
represents the firmware trap table. Basically, each entry in the new trap table points to the same entry in the firmware trap table.  So when a trap is caught, it will go to the new trap table, which will then jump to the correct trap entry in the firmware's trap table.  RTEMS can overwrite this default behavior by installing new trap handlers, thus preventing the jump to the original trap table.

RTEMS is able to install user ISRs to the new trap table as expected, and any non-overridden trap handler will vector to the firmware trap table.  This means that the firmware trap table is no longer copied but can still be used, while new traps can be installed by RTEMS in order to support user-defined interrupt handlers.

My current solution still requires the MMU support for installing the new trap table's TLB mapping, but it seems to be reliable so far and has improved the robustness of sun4u test cases.  Installing new trap entries for all of the common-case traps would be a good improvement from here, since it would reduce the reliance of the port on the correctness of the firmware code. It also might be worthwhile to use the same mechanism on the niagara port. For now, though, I am satisfied with my solution.

No comments:

Post a Comment