By Dean Sutherland
Senior Member of the Technical Staff
The CERT Program
Many modern software systems employ shared-memory multi- threading and are built using software components, such as libraries and frameworks. Software developers must carefully control the interactions between multiple threads as they execute within those components. To manage this complexity, developers use information hiding to treat components as “black boxes” with known interfaces that explicitly specify all necessary preconditions and postconditions of the design contract, while using an appropriate level of abstraction to hide unnecessary detail. Many software component interfaces, however, lack explicit specification of thread-related preconditions. Without these specifications, developers must assume what the missing preconditions might be, but such assumptions are often incorrect. Failure to comply with the actual thread-related preconditions can yield subtle and pernicious errors (such as state corruption, deadlock, and security vulnerabilities) that are intermittent and hard to diagnose. This blog post, the first in a series, describes our ongoing research towards solving this problem for a variety of languages, including Java and C11.
In previous work, we introduced the concept of “thread usage policy,” which is a group of often unspecified preconditions used to manage access to shared state by regulating which specific threads are permitted to execute particular code segments or to access particular data fields. The concept of thread usage policy is not language specific; similar issues arise in many programming languages, including Java, C11, C#, C++, Objective-C, and Ada. The preconditions contained in thread usage policies can be hard to identify, poorly thought out, unstated, poorly documented, incorrectly expressed, out of date, or simply hard to find. These problems inspired us to devise a means of specifying these preconditions in a form that developers would find both useful and acceptable.
We developed a simple, formal specification language for modeling thread usage policies, which we call the language of thread role analysis. We devised appropriate abstractions of the key semantic building blocks of thread usage policies (including thread identity, concrete code segments, and data fields) so that developers can build a model of the thread usage policy (hereafter referred to as simply “policy”) by expressing preconditions as simple precise annotations in program code.
My focus is on bringing thread role analysis to bear on programs written in C11, which is the current standard of the C programming language (the core analysis is also similar to the analysis for Java). To ensure relevance to practicing programmers, we integrate our analysis with widely used programming tools—in this case, the CLANG/LLVM open-source compiler suite. The C11 work is still in its early stages, so I’ll present an example from our previous work in Java.
Electric Had a Problem
The developers of the Electric VLSI design tool had re-implemented it in Java for their V8.0 release, in part to support a change to a multi-threaded architecture. Users of the new version of the tool encountered seemingly random internal errors and crashes. These crashes were rarely repeatable, however, and usually disappeared when developers tried to debug the program. Since they were experienced developers, they quickly realized that these symptoms were probably caused by concurrency errors. This diagnosis seemed especially likely, since the major new feature in the most recent release was the implementation of a multi-threaded architecture for performance. The developers needed to quickly identify the problem in their 140 KLOC program, and we were able to do that in less than 8 hours using thread role analysis. We analyzed version 8.0.1, which contained roughly 140 KLOC in 44 Java packages.
We describe the process in the form of a reverse-engineering exercise, partly because that’s how we addressed the problem and partly because it helps explain thread role analysis. Note, however, that this reverse-engineering-focused description makes thread role analysis sound relatively time-consuming and hard. In case studies, we found that developers working on their own code could easily identify policies, answer questions about intended thread usage, and locate key points for annotation.
Identifying Thread Usage Policies
The first step in thread role analysis is to identify relevant pre-existing policies. We’ll use the Electric tool from our example because the developers had a pre-existing written thread usage policy, which appears here in a slightly edited form:
There is one persistent user thread called the DatabaseThread, created in com.sun.electric.tool.Job. This thread acts as an in-order queue for Jobs, which are spawned in new threads. Jobs are mainly of two types: Examine and Change. These jobs interact with the Database, which are objects in the package hierarchy com.sun.electric.database
The Rules are
- Jobs are spawned into new Threads in order of receipt.
- Only one Change job can run at a time.
- Many Examine jobs can run simultaneously.
- Change and Examine jobs cannot run at the same time.
- Examine jobs can only look at the Database.
- Change jobs can look at and modify the Database.
Because only one Change job can run at a time, the Change job is run directly in the DatabaseThread. Examine jobs are spawned into their own threads, which terminate when the Examine job is done.
We thus identified one of the two key policies that needed to be expressed; the other is the policy for Java’s AWT/Swing GUI framework, used by Electric’s graphical user interface (GUI). In less than one day of effort, we expressed models of these two policies and used the findings produced by our analysis tool to diagnose a set of “seemingly random intermittent failures” experienced by the development team and their users.
The Electric GUI uses the AWT and Swing frameworks, which share a single thread usage policy. The GUI framework implementation is not multi-threaded; rather, it executes in its own “event thread” and prescribes rules for how non-GUI threads may interact with it through the framework APIs. The salient points of the policy (according to Bowbeer) are as follows:
- There is at most one “AWT thread” at a time per application.
- There may be any number of separate “Compute threads.”
- A Compute thread is forbidden to paint or to handle AWT data structures or events. Failure to comply can lead to exceptions from within the AWT, because the AWT avoids both potential deadlock and data races by accessing its internal data structures from within a single thread, without the use of locks.
- Extended computation on the AWT thread is forbidden; “brief” computation is acceptable. While the thread is computing, it cannot respond to events or repaint the display; this “freezes” the GUI until the computation finishes.
It is important to note that the AWT data structures mentioned above explicitly include any fields of user-defined classes that extend the library-defined AWT and Swing framework classes.
Thread Role Versus Thread Identity
The policies expressed in prose above allow us to highlight an important feature of thread role analysis—focusing on thread roles rather than thread identities. The GUI policy above speaks of the “AWT thread”; although it fails to mention that, in some implementations, its identity changes from time to time. We don’t care which specific thread is the AWT thread right now; we care only that it performs the role of “AWT thread.” Similarly, Electric’s thread usage policy permits multiple Examine jobs, each with its own thread. We care about the “Examine” role but not about the identity of the various threads that perform it.
Expressing Thread Usage Policy
The first step is to declare any needed thread roles. The @ThreadRole annotation declares names for thread roles. Roles are opaque identifiers that have their own namespace. The Electric policy mentions two thread roles: one for examining the Database and one for changing it (n.b.: Electric’s “Database” is a large data structure that represents the circuit the user is designing; it is not a database in the sense of MySQL or other relational databases). Here are the declarations of the thread roles for Electric, which are described as Java comments:
**@ThreadRole DBChanger , DBExaminer
**@MaxRoleCount DBChanger 1
**@IncompatibleRoles DBChanger, DBExaminer, AWT
Likewise, here are the declarations of the thread roles for the GUI:
* @ThreadRole AWT, Compute
* @MaxRoleCount AWT 1
* @IncompatibleRoles AWT, Compute
We will use the DBChanger role for all change Jobs and the DBExaminer role for all Examine jobs. Similarly, in the GUI policy, we see two thread roles: the “AWT thread” and “Compute threads.” We declare these roles using the annotation @ThreadRole AWT, Compute, thus capturing the roles described in rules 1 and 2 of the GUI policy. We use the AWT role for the AWT thread and the Compute role for all other threads.
Next, we declare global constraints on thread roles. For example, an important aspect of the GUI policy indicates that the “AWT thread” and the “Compute threads” are distinct; it is forbidden for any thread to perform both roles at once. Similarly, the Electric policy implies that the Change and Examine roles are distinct. In each case, this incompatibility property allows us to conclude that a thread performing one of these roles necessarily excludes all of the others. Incompatibility is one of the few postconditions in our language for thread role analysis.
We state this incompatibility for the GUI framework by writing @IncompatibleRoles AWT, Compute for its API. The @IncompatibleRoles annotation in the snippet above specifies the incompatibility for Electric’s thread roles, as well as for the GUI’s AWT thread. These annotations capture the third rule of the GUI policy and the analogous—but implicit—rule for Examine and Change jobs from the Electric policy. Note that if we had followed the Electric policy literally, our annotation might have incorrectly omitted the AWT thread.
Finally, both the GUI policy and the Electric policy identify thread roles that may be performed by, at most, one thread at a time. We document this with @MaxRoleCount AWT 1 for the GUI and the similar annotation in the snippet above. Because “any number” is the default, we omit @MaxRoleCount annotations for the other thread roles.
Wrap Up and Look Ahead
We’ve now reached the halfway point in the process of expressing the thread usage policies. So far, we’ve identified the relevant thread usage policies along with the thread roles needed to express those policies. We’ve written annotations to declare the thread roles and to express the few global constraints on those roles—most notably incompatibility. In the second installment in this series, we’ll finish the thread usage policy by associating thread roles and thread role constraints with code segments, and show how thread role analysis enabled us to diagnose in 8 hours a violation that took “multiple” Electric developers “weeks of time” to fix independently. In the third and final post, we’ll discuss bringing thread role analysis to C11, along with techniques to improve adoptability by reducing required programmer effort and supporting analysis of partially annotated code.
To read the paper, Composable Thread Coloring (which was an earlier name for the technique we now call thread role analysis) by Dean Sutherland and Bill Scherlis, please go to