This blog post describes a research initiative aimed at eliminating vulnerabilities that result from memory-management problems in C and C++. Memory problems in these languages can cause bugs that are difficult to fix, performance impediments, program crashes (including null pointer dereferences and out-of-memory errors), and serious security vulnerabilities such as remote code execution.
The exploitability of memory-management problems in C and C++ (including double-free vulnerabilities) was first demonstrated in 2000 by the security specialist Solar Designer. Even before Solar Designer published this exploit technique, memory-management problems had been widely acknowledged throughout the C programming community since the initial standardization of C in 1989. We believe these vulnerabilities can be eliminated by designing and implementing an ownership model for dynamically allocated memory.
When a program begins execution, it is provided with a region of memory called the runtime stack. This stack grows and shrinks during the program’s lifetime; it holds memory whose size the compiler can determine in advance, such as a function’s local variables. But some programs require memory that the compiler cannot predict, usually because the amount needed becomes known only when the program runs. This type of memory is typically managed as a heap, from which the program dynamically allocates blocks of memory and to which it later returns them.
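To make the stack/heap distinction concrete, here is a minimal C sketch (the function names are hypothetical, chosen only for illustration). The first function uses an array whose size is fixed at compile time, so it can live on the stack; the second needs a buffer whose size is known only at run time, so it requests that memory from the heap with malloc().

```c
#include <stddef.h>
#include <stdlib.h>

/* Stack: the size (16 ints) is fixed when the code is compiled,
   so the compiler can reserve it in this function's stack frame. */
int sum_of_first_squares(void)
{
    int squares[16];
    int total = 0;
    for (int i = 0; i < 16; i++) {
        squares[i] = i * i;
        total += squares[i];
    }
    return total; /* squares[] is released automatically on return */
}

/* Heap: n is known only at run time, so the memory comes from the heap.
   The caller must eventually pass the result to free(). */
int *make_sequence(size_t n)
{
    int *seq = malloc(n * sizeof *seq);
    if (seq == NULL) {
        return NULL; /* allocation can fail */
    }
    for (size_t i = 0; i < n; i++) {
        seq[i] = (int)i;
    }
    return seq;
}
```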
When developers write code that requests more memory, the program receives a pointer to the new block allocated from the heap. This pointer is the program’s link to that memory: developers can use it to copy data into the block, read its contents, or do anything else they would do with other data. Afterward, the program can free the memory (i.e., return it to the platform’s heap). Once memory has been freed, the program no longer has use of it and must not read or write to it.
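A minimal sketch of that lifecycle in C, assuming a hypothetical helper copy_to_heap(): the pointer returned by malloc() is used to copy and then modify data, the block is passed to free(), and the pointer is cleared afterward so it cannot be used by mistake.

```c
#include <stdlib.h>
#include <string.h>

/* Allocates a heap copy of src. The returned pointer is the caller's
   only link to the new block, and the caller must free() it. */
char *copy_to_heap(const char *src)
{
    size_t len = strlen(src) + 1; /* include the terminating '\0' */
    char *dst = malloc(len);
    if (dst == NULL) {
        return NULL;
    }
    memcpy(dst, src, len);
    return dst;
}

/* Typical lifecycle: allocate, use, free, then stop using the pointer. */
size_t lifecycle_demo(void)
{
    char *msg = copy_to_heap("hello");
    if (msg == NULL) {
        return 0;
    }
    msg[0] = 'H';              /* use the block like any other data */
    size_t len = strlen(msg);  /* "Hello" has length 5 */
    free(msg);                 /* return the block to the heap */
    msg = NULL;                /* the old address must not be read or written */
    return len;
}
```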
This exchange of memory between platform and program works well as long as the program stays within the bounds of each memory block the platform provides and frees each block exactly once. Unfortunately, programs often request hundreds of blocks of memory, and tracking them can become a major effort, with disastrous consequences should the developer make a mistake.
The buffer overflow is a common programming error in which a program writes to an object (usually a string) but exceeds the bounds of the memory allocated to the object and overwrites adjacent memory. Buffer overflows in the heap are typically harder to exploit than buffer overflows on the stack. They can still be exploited, however, as Solar Designer demonstrated.
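Because an overflow happens when a write ignores the size of its destination, one defensive pattern is to check the size before copying. The sketch below uses a hypothetical helper, copy_string_checked(), that refuses to copy rather than overflow; the same check applies whether the destination lives on the stack or in the heap.

```c
#include <stdbool.h>
#include <string.h>

/* Copies src into dst only if it fits, including the terminating '\0'.
   Returns false (and leaves dst untouched) instead of overflowing.
   An unchecked strcpy(dst, src) would write past the end of dst
   whenever strlen(src) >= dst_size. */
bool copy_string_checked(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;
    if (needed > dst_size) {
        return false;
    }
    memcpy(dst, src, needed);
    return true;
}
```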
Some programmers make the mistake of freeing memory twice (double-free errors) or never freeing it at all (memory leaks). Programs that free memory twice have been exploited both in the lab and in the wild. A memory leak may be harmless in a short-lived program, but a long-running program that fails to free memory will often keep requesting memory until the available memory is exhausted. Memory exhaustion can cause that program, as well as other programs on the same platform, to crash when subsequent requests for memory are denied.
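One common defensive idiom against double frees is to set a pointer to NULL immediately after freeing it, relying on the C standard’s guarantee that free(NULL) does nothing. The macro and demo function below are hypothetical illustrations, not part of any standard API.

```c
#include <stdlib.h>

/* Hypothetical helper macro: free the block, then clear the variable so a
   later FREE_AND_NULL on the same variable becomes free(NULL), a no-op. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Returns 1 if buf is NULL after two guarded frees, -1 if allocation failed. */
int guarded_double_free_demo(void)
{
    char *buf = malloc(32);
    if (buf == NULL) {
        return -1;
    }
    FREE_AND_NULL(buf);
    FREE_AND_NULL(buf); /* second call is free(NULL): harmless, not a double free */
    return buf == NULL;
}
```

The idiom protects only the variable it is applied to. If another pointer (an alias) still holds the old address, freeing through that alias is still a double free; deciding which single pointer is responsible for freeing a block is exactly the question the ownership model below addresses.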
Most modern programming languages (e.g., Java, C#, and Python) prevent these problems by restricting direct pointer manipulation and reclaiming unused memory automatically. While this approach achieves memory safety, it also removes a major source of power and flexibility. Developers who require fine-grained control of memory management therefore use C and C++, despite the memory-management errors that plague programmers who are not extremely careful about freeing their pointers.
The Pointer Ownership Model
The C language has a fairly simple set of standard library functions for managing dynamic memory. To request a block of memory from the platform, a program calls malloc(). To release a block back to the system, it calls free(). C++ provides similar mechanisms for allocating and freeing dynamic memory via the new and delete operators.
Every program that uses dynamic memory has the difficult task of keeping track of allocated pointers and freeing each pointer exactly once. To cope with these issues, some developers write their own application programming interface (API) for managing memory, usually in the form of wrappers around malloc() and free(). The evolution of these APIs has produced several well-known memory-management techniques, such as reference counting and garbage collection, for tracking dynamic memory usage. Many large software programs, such as Firefox, use one or more such systems to track the pointers that must be freed.
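For illustration, here is a minimal reference-counting wrapper around malloc() and free(). It is a hypothetical sketch (rc_alloc(), rc_retain(), and rc_release() are invented names), not the scheme Firefox or any other particular program uses: each block carries a count of its owners and is freed when the last owner releases it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header stored just before the caller's data. The union with max_align_t
   keeps the returned pointer suitably aligned for any type. */
typedef union {
    size_t refs;
    max_align_t align;
} rc_header;

/* Allocates size bytes with a reference count of 1. */
void *rc_alloc(size_t size)
{
    rc_header *h = malloc(sizeof *h + size);
    if (h == NULL) {
        return NULL;
    }
    h->refs = 1;
    return h + 1; /* caller sees only the bytes after the header */
}

static rc_header *rc_header_of(void *p)
{
    return (rc_header *)p - 1;
}

/* Records one more owner and returns the same pointer. */
void *rc_retain(void *p)
{
    rc_header_of(p)->refs++;
    return p;
}

/* Drops one owner and frees the block when no owners remain.
   Returns the number of owners left. */
size_t rc_release(void *p)
{
    rc_header *h = rc_header_of(p);
    size_t left = --h->refs;
    if (left == 0) {
        free(h);
    }
    return left;
}
```

The design choice here is that callers never call free() directly; every owner calls rc_release(), so the question "who frees this?" is answered at run time by the count rather than by the programmer.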
Many other programs, however, provide no systematic scheme for tracking allocated pointers, and therefore trust that the programmer “got it right.” Our approach, called the Pointer Ownership Model, applies to programs that do not have their own memory management scheme.
The Pointer Ownership Model aims to help software developers distinguish pointers that need to be freed from pointers that don’t. This can be accomplished by using static analysis and developer annotations to produce strong safety guarantees. The Pointer Ownership Model partitions pointers in a program into “responsible” pointers and “irresponsible” ones. A responsible pointer must be freed eventually, and an irresponsible pointer must not be freed.
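The post does not specify the annotation syntax, so the sketch below assumes two hypothetical markers, RESPONSIBLE and IRRESPONSIBLE, defined as empty macros so that an ordinary compiler ignores them. It shows a function that returns a responsible pointer the caller must free, and a function that receives an irresponsible pointer it may read but must not free.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical annotation markers; they expand to nothing, so the
   compiler ignores them. They are NOT the model's actual syntax. */
#define RESPONSIBLE
#define IRRESPONSIBLE

/* Returns a responsible pointer: the caller must free it eventually. */
RESPONSIBLE char *make_greeting(IRRESPONSIBLE const char *name)
{
    size_t len = strlen("Hello, ") + strlen(name) + 1;
    RESPONSIBLE char *out = malloc(len);
    if (out == NULL) {
        return NULL;
    }
    strcpy(out, "Hello, ");
    strcat(out, name);
    return out; /* responsibility passes to the caller */
}

/* Receives an irresponsible pointer: may read the string, must not free it. */
size_t greeting_length(IRRESPONSIBLE const char *s)
{
    return strlen(s);
}

size_t ownership_demo(void)
{
    RESPONSIBLE char *g = make_greeting("world");
    if (g == NULL) {
        return 0;
    }
    size_t n = greeting_length(g); /* lends g without transferring responsibility */
    free(g);                       /* the responsible pointer is freed exactly once */
    return n;
}
```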
The theory behind our research is that if every pointer in a program is classified as either responsible or irresponsible, the program can be checked automatically to ensure that every responsible pointer gets freed and no irresponsible pointer ever gets freed. This check can be done at compile time, without running the program.
Our approach involves
- designing and implementing an ownership model for dynamically allocated memory
- performing sound static analysis of the consistency of preexisting C source code and annotations
- evaluating efficacy by analyzing existing open source programs with known and seeded errors
- measuring annotations required per thousand source lines of code (KSLOC)
To implement our Pointer Ownership Model, we are building two programs: an advisor and a verifier. The advisor takes a file of C source code and builds an ownership model of the pointers used by the code. In other words, the advisor examines the code and determines which pointers are responsible and which are not. If the advisor cannot determine the responsibility status of a pointer, it asks the user. Once the model is complete, the verifier takes the model and the source code and reports whether the code complies with the model. If it does not, the verifier produces an error message. This could happen, for example, if a responsible pointer goes out of scope without ever being freed.
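As a toy illustration of the rules the verifier enforces, the sketch below assumes that some front end has already reduced one pointer’s history along a single execution path to a sequence of abstract events. All names are hypothetical. A real verifier works on the source itself and must account for every path, aliasing, and calls between functions; this only shows the rules being checked.

```c
#include <stddef.h>

/* Abstract events for one pointer along one execution path. */
typedef enum {
    EV_ALLOC,     /* pointer receives a newly allocated block */
    EV_FREE,      /* pointer is passed to free() */
    EV_SCOPE_END  /* pointer goes out of scope */
} event;

typedef enum { PTR_RESPONSIBLE, PTR_IRRESPONSIBLE } ownership;

typedef enum {
    V_OK,
    V_LEAK,               /* a block is left with no responsible pointer to free it */
    V_BAD_FREE,           /* freed while holding no block, e.g., a second free */
    V_FREED_IRRESPONSIBLE /* an irresponsible pointer was freed */
} verdict;

/* Checks the model's rules for one pointer over one path of events. */
verdict check_pointer(ownership kind, const event *ev, size_t n)
{
    int holds_block = 0; /* does this responsible pointer currently own a block? */
    for (size_t i = 0; i < n; i++) {
        switch (ev[i]) {
        case EV_ALLOC:
            /* An irresponsible pointer cannot own new memory, and a responsible
               one must free its current block before taking another. */
            if (kind == PTR_IRRESPONSIBLE || holds_block) {
                return V_LEAK;
            }
            holds_block = 1;
            break;
        case EV_FREE:
            if (kind == PTR_IRRESPONSIBLE) {
                return V_FREED_IRRESPONSIBLE;
            }
            if (!holds_block) {
                return V_BAD_FREE;
            }
            holds_block = 0;
            break;
        case EV_SCOPE_END:
            return holds_block ? V_LEAK : V_OK;
        }
    }
    return holds_block ? V_LEAK : V_OK;
}
```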
Memory management is an old problem, and there are many tools, free or proprietary, to help programmers tackle it. Most memory debuggers employ dynamic analysis: they monitor a program as it runs, track its use of the heap, and generate error messages when they detect memory bugs. Valgrind is one popular dynamic memory debugger for Linux systems. These tools report errors accurately, but only on the execution paths that are actually exercised, and they slow program execution considerably, so they are generally unsuitable for production code.
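To show where the runtime cost of dynamic analysis comes from, here is a toy tracker with hypothetical names. Valgrind itself works very differently, by instrumenting the running program’s machine code. In this sketch every allocation and release goes through a wrapper that updates a table of live blocks, and any blocks still live at the end are leak candidates.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 64 /* toy limit on simultaneously live blocks */

static void *live[MAX_TRACKED]; /* addresses of blocks not yet freed */
static size_t live_count = 0;

/* Allocates through malloc() and records the block as live. */
void *tracked_malloc(size_t size)
{
    if (live_count == MAX_TRACKED) {
        return NULL; /* table full */
    }
    void *p = malloc(size);
    if (p != NULL) {
        live[live_count++] = p;
    }
    return p;
}

/* Frees p only if it is in the table of live blocks. Returns 0 on success,
   -1 (freeing nothing) otherwise; a real tool would print a diagnostic. */
int tracked_free(void *p)
{
    if (p == NULL) {
        return 0; /* free(NULL) is a no-op */
    }
    for (size_t i = 0; i < live_count; i++) { /* linear search on every free */
        if (live[i] == p) {
            free(p);
            live[i] = live[--live_count]; /* drop entry by moving the last one here */
            return 0;
        }
    }
    return -1;
}

/* Number of blocks still live; any left at program exit are leak candidates. */
size_t tracked_live_count(void)
{
    return live_count;
}
```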
A few debuggers use static analysis, which does not require the program to be run; instead, they analyze the program’s source code. Because these tools do not observe a running program, they can only reason about which errors might occur, so they can suffer from false positives (reporting an error where there is none) and false negatives (failing to report a true error). A false positive requires a developer to inspect the code and verify that it needs no change, but a false negative can cause the analyzer to declare code bug-free when it is not. False negatives are considered a far worse problem than false positives, so many static analysis tools err on the side of minimizing false negatives at the cost of issuing many false positives. Coverity and Fortify both produce commercial static analysis tools, and Splint is a free static analysis tool.
The Pointer Ownership Model employs static analysis, but unlike traditional static analysis tools, it requires extra semantic information not provided by traditional C source code. The Pointer Ownership Model simply needs to know which pointers are responsible and which are irresponsible. The programmer must supply this information, although the advisor program should do most of the work. This semantic information means that the Pointer Ownership Model will not be subject to false positives or false negatives. The error messages it generates will be sound, and it will not overlook violations of its model. And, because it does not employ dynamic analysis, it will impose no runtime penalty.
To create the verifier program, I am collaborating with Lutz Wrage, a researcher in the SEI’s Research, Technology, & System Solutions Program. Lutz is building a prototype of the model using the C Intermediate Language (CIL) framework. The verifier is an open source project, and it needs a C language parser that can build an intermediate representation of a C program. This is essentially the first step performed by a compiler, but our goal is not to build a compiler; it is instead to conduct an in-depth analysis of the C program to enforce our model.
Impact to the DoD
The verifier program that we are developing will enable the Department of Defense (DoD) to expand its capabilities with a proof-of-concept secure C compiler. It will also influence C language standards and commercial compiler development. We also hope the verifier will provide the DoD and DoD contractors with secure compiler technology.
Accounting for Corner Cases & Other Challenges
This approach will face several challenges. One challenge is developing, in a timely manner, an approach whose benefits outweigh the programmer effort it requires. The research must also demonstrate low annotation effort and low runtime overhead.
One of the most difficult challenges in building any program is corner cases, which in this instance would be valid code that we did not account for. The C language is complex and contains constructs that will derail our model. For example, our current model does not handle arrays of responsible pointers, so we disallow them and require that all pointers in an array of pointers be irresponsible. While forbidding responsible pointer arrays keeps the model consistent and simple, it greatly limits its usability. Therefore a future goal of this research is to extend our model to handle arrays of responsible pointers.
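Under the current restriction, one way to manage many dynamically allocated strings is to let a single responsible pointer own one buffer and keep an array of irresponsible pointers into it. The sketch below (a hypothetical word_list type, not code from the project) follows that pattern: freeing the one responsible pointer releases every word, and no element of the array is ever freed.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 8

typedef struct {
    char *storage;          /* responsible: freed by word_list_free() */
    char *words[MAX_WORDS]; /* irresponsible: point into storage, never freed */
    size_t count;
} word_list;

/* Splits a copy of text on spaces (at most MAX_WORDS words).
   Returns 0 on success, -1 if allocation fails. */
int word_list_init(word_list *wl, const char *text)
{
    size_t len = strlen(text) + 1;
    wl->count = 0;
    wl->storage = malloc(len);
    if (wl->storage == NULL) {
        return -1;
    }
    memcpy(wl->storage, text, len);

    char *p = wl->storage;
    while (*p != '\0' && wl->count < MAX_WORDS) {
        while (*p == ' ') {
            p++; /* skip separators */
        }
        if (*p == '\0') {
            break;
        }
        wl->words[wl->count++] = p; /* borrows from storage */
        while (*p != '\0' && *p != ' ') {
            p++;
        }
        if (*p == ' ') {
            *p++ = '\0'; /* terminate this word in place */
        }
    }
    return 0;
}

void word_list_free(word_list *wl)
{
    free(wl->storage); /* one free releases every word */
    wl->storage = NULL;
    wl->count = 0;     /* the borrowed pointers are no longer valid */
}
```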
We have several small code examples that we can run through our model verifier once we have built it. Some of these code examples comply with a consistent model, and others intentionally violate the model. The verifier should produce an error message when handed one of these noncompliant examples. We eventually want to test the verifier on a larger software program so that we can understand the true impact of corner cases on our research.
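Here is a sketch of the kind of compliant and noncompliant pair such a test suite might contain (hypothetical examples, not the project’s actual test cases). In the second function, an early return lets a responsible pointer go out of scope without being freed, which a verifier enforcing the model should flag.

```c
#include <stdlib.h>

/* Compliant: the responsible pointer buf is freed on every path
   before it goes out of scope. */
int compliant_sum(int n)
{
    if (n <= 0) {
        return 0;
    }
    int *buf = malloc((size_t)n * sizeof *buf);
    if (buf == NULL) {
        return -1;
    }
    int total = 0;
    for (int i = 0; i < n; i++) {
        buf[i] = i;
        total += buf[i];
    }
    free(buf);
    return total;
}

/* Noncompliant: the early return inside the loop lets the responsible
   pointer buf go out of scope without being freed. At run time this is
   "only" a leak, but it violates the model and should be reported. */
int noncompliant_sum(int n)
{
    if (n <= 0) {
        return 0;
    }
    int *buf = malloc((size_t)n * sizeof *buf);
    if (buf == NULL) {
        return -1;
    }
    int total = 0;
    for (int i = 0; i < n; i++) {
        buf[i] = i;
        total += buf[i];
        if (total > 100) {
            return total; /* buf is never freed on this path */
        }
    }
    free(buf);
    return total;
}
```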
Another challenge will be transitioning our implementation from toy code examples to real-world code. Consequently, one of our research goals is to survey open source software to find out which projects use their own memory-management model and which do not. Projects in the latter group would be suitable test cases for the Pointer Ownership Model.
Our current research posits that our model of pointer ownership can be applied to C programs that do not already provide their own ownership model. We can extend our model in several ways, including
- handling arrays of responsible pointers
- handling C++ programs
These will be subjects for future research.
For more information about the work of the CERT Secure Coding Team, please visit