Measuring the Impact of Explicit Architecture Documentation

Architecture Documentation, Ultra Large Scale Systems

By Rick Kazman,
Visiting Scientist, Research Technology & System Solutions

The SEI has long advocated software architecture documentation as a software engineering best practice. This type of documentation is not particularly revolutionary or different from standard practices in other engineering disciplines. For example, who would build a skyscraper without having an architect draw up plans first? The specific value of software architecture documentation, however, has never been established empirically. This blog post describes a research project we are conducting to measure and understand the value of software architecture documentation on complex software-reliant systems.

Our research is creating architectural documentation for a major subsystem of Apache Hadoop, the Hadoop Distributed File System (HDFS). Hadoop is a software framework used by Amazon, Adobe, Yahoo!, Google, Hulu, Twitter, Facebook, and many other large e-commerce corporations. It supports data-intensive (e.g., petabytes of data) distributed applications with thousands of nodes. HDFS is a key piece of infrastructure that supports Hadoop by providing a distributed, high-performance, high-reliability file system. Although there are two other major components in Hadoop (MapReduce and Hadoop Common), we are initially focusing our efforts on HDFS because it is of manageable size and we have access to two of its lead architects.

The HDFS software has virtually no architectural documentation, that is, no record of the strategies and structures for predictably achieving system-wide quality attributes, such as modifiability, performance, availability, and portability. This project has thus become our “living laboratory,” where we can change one variable (the existence of architectural documentation) and examine the effects of that change. We have enumerated a number of research hypotheses to test, including:

  • product quality will improve because the fundamental design rules will be made explicit,
  • more users and developers will become contributors and committers to HDFS because the documentation will enable them to learn the framework more easily and thus make useful contributions, and
  • process effectiveness will improve because more developers will be able to understand the system and work independently.

We will measure a number of project attributes before and after the introduction of the documentation; the “before” state serves as the control for our experiment.

We believe the insights gained from this project will be valuable and generalizable because Hadoop exemplifies the types of systems in broad use within the commercial and defense domains.  For example, Facebook depends on Hadoop to manage the huge amount of data shared amongst its users.   Likewise, the DoD and Intelligence Community use Hadoop to leverage large-scale “core farms” for various “processing, exploitation, and dissemination” (PED) missions. If the existence of architectural documentation yields benefits (or not), we can better influence acquisition policies and development practices for related software-reliant systems.

I, along with my research team (Len Bass, Ipek Ozkaya, Bill Nichols, Bob Stoddard, and Peppo Valetto), have been assisting two of HDFS's architects in reconstructing, documenting, and distributing architectural documentation for the system. To do this, we initially employed reverse-engineering tools, including SonarJ and Lattix, to recover the architecture. This reverse engineering was only partially successful due to limitations of these tools: they are designed to help document the modular structure of a system, which crucially influences modifiability. In HDFS, however, performance and availability are the primary concerns, and the tools offer no insight into the structures needed to achieve those attributes. We have therefore undertaken considerable manual architectural reconstruction by interviewing the architects and carefully reading the code.
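To make concrete the kind of analysis such tools automate, here is a minimal sketch (our illustration, not SonarJ's or Lattix's internals) of recovering a module dependency map from Java import statements. The package names and source fragments below are hypothetical stand-ins for HDFS source files:

```python
import re

# Match Java import statements, e.g. "import org.apache.hadoop.util.Daemon;"
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)\s*;", re.MULTILINE)

def package_of(class_name):
    """Return the package prefix of a fully qualified class name."""
    return class_name.rsplit(".", 1)[0]

def module_dependencies(sources):
    """Map each package to the set of packages its source imports.

    `sources` maps a package name to that file's Java source text.
    """
    deps = {}
    for pkg, text in sources.items():
        imported = {package_of(c) for c in IMPORT_RE.findall(text)}
        deps[pkg] = imported - {pkg}  # ignore intra-package references
    return deps

# Hypothetical fragments standing in for HDFS source files.
sources = {
    "org.apache.hadoop.hdfs.server.namenode": (
        "import org.apache.hadoop.hdfs.protocol.Block;\n"
        "import org.apache.hadoop.util.Daemon;\n"
    ),
    "org.apache.hadoop.hdfs.protocol": "import java.io.IOException;\n",
}
print(module_dependencies(sources))
```

A map like this reveals the module structure that bears on modifiability, but, as noted above, it says nothing about the runtime structures that determine performance and availability.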

After we finish developing and distributing the Hadoop HDFS documentation, we will measure the quality of the code base and the nature of the project, including:

  1. number of defects
  2. defect resolution time
  3. number of new features
  4. number of product downloads
  5. size (lines of code, number of code modules)
  6. number of contributors and committers

These measurements will provide a time-series of snapshots of these measures as a baseline.  We will continue to track these measurements after the introduction of the (shared, publicly available, widely disseminated) architecture documentation to determine how the metrics change over time. We will also conduct qualitative analysis (via questionnaires) to understand how the documentation is being embraced and employed by architects and developers. We will examine the impact of the documentation on the developers’ interactions, specifically how it impacts their social network as represented by their email contributions to project mailing lists and comments made in their issue tracking system (Jira). Finally, we will interview key HDFS developers—both contributors and committers—after the introduction of the architecture documentation to gather some insights on their perspective about the usability and understandability of the HDFS code base.
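One simple way to represent such a social network, sketched here with made-up data, is to connect every pair of developers who post in the same mailing-list thread and weight each edge by how often the pair co-occurs:

```python
from collections import defaultdict
from itertools import combinations

def interaction_graph(threads):
    """Build an undirected, weighted developer-interaction graph.

    `threads` is an iterable of lists of poster names, one list per
    mailing-list thread; returns a map from (name, name) edges to weights.
    """
    edges = defaultdict(int)
    for posters in threads:
        # Each distinct pair of posters in a thread adds one to their edge.
        for a, b in combinations(sorted(set(posters)), 2):
            edges[(a, b)] += 1
    return dict(edges)

# Hypothetical threads; the names are placeholders, not real developers.
threads = [
    ["alice", "bob", "alice"],
    ["bob", "carol"],
    ["alice", "bob"],
]
print(interaction_graph(threads))
```

Tracking how this graph grows and densifies before and after the documentation is released is one way to quantify its effect on developer interactions.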

This project is a longitudinal study, which involves repeated observations of the same items over a period of time. It will take time for the architectural documentation to become known and used, so changes in the metrics we are collecting may not manifest themselves right away. Likewise, after the documentation is distributed, it may take a while for it to be assimilated into the Hadoop developer culture, after which point we will be able to measure whether it has made an impact. Within a year, however, we expect to report on the metrics we have gathered, as well as qualitative results from surveys and interviews of HDFS developers. Based on this information, we will produce a paper describing our methodology and the results of creating the documentation.

Many of the systems that rely on Hadoop are highly complex, with millions of users and emergent behavior. Such systems have been previously characterized as ULS (Ultra Large Scale) systems. We hope our experiment in understanding the consequences of architectural documentation will advance the SEI’s research agenda into ULS systems.  We look forward to hearing about your experiences applying architectural documentation to software-reliant systems.


Additional Resources:

For more information about the SEI’s architecture documentation methods, please visit
www.sei.cmu.edu/architecture/start/documentation.cfm

For more information about the SEI’s work in Ultra Large Scale Systems, please visit
www.sei.cmu.edu/uls/index.cfm

Download the SEI technical report, Creating and Using Software Architecture Documentation Using Web-Based Tool Support
www.sei.cmu.edu/library/abstracts/reports/04tn037.cfm

Download the SEI technical report, Architecture Reconstruction Guidelines, Third Edition
www.sei.cmu.edu/library/abstracts/reports/02tr034.cfm

Download the SEI technical report, Architecture Reconstruction Case Study
www.sei.cmu.edu/library/abstracts/reports/03tn008.cfm

Download our research study report, Ultra-Large-Scale Systems: The Software Challenge of the Future
www.sei.cmu.edu/library/abstracts/books/0978695607.cfm


5 responses to “Measuring the Impact of Explicit Architecture Documentation”

  1. Jacinto Barbeira Says:
    Are you planning on making available some sort of feed for this work so the community can stay up-to-date with the results of this (very interesting) study?
  2. Rick Kazman Says:
    Yes! We plan on making regular updates to this work. You can follow our RSS feed (go to http://www.sei.cmu.edu/rss/ to sign up) or you can sign up for the email notification in the right-hand column of the blog home page.

    http://blog.sei.cmu.edu/
  3. M S HOSSAIN Says:
    Obviously, software architecture documentation is important and a software engineering best practice, and software architects should follow these practices in every architectural design. But my question is: how much documentation is good enough? Measuring that is difficult but not impossible, and it depends on how much benefit the software engineer wants to derive from the architecture documentation.
  4. Rick Kazman Says:
    We believe that architecture documentation is important. But is it obvious?

    You ask exactly the right questions: how much is enough, and under what circumstances? Certainly I would want far more documentation for a distributed development of a large, mission-critical software product that I thought would live and evolve for many years than for a small, in-house prototype.

    We hope to be able to shed some light on these issues. Right now we are only operating on hunches and anecdotal evidence.
  5. M S HOSSAIN Says:
    Thanks a lot for your kind response.
