The Problems with Reuse and Abstractions
When talking about software reuse and abstractions it is tempting to bring up one of two analogies. The first treats reusable components as tools, while the second thinks of them as larger, more purposeful Lego blocks. Neither is really correct. Both give the impression that reuse is unequivocally a good thing: the more components or tools we use, the better our final product. Unfortunately that is not exactly the case. In this post I want to talk about abstractions and reuse, why we like them, and what we have to watch out for.
Abstractions
Before talking about the problems with software reuse, we first need to figure out why software developers want to abstract low level detail into something higher and simpler. Even this idea of “low level” is not crystal clear. The “low level” keeps changing. Not too long ago, machine instructions were considered low level and we abstracted them with C.
It’s easy to see why most developers would not want to code in machine instructions. One needs to be very careful when doing so. There is lots of code to write, none of it is particularly obvious, and therefore there is lots of room for error. By abstracting machine instructions into C we can encapsulate a long chain of instructions in something more concise and human readable. And this is the point where we begin to get into trouble.
Remember that, at the end of the day, any program we write has to be converted to a set of machine instructions. By writing the program in a language like C we are putting a lot of faith into the compiler, be it GCC or the Microsoft compiler. Now granted, these compilers have been around for a long time and have a long history of optimizing and compiling code. We can probably trust them, but this is only where the problems begin.
As the complexity of the system being created increases, so does the amount of code, and therefore bugs, in the system. Developers face the same problem they did when writing machine code: the system is too complicated to keep track of. Because of that complexity, we need yet another level of abstraction. This time more abstraction is added through the use of libraries. Developers incorporate libraries into their projects to do just about everything, ranging from basic tasks such as managing strings, to more complex jobs, such as managing databases. This reuse of libraries is where development may get into real problems.
The Issue with Libraries
The issue with reusing libraries basically boils down to understanding. If the library contains bugs, so will the application using that library (whether or not a given bug manifests itself). If the developer is not aware of the return codes a function produces or the exceptions it may throw, using the library may hurt the final system more than help it.
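As a small illustration of the return-code point, here is a sketch in Python. It uses the standard `str.find` method, which returns -1 when the substring is missing. The two `extract_domain` helpers are hypothetical names made up for this example, not part of any library.

```python
def extract_domain_unchecked(address: str) -> str:
    # str.find returns -1 when "@" is missing. Ignoring that makes
    # the slice start at index 0 and silently return the whole input.
    at = address.find("@")
    return address[at + 1:]


def extract_domain(address: str) -> str:
    # Check the documented "not found" return value before using it.
    at = address.find("@")
    if at == -1:
        raise ValueError(f"not an email address: {address!r}")
    return address[at + 1:]
```

Given the input "no-at-sign", the unchecked version quietly returns the whole string, while the checked version fails loudly. Nothing is wrong with the library call in either case; the difference is whether the caller knew what it returns.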
By using libraries developers give up control over their product. However, if developers can find a way to trust and understand a library, then there can be a great benefit. Less code needs to be written, therefore there will be fewer bugs. Developers don’t need to think about how to solve the problem a library already solves. This gives developers more time to concentrate on making their product better.
A key to understanding a library is documentation. One can search the Internet for examples of how the library is used, but then you run the risk of finding a common misuse of the structures you need. There are tools such as ParseWeb that will assist developers in using libraries based on how others have used them in the past, and do it a bit more intelligently than Google, but they are still in early development in software engineering labs. The key currently is good documentation from the library creators.
Choosing a Library
The next step in choosing a library is trust. Developers need to feel confident that the library they are picking has few bugs, and those that it does have will either not come up or can be fixed by the library’s development team. There are a few things one should look for to feel confident about a library.
First, if you are familiar with the development team, either because you have used their libraries before or because you have a personal relationship with them, you can gain a level of confidence (assuming you have been happy with their past work).
You can also gauge a library’s performance based on who else uses it. If you find lots of existing applications that already use the library, that’s a good sign that the library functions as expected.
While those two ways are usually the simplest and can serve you well when creating a quick prototype, for production code you really need to dig a bit deeper. First, if the library has an automated test suite you should become familiar with it and make sure that it actually passes. If the suite does not exist, or the suite that is provided does not test the functionality you need, you should create your own additional test cases. By taking control of at least the testing process you can gain far more confidence in the library than simply going by reputation.
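A minimal sketch of what such additional test cases might look like, in Python. The standard `json` module stands in for whatever library you actually depend on, and the test class name is made up for this example. Each test pins down one behaviour the application is assumed to rely on.

```python
import json
import unittest


class JsonBehaviourWeRelyOn(unittest.TestCase):
    def test_round_trip_preserves_nested_data(self):
        # Data written out and read back should be unchanged.
        record = {"id": 7, "tags": ["a", "b"], "owner": None}
        self.assertEqual(json.loads(json.dumps(record)), record)

    def test_integer_keys_come_back_as_strings(self):
        # A documented quirk: JSON object keys are always strings.
        self.assertEqual(json.loads(json.dumps({1: "x"})), {"1": "x"})

    def test_malformed_input_raises(self):
        # We assume our error handling catches this exception type.
        with self.assertRaises(json.JSONDecodeError):
            json.loads("{not json}")
```

Saved in a file (say, the hypothetical `test_json_behaviour.py`), these can be run with `python -m unittest test_json_behaviour`. If a library upgrade changes any of these behaviours, the suite fails before the application does.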
Finally, and this is the most extreme measure, you can read the source code if you have access, though the time you need to understand the code may be better spent recreating that functionality.
At the end of the day developers need to understand what goes into their code. Libraries provide the security blanket of an “I don’t have to deal with that since I didn’t write it” mentality. But that may very well be a false sense of security and, once committed to a library, developers may find themselves in a lot of trouble if the library is actually broken.
Abstraction via Programming Languages
While libraries are one way to abstract what’s happening, there are others that need to be mentioned. Instead of coding in C, developers may wish to choose an even higher level language such as Java, Ruby, or Python. These languages further remove one from the machine instructions by adding many more conveniences, such as automatic memory management and garbage collection. By using languages with these features, developers remove a whole class of bugs from not only their code, but also the code of the libraries they wish to use.
As with our previous discussion of libraries, developers need to be aware of exactly what the choice of a language entails. On the surface it may seem like a no-brainer, but by choosing a different language you are stuck with its decisions about how features such as memory management and garbage collection are implemented. For the most part that is not a problem, but it can become one in certain classes of applications.
A great example of this is Ruby and its handling of threads. Most developers will not notice or care that the standard Ruby interpreter (as of Ruby 1.8) employs green threads, meaning that Ruby threads cannot take advantage of multiple processors or cores. No matter how many threads one starts, they will always run on a single processor. (Later versions moved to native threads behind a global interpreter lock, so threaded Ruby code still does not run in parallel across cores.) There are reasons for this choice as well as lots of ways to get around it. However, one has to be aware that the limitation exists rather than simply assume threads will run in parallel. (If you are curious about Ruby’s thread implementation, here is a good article to get started.)
The Point
The point I’m trying to make is that analogies between the real world and software engineering can only take you so far. Software components are not tools, but neither can they be compared to physical components. When working with physical objects there are more limitations placed on them, therefore making them simpler to understand, provided you know the domain the object resides in. However, with software components one is not so lucky. The objects themselves may contain complex logic and it is rarely clear exactly what that object does or how one should interface with it.
At the end of the day it is the responsibility of software engineers to understand all of the code that they have created or are using, at least to some extent. It is our job to make sure that the libraries we use are reliable and can be trusted to do what they promise to do. The decision to use a library has to be made carefully with proper consideration.
There is lots more to talk about when discussing code reuse and abstractions, such as Model Driven Engineering (MDE) and Domain Specific Languages (DSLs), but that’s all for another post.
This post was inspired by a post on Avian’s Blog; check it out.