Filed under: Memory Leaks

Beware of the Hound ... woof!

This was one of the more interesting papers I read last weekend. Here is the link: http://dl.acm.org/citation.cfm?id=1542521. And here is what I understood of it:

This paper addresses the problem of memory leaks with a radically different approach compared to the papers we have read so far on memory leaks. Its approach is data-centric instead of code-centric, as was the case with Sleigh and SWAT. We have read one other data-centric approach in Cork; however, Cork was addressing a more Java-specific issue in the world of memory leaks. This paper presents Hound, which looks at resolving memory leaks in C and C++. And it does so in the most fundamental manner possible, i.e. by revamping the strategy of allocating, or rather organizing, memory for objects in the heap. This is a very low-level approach, which for the purposes of this paper has been implemented largely in the Linux environment. The abstract idea presented here seems portable; however, the authors would have done well to highlight how easy or difficult porting such an approach to other environments, like Java or Windows, would be. As a side note, I could not help but wonder how the C language could be referred to in a paper that discusses objects, since C has no real notion of objects.

The allocation of memory is done by segregating objects first on the basis of allocation site, and then on the basis of object age, which helps in measuring staleness. Segregation by allocation site follows the logic that objects allocated at the same site tend to follow the same behavior, which leads to the notion that a leaking site is the source of leaking objects and hence it makes sense to group them together. This is aligned with the ordering logic in SWAT, which also uses the number of objects for a given allocation site while reporting leaks to the user. Each allocation site is associated with its own heap. In order to reduce memory overhead, a site is assigned a new heap only after the number of objects in each of its previous heaps reaches a threshold value, i.e. they are full. The object count is kept track of by inserting a hash of the allocation site in each of its objects; the count is incremented upon an allocation and decremented upon a free.

While objects from the same allocation site tend to exhibit similar behavior, that is not always the case. In order to handle anomalous cases where hot and old objects are generated from the same allocation site, segregation on the age of the objects is also done. For this, two lists, active and inactive, are maintained within the aging queue, which holds all the filled pages. The active list holds recently accessed pages in FIFO order, while pages containing old objects are kept in the inactive list in least-recently-used order. This separation of active and inactive lists is done to reduce the performance overhead that would otherwise occur due to an increasing number of page faults when objects are accessed from pages in the aging queue. Such page faults occur because pages containing old objects are protected; pages in the active list are therefore left unprotected. The age of an object is calculated as (current time - protection time). When an object is accessed from a page in the inactive list, the page is shifted to the head of the active list, the age of all the objects in that page is reset to 0, and the page is unprotected. My only concern with an approach where the ages of all the objects in a page are handled together is that, just because one of them was accessed, the rest will not necessarily be used again. This shows the importance of limiting the number of objects in a given page.
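
To make the segregation concrete, here is a minimal C++ sketch of the idea as I understand it: a hash of the allocation site picks a per-site heap, and each filled page records the time at which it was protected, so that object age can be computed as (current time - protection time). The names (SiteHeap, PageRecord, siteHash, etc.) are my own and not taken from the Hound implementation.

```cpp
// Conceptual sketch only -- not the actual Hound allocator.
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

using Clock = std::chrono::steady_clock;

// One heap (a set of pages) per allocation site.
struct SiteHeap {
    std::size_t liveObjects = 0;   // incremented on allocation, decremented on free
    std::vector<void*> pages;      // pages owned by this site
};

// Per-page bookkeeping: protected pages trap accesses via page faults, and
// the protection time is the reference point for computing object age.
struct PageRecord {
    bool isProtected = false;
    Clock::time_point protectedAt{};
};

// site hash -> the site's own heap (hypothetical structure).
static std::unordered_map<std::uint64_t, SiteHeap> heapsBySite;

// Hypothetical site hash; in the paper a hash of the allocation site is
// stored in each object so that frees can find the owning site again.
std::uint64_t siteHash(const std::string& fileAndLine) {
    return std::hash<std::string>{}(fileAndLine);
}

// Age of the objects on a page = current time - protection time.
double ageSeconds(const PageRecord& page) {
    if (!page.isProtected) return 0.0;   // active (unprotected) pages count as age 0
    return std::chrono::duration<double>(Clock::now() - page.protectedAt).count();
}
```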

Now, the shifting from inactive to active is done on an as-needed basis, and hence can be viewed as lazy in approach. However, Hound maintains a target size for the inactive list, which is approached gradually in order to avoid the sudden increase in minor page faults that would result from a large shift of pages from the unprotected active list to the protected inactive list. If the inactive list's size is significantly smaller than the target size, Hound moves as many as 8 pages at a time from the active list to the inactive list. This is far more aggressive than the policy it adopts when shifting pages out of the inactive list. This exhibits the conservative nature of the approach, which results in zero false positives but also leads to an increase in false negatives.
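
A rough sketch of how such gradual rebalancing might look; the structure names and the exact trigger condition are my guesses at what the paper describes, not its actual code, with the batch limit of 8 pages taken from the description above.

```cpp
// Conceptual sketch of gradually growing the inactive (protected) list
// toward a target size. Moving at most a small batch per step avoids a
// burst of minor page faults from protecting too many pages at once.
#include <algorithm>
#include <cstddef>
#include <deque>

struct Page { bool isProtected = false; };

struct AgingQueue {
    std::deque<Page*> active;    // unprotected pages, FIFO order
    std::deque<Page*> inactive;  // protected pages, LRU order
    std::size_t targetInactiveSize = 0;

    void rebalance() {
        // Only act when the inactive list is below its target.
        if (inactive.size() >= targetInactiveSize) return;
        std::size_t deficit = targetInactiveSize - inactive.size();
        std::size_t batch = std::min<std::size_t>(deficit, 8);  // at most 8 pages per step
        for (std::size_t i = 0; i < batch && !active.empty(); ++i) {
            Page* p = active.back();     // oldest page in the FIFO active list
            active.pop_back();
            p->isProtected = true;       // stands in for protecting the page (e.g. mprotect)
            inactive.push_front(p);      // newest entry at the head of the LRU list
        }
    }
};
```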

In this particular memory-management policy, the objects of a page are not reclaimed until the live count of the page drops to zero. Thus, even if a single live object is present, the page will not be reclaimed. This induces a severe memory overhead, which is resolved by merging pages with a live count of 50% or less on top of each other in physical memory while keeping them separate in virtual memory. As the paper explains, this is done in a cost-effective manner.
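
The merging itself is an OS-level trick (several sparsely populated virtual pages backed by one physical page), but the candidate selection can be sketched simply. The helper below is my own illustration, not Cork's or Hound's code, and it skips the parts that matter most in practice:

```cpp
// Conceptual sketch: pages that are at most half full are candidates for
// sharing one physical page while keeping their virtual pages separate.
// Checking that the live objects of paired pages occupy disjoint offsets,
// and the actual OS-level remapping, are omitted here.
#include <cstddef>
#include <utility>
#include <vector>

struct SparsePage {
    std::size_t liveBytes = 0;
    std::size_t pageSize = 4096;
    double liveFraction() const {
        return static_cast<double>(liveBytes) / static_cast<double>(pageSize);
    }
};

// Greedily pair up pages whose live fraction is 50% or less.
std::vector<std::pair<SparsePage*, SparsePage*>>
mergeCandidates(std::vector<SparsePage*>& pages) {
    std::vector<std::pair<SparsePage*, SparsePage*>> pairs;
    SparsePage* pending = nullptr;
    for (SparsePage* p : pages) {
        if (p->liveFraction() > 0.5) continue;   // only consider pages <= 50% live
        if (pending == nullptr) {
            pending = p;
        } else {
            pairs.emplace_back(pending, p);
            pending = nullptr;
        }
    }
    return pairs;
}
```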

Due to the segregation strategies, while the staleness analysis is done at the page level, results are reported on a per-allocation-site basis. The sites themselves are ranked by the number of objects from an allocation site and the sum of the drag of all the objects at that site, which is similar to the ordering approach taken in SWAT.
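
A minimal sketch of such a ranking; the field names and the exact combination of object count and accumulated drag are my assumptions, and the paper may weight them differently.

```cpp
// Conceptual sketch: rank allocation sites by the accumulated drag of their
// stale objects, tie-breaking on how many stale objects they own.
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct SiteReport {
    std::string site;              // e.g. "parser.c:142" (illustrative)
    std::size_t staleObjects = 0;  // number of stale objects from this site
    double totalDragSeconds = 0.0; // summed staleness time of those objects
};

void rankSites(std::vector<SiteReport>& reports) {
    std::sort(reports.begin(), reports.end(),
              [](const SiteReport& a, const SiteReport& b) {
                  if (a.totalDragSeconds != b.totalDragSeconds)
                      return a.totalDragSeconds > b.totalDragSeconds;
                  return a.staleObjects > b.staleObjects;
              });
}
```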

In terms of performance, Hound incurs slightly higher overheads, both in memory and runtime. While this is true of most techniques that deal with memory leaks, those techniques are still far more portable than Hound, as they address the problem at a higher level of abstraction. Hound, on the other hand, addresses the problem at a much lower level, and thus one would expect little or no overhead of any kind. On a different note, comparing the accuracy of staleness computation with that of SWAT is of little value in my opinion. As the paper rightly points out, Hound and SWAT have different limitations, and both do well within those limitations, as the results show. It is like comparing apples and oranges: both are fruit, but vastly different in taste and nutrition. It would have been better to compare Hound with a more data-centric approach such as Cork.

Nonetheless, the comparison gave me an interesting idea. Since Hound tends to underestimate staleness and SWAT tends to overestimate it, it would not be a bad idea to look at the overlap of the results produced by the two. The overlap might give you a 100% accurate listing of the memory leaks. Having said that, between the two I would choose SWAT, simply because overestimating is better: it might point to certain design flaws in the memory management of the program under test. A conservative approach like Hound will never be able to do this.

Review on "Bell: Bit Encoding Online Memory Leak Detection"

Yet another review. This is the original paper: http://dl.acm.org/citation.cfm?id=1168866

Generally, this paper looks at the problem of detecting memory leaks in programs and tries to solve it using a dynamic program analysis approach that is partially rooted in statistics. The paper is interested in memory leaks caused by useless objects, in other words, objects which are referenced by in-use components of the program but will themselves no longer be used. This is especially relevant for managed languages such as Java and C#. While they resolve the problem of freeing unreachable objects using Garbage Collection (GC), detecting whether an object will ever be used again is more difficult. This is so because you are in effect trying to predict how (or if) the object will be used in future states of the execution. Also, the paper makes one very important assumption about memory leaks: they have no immediate symptoms. This resonates with the idea that predicting the future use of an object is difficult. It is analogous to believing only after seeing: you believe that the object is useless from now on only if you have noticed it not being used over a reasonable period of time.

The paper's primary contribution is the Bit-Encoding Leak Location, or Bell, technique. It is an elegant solution for encoding per-object information using a single bit. In this paper it has been used to associate an object's allocation and last-use information with a site, i.e. a location within the source code, using near-zero space overhead. While the technique has been presented in the context of leak detection, it can potentially be used for other purposes as well, as it is a general-purpose encoding technique for storing per-object information. The paper's end goal, however, is to pinpoint the location, or site, of the memory leak, so that a programmer can examine the location and fix the leak itself. This is achieved by the paper's secondary contribution: a tool called Sleigh, which employs Bell to detect memory leaks.

Bell performs the encoding of an object's site in a single bit, which can be one of the free, unused bits in the header of the object. Using a simple encoding function f(object, site), which takes the identifiers of the object and the site, a 0 or 1 is stored in the encoding bit, also called the site bit, since it stores information about the site of the object. An object matches a site if the object's encoding/site bit is the same as the value computed by f(object, site). The encoding bit itself can be set when the object is allocated or when it is used, depending upon the requirement. The encoding function should have two important properties. Firstly, it should be cheap to compute, since it will be computed at runtime and thus contributes to the performance overhead. Secondly, it should be unbiased, ensuring that an object matches a given site with 50% probability if it was not encoded with that site to begin with. If it was encoded with the site, it will always match.
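
A tiny sketch of what such an encoding might look like; the mixing function below is an illustrative hash of my own choosing, not the one used by Bell, it just needs to be cheap and roughly unbiased in its low bit.

```cpp
// Conceptual sketch of Bell-style single-bit encoding.
#include <cstdint>

// Unbiased one-bit function of (object id, site id). Any cheap mixing hash
// works as long as the resulting bit is roughly uniform.
inline unsigned encodeBit(std::uint64_t objectId, std::uint64_t siteId) {
    std::uint64_t x = objectId ^ (siteId * 0x9E3779B97F4A7C15ULL);
    x ^= x >> 33;
    x *= 0xFF51AFD7ED558CCDULL;
    x ^= x >> 33;
    return static_cast<unsigned>(x & 1u);   // the single encoding/site bit
}

// An object "matches" a site if its stored bit equals f(object, site).
inline bool matches(unsigned storedBit, std::uint64_t objectId, std::uint64_t siteId) {
    return storedBit == encodeBit(objectId, siteId);
}
```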

Decoding a site means estimating the number of objects that were encoded with the site from the total number of objects which match the site. Decoding is done over a particular subset of objects. For each of these objects, f(object, site) is computed and compared with the object's site bit. If they are the same, the object matches the site and the match count for the site, denoted msite, is incremented. This gives the number of matches for a given site. If nsite is the number of objects that were actually encoded with the site, then the expected match count is msite = nsite + (n - nsite)/2, where n is the number of objects in question. This is simply because the objects encoded with the site always match, while the remaining objects match with 50% probability, owing to the second property of the encoding function mentioned above. From here, one can compute nsite = 2*msite - n.
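
Putting the two halves together, here is a self-contained sketch of the estimate nsite = 2*msite - n (the encoding function is the same illustrative stand-in as above, and ObservedObject is a hypothetical structure of mine):

```cpp
// Conceptual sketch of Bell-style decoding: estimate how many of the
// examined objects were encoded with a given site.
#include <cstddef>
#include <cstdint>
#include <vector>

struct ObservedObject {
    std::uint64_t id;     // object identifier
    unsigned storedBit;   // the single encoding/site bit read from its header
};

// Illustrative stand-in for Bell's encoding function (same as the earlier sketch).
inline unsigned encodeBit(std::uint64_t objectId, std::uint64_t siteId) {
    std::uint64_t x = objectId ^ (siteId * 0x9E3779B97F4A7C15ULL);
    x ^= x >> 33;
    x *= 0xFF51AFD7ED558CCDULL;
    x ^= x >> 33;
    return static_cast<unsigned>(x & 1u);
}

// msite = number of matches; the estimate is nsite = 2*msite - n.
long long estimateEncodedCount(const std::vector<ObservedObject>& objects,
                               std::uint64_t siteId) {
    long long msite = 0;
    for (const ObservedObject& o : objects)
        if (o.storedBit == encodeBit(o.id, siteId)) ++msite;
    long long n = static_cast<long long>(objects.size());
    return 2 * msite - n;   // can come out slightly off (even negative) due to noise
}
```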

Such a decoding scheme, for a given site, does not give you the detailed list of objects that were encoded with the site. However, it gives a statistical strength to the site, i.e. the estimated number of objects encoded at that site. Now, if the common property of all the objects considered during decoding is that they are stale, i.e. have not been used for a very long time and are potential candidates for being useless, then you can say, based on the statistical strength of that site, which is nothing but the number of stale objects encoded there, that it is a strong source of a memory leak. This is what Sleigh does. The candidate set of objects is nothing but the stale objects. Staleness is decided by the staleness bits kept for every object, which are incremented logarithmically based on the number of times the GC has been called, as explained in the paper. The counter saturates at 3, since only 2 bits are used, and that represents a state of being highly stale; 0, 1 and 2 represent zero, slight and moderate staleness. Sleigh maintains two further bits in the object's header for encoding purposes: one to encode the object's allocation site and one to encode the object's last-use site. The encoding bits are maintained through instrumentation of the application, while the decoding unit is maintained and run separately.
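
A sketch of such a logarithmic, saturating staleness counter; the exact update rule in Sleigh is driven by GC counts, and the bookkeeping below is my simplification of that idea.

```cpp
// Conceptual sketch of a 2-bit saturating staleness counter that advances
// logarithmically with the number of GCs an object has survived without use.
#include <cstdint>

struct StalenessBits {
    std::uint8_t level = 0;   // 0 = not stale, 1 = slight, 2 = moderate, 3 = highly stale

    // Called when a GC completes; gcsSurvivedUnused is the number of
    // consecutive GCs the object has survived without being used.
    void onGcWithoutUse(std::uint64_t gcsSurvivedUnused) {
        // Roughly: level k is reached after about 2^k unused GCs (logarithmic
        // growth), saturating at 3 because only two bits are available.
        std::uint8_t target = 0;
        while (target < 3 && (1ULL << (target + 1)) <= gcsSurvivedUnused) ++target;
        if (target > level) level = target;
    }

    // Any use of the object resets its staleness.
    void onUse() { level = 0; }
};
```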

The evaluations with SPECjbb2000 and Eclipse show promise, especially since the memory leaks were pinpointed very accurately, with a high probability of 99.9%, thanks to the false-positive threshold used for weeding out false positives and nmin, the minimum number of objects that must be encoded with a site before one can predict with 99.9% confidence that the site is leaking memory. However, there were cases where memory leaks were missed, and furthermore, the testing performed on Sleigh was not extensive: it was based on just two candidate programs, and only specific leaks were identified. That being said, the biggest problem for this kind of technique using Bell is the high value of nmin required for accurately predicting memory leak locations, which can only be attained with a reasonable amount of runtime. This goes back to the problem of believing only after seeing it for some time. It is another reason a large number of candidate programs will not be available for testing this technique: the candidate needs to run for a long period of time. Which raises an important question: if situations conducive for this technique to detect leaks are not plentiful, is this a valid solution? Nonetheless, the idea of encoding per-object information is novel and should be employed in other applications.

As an afterthought, a good way to reduce the dependence on nmin would be to present a ranked list, with the ranking based on the probability of each site being a point of a memory leak. While nmin ensures an accurate result, for shorter-running programs it is probably a better solution to show results, i.e. candidate locations, with reduced confidence but with multiple options.

Review on "Cork: Dynamic Memory Leak Detection for Garbage-Collected Languages"

Another one of my reviews, this one also an assignment. This is the original paper: http://dl.acm.org/citation.cfm?id=1190224

Cork is a memory leak detection tool that looks into the detection of leaks created by useless objects in managed programming languages such as Java and C#. The idea here is not to pinpoint the site of the memory leak, which in the case of Sleigh [1] was a source code line. Instead, Cork looks to identify the data structure responsible for the growth of the heap and thus the cause of a memory leak.

Cork does this by summarizing the live objects, or rather types, in the heap. This is done by piggybacking on the garbage collector, which essentially begins the process of garbage collection by scanning for all the live objects starting from the roots. This piggybacking uses the Type Information Block (TIB), or an analogous entity, which is usually required in the implementation of managed programming languages. The TIBs provide a rich source of type information about objects, which the compiler requires for proper functioning. During the scanning phase, a Type Points-From Graph (TPFG) is created, in which the nodes are the different types in the heap. Each type node is annotated with the volume, or number of objects, of that particular type in the heap. The edges between type nodes are points-from edges, i.e. when an edge is drawn from t2 to t1, it means that objects of type t2 are pointed to, or referenced, by objects of type t1. The edge itself is annotated with a weight, which denotes the number of objects of type t2 referenced by objects of type t1.
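
A minimal sketch of building such a type points-from graph during a heap scan; the data structures and hook names here are stand-ins of mine, whereas a real implementation would hang off the collector's scanning loop and the TIBs.

```cpp
// Conceptual sketch of a Type Points-From Graph (TPFG): nodes are types
// annotated with volume, edges record how many objects of one type are
// referenced from objects of another type.
#include <cstddef>
#include <map>
#include <string>
#include <utility>

struct TPFG {
    // type name -> volume (here: object count) of live objects of that type
    std::map<std::string, std::size_t> volume;
    // (pointee type t2, pointer type t1) -> number of t2 objects referenced from t1
    std::map<std::pair<std::string, std::string>, std::size_t> pointsFrom;

    // Called once per live object encountered during the GC scan.
    void onLiveObject(const std::string& type) { ++volume[type]; }

    // Called for every reference discovered during scanning: an object of
    // type fromType holds a pointer to an object of type toType.
    void onReference(const std::string& fromType, const std::string& toType) {
        ++pointsFrom[{toType, fromType}];   // edge direction: points-from
    }
};
```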

Such a TPFG is created during the scanning process itself: each time an object is encountered, the corresponding nodes and edges are updated. This induces a very low time overhead, which makes the process an ideal candidate for production environments. When multiple such TPFGs have been collected, differences are calculated between them and growth trends are observed. To be sure of the non-decreasing nature of the growth, a ranking scheme for the type nodes is used, together with a ranking threshold: all candidate types that rank higher than the threshold are reported. The ranking scheme used and described by the paper is the Ratio Ranking Technique, which uses the ratios between the volumes of two consecutive TPFGs. Furthermore, the phenomenon of jitter, i.e. fluctuating volume patterns, is accounted for using a decay factor. The idea is to account for types which show an overall growth pattern despite fluctuating volumes.
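
A sketch of how such a ratio-based rank with decay might be accumulated across consecutive heap snapshots; the update rule below is my own paraphrase of the idea, not Cork's exact formula.

```cpp
// Conceptual sketch of ratio-based ranking with decay: a type gains rank
// when its volume grows between consecutive snapshots and decays when it
// shrinks, so jittery but overall-growing types still accumulate a score.
#include <cstddef>
#include <map>
#include <string>

struct RatioRanker {
    double decay = 0.5;                          // how strongly a shrink cancels prior growth
    std::map<std::string, double> rank;          // per-type accumulated score
    std::map<std::string, std::size_t> prevVolume;

    void onSnapshot(const std::map<std::string, std::size_t>& volume) {
        for (const auto& [type, v] : volume) {
            auto it = prevVolume.find(type);
            if (it != prevVolume.end() && it->second > 0) {
                double ratio = static_cast<double>(v) / static_cast<double>(it->second);
                if (ratio > 1.0)
                    rank[type] += 1.0;           // grew since the last snapshot
                else
                    rank[type] *= decay;         // shrank: decay the score rather than erase it
            }
        }
        prevVolume = volume;
    }

    bool isCandidate(const std::string& type, double threshold) const {
        auto it = rank.find(type);
        return it != rank.end() && it->second > threshold;
    }
};
```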

Thereafter, a slice is constructed through the TPFG: starting at the candidate type node, all the growing edges are traced until a non-growing type node with non-growing edges is encountered. This is accompanied by a site map for all the candidate types, which essentially shows the allocation sites in the code base, given as a combination of method name and source code line number or bytecode offset. The slice, along with the site map, gives a clear picture of the potential memory leaks along with their exact location in the code.
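
A sketch of the slice traversal over a graph like the one built above, again with stand-in data structures; the growth test is assumed to come from the ranking step.

```cpp
// Conceptual sketch of slicing the TPFG: starting from a candidate type,
// follow points-from edges while the types involved are still growing.
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Adjacency list of the points-from graph: type -> types it is pointed from.
using Graph = std::map<std::string, std::vector<std::string>>;

std::set<std::string> buildSlice(const Graph& pointsFrom,
                                 const std::string& candidate,
                                 const std::function<bool(const std::string&)>& isGrowing) {
    std::set<std::string> slice;
    std::vector<std::string> stack{candidate};
    while (!stack.empty()) {
        std::string type = stack.back();
        stack.pop_back();
        if (!slice.insert(type).second) continue;   // already visited
        if (!isGrowing(type)) continue;             // slice stops at non-growing types
        auto it = pointsFrom.find(type);
        if (it == pointsFrom.end()) continue;
        for (const std::string& parent : it->second)
            stack.push_back(parent);                // keep following growing edges
    }
    return slice;
}
```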

The evaluation of this technique looks good and quite accurate. However, only specific examples have been shown; it would have made a sounder argument to present an extensive study on a range of candidate programs, or at least on a range of candidate memory leak bugs. Nonetheless, the examples shown in the evaluation are compelling, especially due to the low memory and performance overheads. However, the evaluation does not address one simple situation that comes to mind: what if the candidate program is required to have a growing heap? While such a situation might be rare, the paper fails to show whether something like that would produce a false positive. In other words, what if the requirements call for a growing heap in which all the objects will actually be used? It is important to note that this technique relies on the premise that a growing type/object volume necessarily implies a memory leak, which may not always be true. And this is the source of my question regarding this paper: how can you be sure that a growing heap is definitely an indicator of an increase in unused objects which are still being referenced? In other words, how does this technique ensure or verify that the objects responsible for the growing heap are indeed not going to be used anymore? Unlike the Bell [1] technique, there is no mechanism in place to indicate the staleness of the objects.

1. Michael D. Bond and Kathryn S. McKinley. 2006. Bell: bit-encoding online memory leak detection. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XII). ACM, New York, NY, USA, 61-72. DOI=10.1145/1168857.1168866 http://doi.acm.org/10.1145/1168857.1168866