Filed under: Fault Localization

Review on Pruning Dynamic Slices with Confidence

Original Paper at http://dl.acm.org/citation.cfm?id=1134002

This paper presents a technique to reduce a dynamic slice derived out of a dynamic dependence graph of a program, with the goal of reducing the search space (the statements in the slice which need to be inspected) for finding the fault. The assumption the paper makes is that the faulty statement will be contained in the slice. However, later in the evaluations of the technique it does report of situations where that is not the case, making it one of the limitations of the technique proposed. A work around with relevant slicing is however proposed. The reason why this is a limitation is because the pruning is done with the motivation of locating faults, and if the fault itself is missing in the slice, it is not helpful. Nonetheless, it is to be noted that the basic idea of tagging each statement with a confidence value depending on a property such as relation with an incorrect output, or a possible security problem with in the code is still valuable, especially if the idea is to be applied to reduce the search/inspection space for the user. Thus, the primary contribution of the paper is the method of pruning based on a very specific property of the program like a faulty/incorrect output. The fact that this can be used for fault localization, in my opinion is the secondary contribution.

Assuming that the fault exists in the slice, the approach that the paper takes in pruning the statements is by assigning each statement in the slice with a confidence value between 0 and 1, with 1 being highly confident. High confidence indicates the statement is not responsible for the incorrect output and thus can be removed or 'pruned' from the dynamic slice with 'confidence'. The paper shows that removing just statements with a confidence value of 1, results in a reduction which ranges from 1.79 to 190.57 times of the number of distinct statements in the regular slice. The paper also reports the effects of having a threshold on the confidence values for inclusion in the pruned dynamic slice (PDS). With decreasing the threshold value the size of the slice decreases, but so does the effectiveness of localizing faults with that slice.

The confidence values are evaluated in the following manner. Each statement is assigned a confidence of 1 if it is involved in computing at least one of the correct output values, but is not involved in calculating any of the incorrect values. A statement is assigned 0, if it is involved in the computation of only incorrect values and none of the positive values. The ambiguity comes in the case of statements which are responsible for the computation of both correct and incorrect outputs. For these statements the following intuition is applied. If a given statement computes any value, which when changed in any manner results in changing the final output from correct to incorrect, then we know that the original value that was produced at that statement was correct. This shows that there is a direct correlation between the output and values produced by that statement. Had the output remained correct , i.e. unaffected despite changing the values produced at a statement, then it is reasonable to say that the there is no or little correlation between the values produced at that statement and final set of correct outputs. Thus, not allowing us to say with confidence that the values being produced at that statement are indeed correct. This idea is used to compute those possible number of values produced at a given statement which when changed do not effect the final correct outputs being produced, making them incorrect. In the case of a high confidence statement (confidence value = 1) such values do not exist. In the case zero confidence statements, all the values produced at that statement have no impact on the final correct outputs being produced. The mathematical relation in the paper depends on such intuition. It is important to note here that the confidence value indicates the confidence we have in the fact that a value produced at a statement is correct not if that value is incorrect. It is not apparent if low confidence indicates that incorrect values are being produced at a statement.

This intuition draws my attention to the fact that the technique requires the program to produce an output. This I feel is a major limitation on the part of the process of pruning itself. The way the technique is described, if the output is not produced, the concept of a correct/incorrect output is non-existent and one cannot assign confidence values to the statements. The other big problem with this kind of reasoning is that if the very first output produced is incorrect then the approach is inapplicable, and the paper notes this in its experimental section. On a different note, the paper states that this technique is limited to only one error. However, I am not quite certain if this is a limitation. Digiuseppe and Jones1 show in their study of fault interactions, how a single fault tends to obfuscate other faults. So even if there are multiple faults, chances are that this or any technique for fault localization will end up detecting only one fault at a time.

The paper also talks about using the confidence values to rank statements in a dynamic slice and presenting it for user inspection. The only problem with providing users with a such a ranked list of source code statements is that users will be looking at those lines without any context, because these lines are probably not directly related with each other especially in the ranked order. Without looking into the source code, they will have no way of knowing if one of the lines in that list is the fault or not. And this problem will only compound if the user has little or no idea about the source code.

Overall, I found this technique to be sound in its basis for reducing a search space in general, but limited when it comes to applying it for locating faults.

1 N. Digiuseppe and J. A. Jones Fault Interaction and its Repercussions, Proceedings of the 27th IEEE International Conference on Software Maintenance (ICSM). http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6080767&tag=1