History Slicing
History Slicing - Francisco Servant, James A. Jones. Proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering (ASE), Short paper track, Lawrence, Kansas, USA, November 2011, pp. 452–455.
This paper puts forth the concept of history slicing, a method that traces the part of the history belonging to a subset of the code base that is of interest to the developer. This subset of the history is called a history slice. It can be used to understand the evolution of specific program features and their design rationale, to carry out code forensics, and to generate code-based developer statistics.
This is done by building a history graph. The lines to be investigated are then chosen by some form of selection criterion, the graph is traversed from those lines, and the resulting slice is presented in an appropriate manner. The paper provides this bare minimum as a base on which different methods for each of these steps can be used. At every stage it starts with a concept, explains an adaptation for implementation, and moves on to the next concept, thus giving a complete overview of the idea without delving deep into one specific direction and its details. Many might view this as a folly; I do not. In my opinion this is a good basic start to a novel subject.
However, the method that the authors select for building such a graph is, in my view, too simplistic. Each node in the graph represents a line of code, and it may be connected to at most one node in the previous revision of the code and at most one node in the next revision. This view seemingly ignores the possibility of a single line of code being split into several lines, or of several lines being merged into one. E.g., int x = 10; can be split into the following two statements: int x; x = 10; (semicolons represent the end of a line).
Moreover, the mapping of lines between two revisions is based on Diff, a tool that was originally meant for comparing files containing English text, not code. Changed lines are then matched using the Levenshtein edit distance, which measures the difference between two lines character by character.
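To make the one-to-one limitation concrete, here is a minimal sketch of such a line mapping and the backward traversal it allows. It is not the authors' implementation: Python's difflib.SequenceMatcher stands in for Diff, and the names map_lines, history_slice and the max_ratio threshold are hypothetical choices made only for this illustration.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (standard dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def map_lines(old, new, max_ratio=0.6):
    """Return {new_index: old_index}; every line gets at most one partner.

    max_ratio is an assumed cut-off: two lines are linked only if their
    edit distance is at most max_ratio times the longer line's length.
    """
    mapping = {}
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for k in range(i2 - i1):
                mapping[j1 + k] = i1 + k
        elif tag == "replace":
            free = set(range(i1, i2))  # old lines not yet linked
            for j in range(j1, j2):
                if not free:
                    break
                best = min(free, key=lambda i: (levenshtein(old[i], new[j]), i))
                dist = levenshtein(old[best], new[j])
                if dist <= max_ratio * max(len(old[best]), len(new[j])):
                    mapping[j] = best
                    free.remove(best)
    return mapping


def history_slice(revisions, line_index):
    """Follow one line of the latest revision back through its ancestors."""
    mappings = [map_lines(revisions[r - 1], revisions[r])
                for r in range(1, len(revisions))]
    rev, idx = len(revisions) - 1, line_index
    trail = [(rev, revisions[rev][idx])]
    while rev > 0 and idx in mappings[rev - 1]:
        idx = mappings[rev - 1][idx]
        rev -= 1
        trail.append((rev, revisions[rev][idx]))
    return trail[::-1]


revisions = [
    ["int x = 10;", "print(x);"],
    ["int x;", "x = 10;", "print(x);"],
]
print(map_lines(*revisions))        # {0: 0, 2: 1}
print(history_slice(revisions, 0))  # [(0, 'int x = 10;'), (1, 'int x;')]
print(history_slice(revisions, 1))  # [(1, 'x = 10;')]
```

In this sketch the declaration int x; inherits the history of int x = 10;, while x = 10;, the line that actually carries the assignment, appears to have no past at all. A mapping that allowed one-to-many links would be needed to keep both.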
In all this, we have missed out considerably on the semantic differences between the lines of code. As noted by Hunt and Tichy [1], conflicts between two versions of the code are generally identified using the notion of collocation, i.e. two changes conflict if they occur at the same point in a file. They go on to give a more exact definition of conflicts based on the notion of semantic interference, i.e. a change in one revision affects the results of a change in another revision. Semantic interference may not always be collocated.
The authors need to redefine their goals for history slicing, which is currently based on a syntactic comparison of lines of code. The current approach is good for informing developer statistics or for a line-by-line approach to code forensics. However, understanding the design rationale or the evolution of the code base requires an approach based more on a semantic analysis of the code.
The evaluation of this method is based on a “Manual vs Automatic” study. The results of the study, however, do not inform us of the differences between the history slices produced by the manual and the automatic approaches. Did such differences, if present, have anything to do with the time taken? Was there a difference in approach between the two? It is fair to say that developers will use semantic context when generating history slices, which is not what the proposed method for building history graphs does.
The paper opens up a novel concept for future study and exploration. However, it needs to reconsider which problems it wants to solve using the approach presented. The authors might also consider automating the calculation of a program slice and using it as the selection criterion for creating a history slice.
[1] J. Hunt and W. Tichy. Extensible Language-Aware Merging. IEEE International Conference on Software Maintenance, pp. 511–520, 2002.