The original paper: http://dl.acm.org/citation.cfm?id=827807
My thoughts on the paper:
This paper discusses the earliest phase of requirements engineering, where the reason for an intended system, the "why" of the system, is the primary focus. The author argues that this differs from the later phases of requirements engineering, whose purpose is to understand what the system does and how it should do it, and that asking 'why' is as important, if not more so, than asking 'what' the system should do or 'how' it should do it. The author then argues for the importance of tool support in this early yet critical phase, especially to help us maintain the knowledge gathered at this point, which will guide key future decisions and track the origins of change in the system, among other things. He also shows how little such support exists for this early and critical phase of requirements engineering. This is where he introduces the idea of using a tool called i*, which is used for modelling organizational environments, in this early phase.
The central idea in i* is to model the stakeholders' interests, concerns, and constraints, and how these factors interact across multiple stakeholders. It has a strategic dependency model, in which the actors' dependencies on each other are inspected, and a strategic rationale model, in which an actor's internal goals, beliefs, tasks, and abilities are examined. One question I had was about aggregating these various attributes for a particular kind of actor. It would be interesting to model how these attributes vary between people who share the same actor type. For instance, an organization might employ many people in the role of a coder. The distribution of the coders' goals, beliefs, tasks, and abilities across the whole group might provide insight into the priorities and relative importance of certain goals and tasks.
The biggest problem in modelling the intentionality of a stakeholder, which is the central focus of i*, is getting the stakeholders to list out their intentions in the first place. How easy or difficult is that? Finding the intentions behind a system in effect allows the requirements engineer to help stakeholders find the solutions to their own problems. This itself might be easier said than done. In many cases, the stakeholders might arrive at a set of possible alternatives and then simply not know which of them is the optimal choice. At that point, understanding the whys behind an intended system might not be enough; choosing the right option among various alternatives then becomes a matter of trying out a select few of them after doing some analysis. So my question is: is the goal of this phase to find the right alternative based on the reasons alone? Is just identifying the reasons enough? Or should we force ourselves to stop at the reasons, leaving more than one alternative solution for later analysis?
The paper raises the importance of getting the stakeholder to figure out a substantial chunk of the problem, instead of the requirements engineer doing most of the gathering. While this might not be completely effective, it might result in more stable requirements and, eventually, a more stable system when it is finally built.
How or when do you know to shift over to the later phases of requirements engineering? Is there a clear shift at all? In other words, is it an iterative process, where you understand some of the reasons, then figure out some of the details of what to do and how to do it with the maximum possible degree of completeness and precision, and then return to the why questions if they arise or were left unanswered? Might this be done to keep a constant check on whether we are doing things for the right reasons?
On a final note, is this kind of analysis not the second half of a goal tree, where the whys and the hows are recognized?
Original Paper at http://dl.acm.org/citation.cfm?id=1134002
This paper presents a technique to reduce a dynamic slice derived from a dynamic dependence graph of a program, with the goal of shrinking the search space (the statements in the slice which need to be inspected) for finding a fault. The assumption the paper makes is that the faulty statement will be contained in the slice. However, the evaluation of the technique does report situations where that is not the case, making this one of the limitations of the proposed technique; a workaround using relevant slicing is proposed. The reason this is a limitation is that the pruning is done with the motivation of locating faults, and if the fault itself is missing from the slice, the pruning is not helpful. Nonetheless, it is worth noting that the basic idea of tagging each statement with a confidence value based on some property, such as its relation to an incorrect output or a possible security problem within the code, is still valuable, especially if the idea is applied to reduce the search/inspection space for the user. Thus, the primary contribution of the paper is the method of pruning based on a very specific property of the program, such as a faulty/incorrect output. The fact that this can be used for fault localization is, in my opinion, the secondary contribution.
Assuming that the fault exists in the slice, the paper prunes statements by assigning each statement in the slice a confidence value between 0 and 1, with 1 being highly confident. High confidence indicates the statement is not responsible for the incorrect output and thus can be removed, or 'pruned', from the dynamic slice with 'confidence'. The paper shows that removing just the statements with a confidence value of 1 results in a reduction ranging from 1.79 to 190.57 times relative to the number of distinct statements in the regular dynamic slice. The paper also reports the effect of placing a threshold on the confidence values for inclusion in the pruned dynamic slice (PDS): as the threshold decreases, the size of the slice decreases, but so does the effectiveness of localizing faults with that slice.
The confidence values are computed in the following manner. A statement is assigned a confidence of 1 if it is involved in computing at least one of the correct output values but is not involved in computing any of the incorrect values. A statement is assigned 0 if it is involved in the computation of only incorrect values and none of the correct ones. The ambiguity arises for statements which contribute to the computation of both correct and incorrect outputs. For these statements the following intuition is applied. If a statement computes a value which, when changed in any manner, changes the final output from correct to incorrect, then we know that the original value produced at that statement was correct; there is a direct correlation between the output and the values produced by that statement. Had the output remained correct, i.e. unaffected despite changing the values produced at a statement, then it is reasonable to say that there is little or no correlation between the values produced at that statement and the final set of correct outputs, and we cannot say with confidence that the values being produced at that statement are indeed correct. This idea is used to count the alternate values a statement could have produced without affecting the correct outputs. For a high-confidence statement (confidence value = 1) no such alternate values exist; for a zero-confidence statement, every value it could produce leaves the correct outputs unchanged. The mathematical relation in the paper rests on this intuition. It is important to note that the confidence value indicates how confident we are that a value produced at a statement is correct, not whether that value is incorrect; it is not apparent that low confidence indicates incorrect values are being produced at a statement.
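To make the two clear-cut cases concrete, here is a minimal sketch (not the paper's actual algorithm) that assigns confidence from a mapping of each executed statement to the outputs it influenced: statements touching only correct outputs get 1, only incorrect outputs get 0, and mixed cases get a placeholder intermediate value, since computing the paper's real intermediate confidences requires the alternate-value analysis described above. All names here are illustrative.

```python
# Illustrative sketch only: assigns the two clear-cut confidence values
# described above; the paper's intermediate values need alternate-value analysis.

def assign_confidence(influences, correct_outputs, incorrect_outputs):
    """influences: dict mapping statement id -> set of output ids it helped compute."""
    confidence = {}
    for stmt, outs in influences.items():
        touches_correct = bool(outs & correct_outputs)
        touches_incorrect = bool(outs & incorrect_outputs)
        if touches_correct and not touches_incorrect:
            confidence[stmt] = 1.0          # safe to prune from the slice
        elif touches_incorrect and not touches_correct:
            confidence[stmt] = 0.0          # keep: only implicated in wrong output
        else:
            confidence[stmt] = 0.5          # placeholder for the ambiguous case
    return confidence

def prune(slice_stmts, confidence, threshold=1.0):
    """Keep statements whose confidence is below the pruning threshold."""
    return [s for s in slice_stmts if confidence.get(s, 0.0) < threshold]

if __name__ == "__main__":
    influences = {"s1": {"o1"}, "s2": {"o2"}, "s3": {"o1", "o2"}}
    conf = assign_confidence(influences, correct_outputs={"o1"}, incorrect_outputs={"o2"})
    print(conf)                              # {'s1': 1.0, 's2': 0.0, 's3': 0.5}
    print(prune(["s1", "s2", "s3"], conf))   # ['s2', 's3']
```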
This intuition draws my attention to the fact that the technique requires the program to produce output, which I feel is a major limitation of the pruning process itself. The way the technique is described, if no output is produced, the concept of a correct/incorrect output does not exist and one cannot assign confidence values to the statements. The other big problem with this kind of reasoning is that if the very first output produced is incorrect, then the approach is inapplicable, and the paper notes this in its experimental section. On a different note, the paper states that this technique is limited to a single error. However, I am not certain this is really a limitation. DiGiuseppe and Jones show in their study of fault interactions how a single fault tends to obfuscate other faults, so even if there are multiple faults, chances are that this, or any fault localization technique, will end up detecting only one fault at a time.
The paper also talks about using the confidence values to rank statements in a dynamic slice and presenting the ranked list for user inspection. The problem with providing users with such a ranked list of source code statements is that users will be looking at those lines without any context, because the lines are probably not directly related to each other, especially in ranked order. Without looking into the source code, they will have no way of knowing whether one of the lines in that list is the fault or not. And this problem will only compound if the user has little or no familiarity with the source code.
Overall, I found this technique to be sound in its basis for reducing a search space in general, but limited when it comes to applying it for locating faults.
Original paper at http://dl.acm.org/citation.cfm?id=996855
This paper describes a set of optimizations that can be applied while constructing dynamic dependence graphs for programs, thereby reducing the size and the traversal times of these graphs. Since these aspects of a dynamic dependence graph drive the cost of computing slices, reducing them makes the process faster. The paper makes a very basic assumption about the basic blocks of a program and their associated 'time stamps' in the context of computing the complete dynamic dependence graph: each basic block is given a time stamp which is then associated with all local dependencies occurring in that block. My first doubt at this stage is about the definition of a basic block: is it a function, the body of a loop, or an individual body of a conditional statement? The paper does not clarify this. Assuming it is the standard definition, such a time stamp scheme might not be right in the context of concurrent programs, where variables are shared between basic blocks.
On a different note, the paper observes that simply drawing edges between statements of code is not enough; accurate information should be attached in the form of labels recording all the execution instances associated with a given dependence. This also leads to an understanding of the context of the program run. However, it is noted further that the number of such labels on each edge will depend on the extent of the program's execution, so long-running programs are likely to accumulate large numbers of labels. The optimizations proposed in this paper aim to reduce the need to capture all those labels, and on particular occasions even to reduce the edges themselves. The reduction is achieved by the basic principle of edge sharing: if you have a set of edges/labels which will occur together in all, or in a given set of, execution instances, then a common edge, or a common edge without labels, can be drawn to represent all of those dependencies or execution instances. Here the question of information loss comes up. The information provided by such optimizations is still accurate, but it is abstracted. In a situation where you want to tag individual execution instances with specific information, such as being associated with passing or failing test runs, you have lost some information with the kind of abstraction these optimizations introduce. Of course this is a very specific situation, and the paper, I assume, is looking at more general situations where dynamic slices are required. It would have strengthened the paper to provide concrete examples where such an optimized slice might be useful; however, I do understand that the paper is grounded in theory and perhaps such a discussion is out of scope.
On the subject of the optimizations themselves, in my opinion they are an efficient use of statically derived information about the program's statement dependencies. The simplest category of optimizations comes from statically inferring dependencies which are certain to occur; edges between statements and their unambiguous, sole definitions or control ancestors are instances of such dependencies. These edges need not store label information, and their time stamp information is a compile-time constant, as shown in the paper. The next set of optimizations involves transformations of the nodes themselves. When a node has two distinct definition-parents or control-ancestors, the node is duplicated to serve each parent or ancestor separately, which allows us to statically infer the edges of these transformed nodes. When two uses or control-dependent nodes have the same definition-parent or control-ancestor, those two nodes are linked together and the dependence edge is drawn from only one of them to the parent or ancestor, thus reducing the number of edges. Furthermore, such transformations can also depend on the path taken between the statements with dependencies; that is, different (specialized) nodes can be created for specific paths based on how the dependencies can be identified along those paths. The third and final set of transformations reduces redundant edges by merging common def-use edges between two different variables/objects, or common data and control dependence edges, into one. This reduces redundancy by eliminating edges. Again, a question of information loss comes up: you can no longer, for instance, distinguish between control and data dependencies in certain places, and if a situation arises where such a distinction is necessary, these optimizations won't be helpful.
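As a rough illustration of the simplest category (not the paper's actual data structures; all names here are mine), the sketch below records per-instance time stamp labels only for dependence edges that cannot be resolved statically; edges marked as statically definite hold in every execution instance and carry no labels, which is where the space saving comes from.

```python
# Illustrative sketch: labeled vs. statically-inferred dependence edges.

class DependenceEdge:
    def __init__(self, src, dst, statically_definite=False):
        self.src = src                    # depending statement
        self.dst = dst                    # statement it depends on
        self.statically_definite = statically_definite
        self.instance_labels = []         # (src_timestamp, dst_timestamp) pairs

    def record(self, src_ts, dst_ts):
        # A definite edge holds for every execution instance, so no label is stored.
        if not self.statically_definite:
            self.instance_labels.append((src_ts, dst_ts))

edges = {
    ("s3", "s1"): DependenceEdge("s3", "s1", statically_definite=True),   # sole local def
    ("s3", "s2"): DependenceEdge("s3", "s2"),                             # ambiguous def
}

# Simulate two executions of the block containing s3.
for ts in (10, 20):
    edges[("s3", "s1")].record(ts, ts)        # nothing stored
    edges[("s3", "s2")].record(ts, ts - 1)    # label stored

print(len(edges[("s3", "s1")].instance_labels))   # 0
print(len(edges[("s3", "s2")].instance_labels))   # 2
```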
The evaluation of the optimizations shows the margin by which they reduce the size of the dependence graph and the time taken to compute slices from such optimized graphs. Furthermore, it shows how each optimization technique contributed to the size reduction, which I felt was the most insightful part. It shows in particular that OPT1 (optimization 1), which involves inferring data dependences local to a block, was the most effective at reducing graph size. This to me is an important result. OPT1 was the most fail-safe optimization of all, in that it omitted labels only for edges which are sure to occur no matter what, whereas the other optimizations are valid only if a set of preconditions is met or certain transformations are applied. Also, OPT1 did not resort to blatant abstraction which could result in information loss. On the whole, all the optimizations together reduced the graph size by a significant margin, and the time taken to compute slices was accordingly far lower. We should, however, question the information presented in these slices and their effectiveness. Also, might it be possible to apply only OPT1, which would not achieve complete optimization but might reduce the graph enough for fast slice computation? It would have been interesting to see the slice computation times for each individual optimization separately.
This was one of the more interesting papers I read last weekend. Here is the link: http://dl.acm.org/citation.cfm?id=1542521. And here is what I understood of it:
This paper addresses the problem of memory leaks with a radically different approach compared to the papers we have read so far about memory leaks. Its approach is data centric instead of code centric, as was the case with Sleigh and SWAT. We have read one other data-centric approach, Cork; however, Cork addressed a more Java-specific issue in the world of memory leaks. This paper presents Hound, which looks at resolving memory leaks in C and C++, and it does so in the most fundamental manner possible, i.e. by revamping the strategy of allocating, or rather organizing, memory for objects on the heap. This is a very low-level approach, which for the purposes of this paper has been done largely in the Linux environment. The abstract idea presented here seems portable; however, the authors would have done well to highlight how easy or difficult porting such an approach to other environments, like Java or Windows, would be. As a side note, I could not help but wonder how the C language can be referred to in a paper which discusses objects.
Memory is allocated by segregating objects first on the basis of allocation site, and then on the basis of object age, which helps in measuring staleness. Segregation on the basis of allocation sites follows the logic that objects allocated at the same site tend to follow the same behavior, which leads to the notion that a leaking site is the source of leaking objects and hence it makes sense to group them together. This is aligned with the ordering logic in SWAT, which also uses the number of objects for a given allocation site while reporting leaks to the user. Each allocation site is associated with its own heap. In order to reduce memory overhead, a site is assigned a new heap only after the number of objects in each of its previous heaps reaches a threshold value, i.e. they are full. The object count is tracked by inserting a hash of the allocation site into each of its objects; the count is incremented upon an allocation and decremented when memory is freed. While objects from the same allocation site usually exhibit similar behavior, this is not always the case. To handle anomalous cases where hot and old objects are generated from the same allocation site, segregation on the age of the objects is also done. For this, two lists, active and inactive, are maintained within the aging queue, which holds all the filled pages. The pages in the active list contain more recently accessed objects in FIFO order, while old objects are stored in the inactive list in least-recently-used order. This separation of active and inactive lists is done to reduce the performance overhead that would come from the increasing number of page faults when objects are accessed from pages in the aging queue; such page faults occur because pages containing old objects are protected, so the pages in the active list are left unprotected. Age for an object is calculated as (current time - protection time). When an object is accessed from a page in the inactive list, the page is shifted to the head of the active list, the age for all objects in that page is reset to 0, and the page is unprotected. My only concern with handling the ages of all the objects on a page together is that just because one of them was accessed, the rest might not necessarily be used again. This shows the importance of limiting the number of objects on a given page.
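A very rough sketch of the segregation idea as I read it (all class and function names here are my own; real Hound works at the allocator and page-protection level rather than in Python, and the movement policies below are simplified): objects are grouped into per-site pages, full pages enter an aging queue split into an unprotected active list and a protected inactive list, and touching an object on an inactive page pulls the whole page back to the front of the active list and resets its age.

```python
import time
from collections import deque, OrderedDict

class Page:
    """A per-allocation-site page of objects (greatly simplified)."""
    def __init__(self, site, capacity=4):
        self.site = site
        self.objects = []
        self.capacity = capacity
        self.protection_time = None      # set when the page becomes protected

    def full(self):
        return len(self.objects) >= self.capacity

    def age(self):
        # Age = current time - protection time, as described above.
        return 0 if self.protection_time is None else time.time() - self.protection_time

class AgingQueue:
    def __init__(self):
        self.active = deque()            # unprotected pages, FIFO
        self.inactive = OrderedDict()    # protected pages, LRU order (oldest first)

    def add_full_page(self, page):
        self.active.appendleft(page)

    def demote_oldest(self):
        # Move the oldest active page to the protected inactive list.
        if self.active:
            page = self.active.pop()
            page.protection_time = time.time()
            self.inactive[id(page)] = page

    def touch(self, page):
        # Access to a protected page: unprotect it, reset its age,
        # and move it to the head of the active list.
        if id(page) in self.inactive:
            del self.inactive[id(page)]
            page.protection_time = None
            self.active.appendleft(page)

class SiteHeaps:
    """One current page (and an aging queue of full pages) per allocation site."""
    def __init__(self):
        self.heaps = {}
        self.queue = AgingQueue()

    def allocate(self, site, obj):
        page = self.heaps.setdefault(site, Page(site))
        page.objects.append(obj)
        if page.full():
            self.queue.add_full_page(page)
            self.heaps[site] = Page(site)   # fresh page for this site
        return page
```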
The shifting from inactive to active is done on an as-needed basis, and hence can be viewed as a lazy approach. However, Hound maintains a target size for the inactive list, which is approached gradually in order to avoid the sudden increase in minor page faults that would result from a large shift of pages from the unprotected active list to the protected inactive list. If the inactive list's size is significantly smaller than the target size, Hound moves as many as 8 pages from the active list to the inactive list. This is far more aggressive than the policy it adopts when shifting pages out of the inactive list, and it reflects the conservative nature of an approach that produces zero false positives but also leads to an increase in false negatives.
Under this memory management policy, the objects on a page are not reclaimed until the live count of the page is zero; even a single live object keeps the page from being reclaimed. This induces a severe memory overhead, which is resolved by merging pages with a live count of 50% or less on top of each other in physical memory while keeping them separate in virtual memory. As explained in the paper, this is done in a cost-effective manner.
Due to the segregation strategies, while the staleness analysis is done at the page level, results are reported on a per-allocation-site basis. The sites themselves are ranked based on the number of objects from an allocation site and the sum of the drag of all the objects at that site, which is similar to the ordering approach taken in SWAT.
In terms of performance, Hound results in slightly higher overheads for both memory and runtime. While this is true of most techniques which deal with memory leaks, those techniques are still far more portable than Hound, because they address the problem at a higher level of abstraction. Hound, on the other hand, addresses the problem at a much lower level, and thus one would expect little or no overhead of any kind. On a different note, comparing the accuracy of staleness computation with that of SWAT is of little value in my opinion. As the paper rightly points out, Hound and SWAT have different limitations and both do well within those limitations, as shown by the results. It is like comparing apples and oranges: both are fruit, but they are vastly different in taste and the nutrition they provide. It would have been better to compare Hound with a more data-centric approach such as Cork.
Nonetheless, the comparison gave me an interesting idea. Since Hound tends to underestimate staleness and SWAT tends to overestimate it, it would not be a bad idea to look at the overlap of the results produced by the two; the overlap might give a highly accurate listing of the memory leaks. Having said that, between the two, I would choose SWAT, simply because overestimating is better in that it might point to certain design flaws in the memory management of the program under test. A conservative approach like Hound will never be able to do this.
Yet another review. This is the original paper: http://dl.acm.org/citation.cfm?id=1168866
Generally, this paper looks at the problem of detecting memory leaks in programs and tries to solve it using a dynamic program analysis approach which is partially rooted in statistics. The paper is interested in memory leaks caused by useless objects, in other words, objects which are referenced by in-use components of the program but will themselves never be used again. This is especially relevant for managed languages such as Java and C#: while they resolve the problem of freeing unreferenced objects using garbage collection (GC), detecting whether an object will ever be used again is more difficult, because you are in effect trying to predict how (or if) the object will be used in future states of the execution. Also, the paper makes one very important assumption about memory leaks: they have no immediate symptoms. This resonates with the idea that predicting the future use of an object is difficult. It is analogous to believing something only after seeing it for some time: you believe that the object is useless from now on only if you have noticed it not being used over a reasonable period of time.
The paper's primary contribution is the Bit-Encoding Leak Location, or Bell, technique. It is an elegant solution for encoding per-object information using a single bit. In this paper it is used to associate an object's allocation and last-use information with a site, i.e. a location within the source code, using near-zero space overhead. While the technique is presented in the context of leak detection, it can potentially be used for other purposes as well, since it is a general-purpose encoding technique for storing per-object information. The paper's end goal, however, seems to be to pinpoint the location, or site, of the memory leak so that a programmer can examine the location and fix the leak itself. This is achieved by the paper's secondary contribution: a tool called Sleigh, which employs Bell to detect memory leaks.
Bell encodes an object's site in a single bit, which can be one of the free, unused bits in the header of the object. Using a simple encoding function, f(object, site), which takes the identifiers of the object and the site, a 0 or 1 is stored in the encoding bit (or site bit, since it stores information about the object's site). An object matches a site if the object's encoding/site bit is the same as the value computed by f(object, site). The encoding bit itself can be set whenever the object is allocated or used, depending upon the requirement. The chosen encoding function should have two important properties. First, it should be easy to compute, since it will be computed at runtime and thus creates a performance overhead. Second, it should be unbiased, ensuring that an object matches a given site with 50% probability if it was not encoded with that site to begin with; if it was encoded with the site, it will always match.
Decoding a site means finding the number of objects that were encoded with the site, given the total number of objects which match the site. Decoding is done over a particular subset of objects: for each of these objects, f(object, site) is computed and compared with the site bit of the object. If they are the same, the object and the site match and the match count for the site, denoted m_site, increases. If n_site is the number of objects that were encoded with the site, then m_site = n_site + (n - n_site)/2, where n is the number of objects in question. This is simply because the objects encoded with the site always match, and the remaining objects have a 50% probability of matching the site, owing to the second property of the encoding function mentioned above. From here, one can compute n_site = 2*m_site - n.
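A minimal sketch of the encode/decode arithmetic as I understand it (the hash below is just an illustrative stand-in for the paper's unbiased encoding function, and all names are mine):

```python
import hashlib
import random

def f(obj_id, site_id):
    """Illustrative unbiased encoding function: one bit from a hash of (object, site)."""
    digest = hashlib.sha256(f"{obj_id}:{site_id}".encode()).digest()
    return digest[0] & 1

def encode(objects, site_id):
    """Set each object's single site bit for the site it is encoded with."""
    for obj in objects:
        obj["site_bit"] = f(obj["id"], site_id)

def decode(objects, site_id):
    """Estimate how many of these objects were encoded with site_id."""
    n = len(objects)
    m_site = sum(1 for obj in objects if obj["site_bit"] == f(obj["id"], site_id))
    return 2 * m_site - n          # n_site = 2*m_site - n

if __name__ == "__main__":
    random.seed(0)
    # 300 objects encoded with site "A", 700 with other sites.
    objs = [{"id": i} for i in range(1000)]
    for obj in objs[:300]:
        encode([obj], "A")
    for obj in objs[300:]:
        encode([obj], f"other-{random.randint(0, 9)}")
    print(decode(objs, "A"))       # statistical estimate, close to 300
```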
Such a decoding scheme does not give you a detailed list of the objects that were encoded with a given site; it gives a statistical strength for the site, i.e. the number of objects encoded at that site. Now, if the common property of all the objects associated with the site via this decoding technique is that they are stale, i.e. have not been used for a very long time and are potential candidates for being useless, then you can say, based on the statistical strength of that site (which is nothing but the number of stale objects encoded there), that it is a strong source of a memory leak. This is what Sleigh does: the candidate set of objects is simply the stale objects. Staleness is in effect decided by the staleness bits kept for every object, which are incremented logarithmically based on the number of times the GC is called, as explained in the paper. The counter saturates at 3, since only 2 bits are used, and that represents a state of being highly stale; 0, 1, and 2 represent zero, slight, and moderate staleness. Sleigh also maintains 2 bits in the object's header for encoding purposes: one to encode the object's allocation site, and the other to encode the object's last-use site. The encoding bits are maintained using instrumentation in the application, while the decoding unit is maintained and run as a separate unit.
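The 2-bit saturating counter is easy to picture. Below is a sketch under my own assumption that the counter advances from level k to k+1 only once the number of collections since the object's last use reaches the next power of two; the paper's exact logarithmic schedule may differ, so treat the threshold rule as illustrative.

```python
# Illustrative 2-bit saturating staleness counter (assumed power-of-two schedule).

def staleness_level(gcs_since_last_use, max_level=3):
    level = 0
    while level < max_level and gcs_since_last_use >= 2 ** (level + 1):
        level += 1
    return level

for gcs in (0, 1, 2, 4, 8, 100):
    print(gcs, staleness_level(gcs))   # 0->0, 1->0, 2->1, 4->2, 8->3, 100->3 (saturated)
```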
The evaluations with SPEC JBB2000 and Eclipse show promise, especially because the memory leaks were pinpointed very accurately and with a high probability of 99.9%, thanks to the threshold used for weeding out false positives and n_min, the minimum number of objects that must be encoded with a site in order to claim with 99.9% confidence that the site is leaking memory. However, there were cases where memory leaks were missed, and furthermore, the testing performed on Sleigh was not extensive: it was based on just two candidate programs, and only specific leaks were identified. That being said, the biggest problem for this kind of Bell-based technique is the high value of n_min required for accurately predicting memory leak locations, which can only be attained with a reasonable amount of runtime. This goes back to the problem of believing only after seeing it for some time. It is another reason a large number of candidate programs will not be available for testing this technique: the candidate needs to run for a long period of time. Which raises an important question: if situations conducive for this technique to detect leaks are not plentiful, is it a valid solution? Nonetheless, the idea of encoding per-object information is novel and should be employed in other applications.
As an afterthought, a good way to reduce the dependence on n_min would be to present a ranked list, where the ranking is based on the probability of the site being a source of a memory leak. While n_min ensures an accurate result, for shorter-running programs it is probably better to show results, i.e. candidate locations, with reduced probability but with multiple options.
Another one of my reviews, this one from an assignment. This is the original paper: http://dl.acm.org/citation.cfm?id=1190224
Cork is a memory leak detection tool that looks into the detection of leaks created by useless objects in managed programming languages such as Java and C#. The idea here is not to pinpoint the site of the memory leak, which in the case of Sleigh [1] was a source code line. Instead, Cork looks to identify the data structure responsible for the growth of the heap and thus the cause of a memory leak.
Cork does this by summarizing the live objects, or rather types, in the heap. This is done by piggybacking on the garbage collector, which begins the process of garbage collection by scanning for all the live objects starting from the roots. The piggybacking uses the type information block (TIB), or an analogous entity, which is usually required for the implementation of managed programming languages; TIBs provide a rich source of type information about objects, which the compiler requires for proper functioning. During the scanning phase, a Type Points-From Graph (TPFG) is created in which the nodes are the different types in the heap. The type nodes are annotated with the volume, or number of objects, of that particular type in the heap. The edges between the type nodes are points-from edges, i.e. when an edge is drawn from t2 to t1, it means that objects of type t2 are pointed to, or referenced, from objects of type t1. The edge itself is annotated with a weight denoting the number of objects of type t2 referenced by type t1.
Such a TPFG is created during the scanning process itself: each time an object is encountered, the corresponding nodes and edges are updated. This induces a very low time overhead, which makes the process an ideal candidate for production environments. When multiple such TPFGs have been collected, differences are calculated between them and growth trends are observed. To be sure of the non-decreasing nature of the growth, a ranking scheme for the type nodes is used, together with a ranking threshold: all candidate types ranking higher than the threshold are displayed. The ranking scheme described by the paper is the Ratio Ranking Technique, which uses the ratios between volumes in two consecutive TPFGs. Furthermore, the phenomenon of jitter, i.e. fluctuating volume patterns, is accounted for using a decay factor; the idea is to account for types which show an overall growth pattern despite fluctuating volumes.
Thereafter, a slice is constructed through the TPFG: starting at the candidate type node, all the growing edges are traced until a non-growing type node with non-growing edges is encountered. This is accompanied by a site map for all the candidate types, which shows the allocation sites in the code base as a combination of method name and source code line number or bytecode offset. The slice along with the site map gives a clear picture of the potential memory leaks along with their exact location in code.
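To keep the pipeline straight in my head, here is a condensed sketch of the three steps: building the TPFG during a scan, ranking types by volume ratios with a decay factor, and slicing along growing edges. The graph layout follows the description above, but the ranking formula and decay handling are my own simplifications, not Cork's exact Ratio Ranking Technique, and all names are illustrative.

```python
from collections import defaultdict

def build_tpfg(heap_objects):
    """heap_objects: list of (obj_type, referring_type or None).
    Returns (volume per type, points-from edge weights)."""
    volume = defaultdict(int)
    edges = defaultdict(int)               # (pointee_type, pointer_type) -> weight
    for obj_type, referrer_type in heap_objects:
        volume[obj_type] += 1
        if referrer_type is not None:
            edges[(obj_type, referrer_type)] += 1
    return volume, edges

def update_ranks(ranks, prev_volume, curr_volume, decay=0.5):
    """Simplified ratio ranking: reward growth, decay the rank on shrinkage/jitter."""
    for t, v in curr_volume.items():
        prev = prev_volume.get(t, 0)
        if v > prev:
            ranks[t] = ranks.get(t, 0.0) + (v / prev if prev else v)
        else:
            ranks[t] = ranks.get(t, 0.0) * decay
    return ranks

def slice_tpfg(start_type, edges, growing_types):
    """Follow points-from edges from a candidate type through growing types only."""
    seen, stack = set(), [start_type]
    while stack:
        t = stack.pop()
        if t in seen or t not in growing_types:
            continue
        seen.add(t)
        stack.extend(pointer for (pointee, pointer) in edges if pointee == t)
    return seen

# Two snapshots: type "Leaky" keeps growing and is referenced from "Cache".
snap1 = build_tpfg([("Leaky", "Cache"), ("Cache", None)])
snap2 = build_tpfg([("Leaky", "Cache")] * 3 + [("Cache", None)] * 2)
ranks = update_ranks({}, snap1[0], snap2[0])
growing = {t for t in snap2[0] if snap2[0][t] > snap1[0].get(t, 0)}
print(max(ranks, key=ranks.get))                       # 'Leaky'
print(sorted(slice_tpfg("Leaky", snap2[1], growing)))  # ['Cache', 'Leaky']
```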
The evaluation of this technique looks good and quite accurate; however, only specific examples have been shown. It would have made a sounder argument to show an extensive study on a range of candidate programs, or at least on a range of candidate memory leak bugs. Nonetheless, the examples shown in the evaluation are compelling, especially given the low memory and performance overheads. However, the evaluation does not address one simple situation that comes to mind: what if the candidate program is required to have a growing heap? While such a situation might be rare, the paper has not shown whether something like that would give a false positive or not. In other words, what if the requirements call for a growing heap in which all the objects will actually be used? It is important to note that this technique relies on the premise that a growing type/object volume necessarily implies a memory leak, which may not always be true. This is the source of my question regarding this paper: how can you be sure that a growing heap is definitely an indicator of an increase in unused objects which are still being referenced? In other words, how does this technique ensure or verify that the objects responsible for the growing heap are indeed not going to be used anymore? Unlike the Bell [1] technique, there is no mechanism in place to indicate the staleness of the objects.
[1] Michael D. Bond and Kathryn S. McKinley. 2006. Bell: bit-encoding online memory leak detection. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XII). ACM, New York, NY, USA, 61-72. DOI=10.1145/1168857.1168866 http://doi.acm.org/10.1145/1168857.1168866
This is a recent paper I read on RE: Requirements Engineering in the Year 00: A Research Perspective (http://dl.acm.org/citation.cfm?id=337184).
And this was my response:
Overall while reading this paper, one could feel a sense of pity that we (software folk) allowed programming paradigms to drive the paradigms of RE; I am talking of object-oriented techniques here. Clearly, it should have been the other way around. Goal-based approaches, including responsibility assignment and goal dependencies, seem a far more natural way of doing things than relying on scenarios, and when grounded in formal methods such as AND/OR relationships they form the bridge to the implementation landscape, thus driving software architectures and eventually programming.
An extract from a recent assignment that I had to do in my information retrieval class:
Dr. Bush, I believe, was a man of great understanding, vision, and balance. To be able to foresee the importance of technology and the sciences in ensuring peace was truly remarkable. He was quick to change from a wartime science "general" to a peacetime advocate of growth. This was the same man who advised President Truman on the early bombing of Japan, and who later turned his attention towards establishing a scientific outfit for progress, what we know today as the National Science Foundation, which is responsible for some of the best work in nearly every sphere of science and technology known to man, mostly used for peaceful purposes. His work on techniques for managing digital media, in order to help mankind understand itself better and thus grow at a quick pace, was another instance which showed that he was not a war-hungry scientist but much more. He was a scientist first, beyond anything else, who analyzed the situation at hand and acted accordingly. He ensured the security of his nation during war with something like the Manhattan Project, and equally well during peace by sowing the seeds of the NSF.
So this has been my latest homework, which brought me to my wits' end:
Give a presentation on the following paper: http://dl.acm.org/citation.cfm?id=1190268
And this was my presentation: Click here. (go to the link and do a ctrl+s)
Hope it helps you understand it quicker than it helped me.
History Slicing - Francisco Servant, James A. Jones. Proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering (ASE), Short Paper Track, Lawrence, Kansas, USA, November 2011, pp. 452–455.

This paper puts forth the concept of history slicing, a method which traces the part of the history of the code base that concerns a subset of the code of interest to the developer. This subset of history is called a history slice, which can then be used to understand the evolution of specific program features and design rationale, to carry out code forensics, and to generate code-based developer statistics. This is done by building a history graph; then, based on some selection criterion, the lines to be investigated are chosen, the graph is traversed, and the resulting slice is presented in an appropriate manner. The paper provides this bare-minimum base upon which different methods for each of the steps described can be used. At every stage, the paper starts with a concept, explains an adaptation for implementation, and moves on to the next concept, thus giving a complete overview of the idea without dwelling deep in one specific direction full of specific details. Many might view this as a folly; I do not. This, in my opinion, is a good basic start to a novel subject.

However, the method that the authors select for building such a graph is, in my view, too simplistic. Each node in the graph, which represents a line of code, is allowed to be connected only to one node in the previous revision of the code and/or to one in the next revision. This view seemingly ignores the idea of a single line of code being broken down into more than one line, or vice versa. E.g., int x= 10; can be split into the following two statements: int x; x=10; (semicolons represent end of a line). Moreover, the idea of mapping the lines of code between two revisions is based on diff, a tool which was originally meant for comparing files containing English text, not code. All of this is done using the Levenshtein edit distance, which calculates the difference between lines using character-level analysis. In all this, we have missed out considerably on the semantic differences between the lines of code. As noted by Hunt and Tichy [1], conflicts between two versions of the code are generally identified using the notion of collocation, i.e. whether two changes occur at the same point in a file. They further give a more exact definition of conflicts based on the notion of semantic interference, i.e. whether a change in one revision affects the results of a change in another revision. Semantic interference may not always be collocated. The authors need to redefine their goals for history slicing, which is currently based on syntactic comparison of lines of code. The current approach is good for informing developer statistics or for a line-by-line approach to code forensics; however, an approach based more on semantic analysis of the code is required when it comes to understanding the design rationale or the evolution of the code base.

The evaluation of this method is based on a "Manual vs. Automatic" study. The results of the study, however, do not inform us of the differences in the history slices produced by the manual and automatic approaches. Did such differences, if present, have anything to do with the time taken? Was there a difference in approach between the two?
Is it fair to say that developers will use semantic context when generating history slices manually, which is not what the proposed method for building history graphs does? The paper opens a novel concept for future study and exploration; however, it needs to revisit the problems it wants to solve using the approach presented. Also, the authors might consider automating the calculation and use of a program slice as a selection criterion for creating a history slice.
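As a side note on the line-mapping step, here is a minimal sketch of the kind of syntactic, Levenshtein-based mapping I understand the history graph to rely on; the distance threshold and the greedy one-to-one matching below are my own illustrative choices, not the authors' exact procedure.

```python
# Illustrative sketch: map each line of an old revision to at most one line
# of the new revision using Levenshtein edit distance (purely syntactic).

def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def map_lines(old_rev, new_rev, max_distance=5):
    """Greedy one-to-one mapping; lines that changed too much stay unmapped."""
    mapping, used = {}, set()
    for i, old_line in enumerate(old_rev):
        best = min(
            (j for j in range(len(new_rev)) if j not in used),
            key=lambda j: levenshtein(old_line, new_rev[j]),
            default=None,
        )
        if best is not None and levenshtein(old_line, new_rev[best]) <= max_distance:
            mapping[i] = best
            used.add(best)
    return mapping

old = ["int x= 10;", "print(x);"]
new = ["int x;", "x=10;", "print(x);"]
print(map_lines(old, new))   # {0: 0, 1: 2}: the split "int x= 10;" maps to only one successor
```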
[1] J. Hunt and W. Tichy. Extensible Language-Aware Merging. IEEE International Conference on Software Maintenance, pp. 511–520, 2002.