Vijay Krishna's Notes
http://vijaykrishna.posterous.com
Most of my notes as a student of computer software and everything around it.

Fri, 10 Feb 2012 16:08:00 -0800
Thoughts on "Towards Modelling and Reasoning Support for Early-Phase Requirements Engineering – Eric S. K. Yu"
http://vijaykrishna.posterous.com/thoughts-on-towards-modelling-and-reasoning-s

The original paper: http://dl.acm.org/citation.cfm?id=827807

My thoughts on the paper:

This paper discusses the earliest phase of requirements engineering, where the reasons for an intended system, the "why" of the system, are the primary focus. The author argues that this differs from the later phases of requirements engineering, whose purpose is to understand what the system does and how it should do it, and that it is as important, if not more so, than asking 'what' the system should do or 'how' it should do it. The author then argues for the importance of tool support in this early yet critical phase, especially to help maintain the knowledge gathered at this point, which will guide key future decisions and track the origins of change in the system, among other things. He also shows that there is not enough such support for this early and critical phase of requirements engineering. This is where he introduces the idea of using i*, a framework for modelling organizational environments, in this early phase.

The central idea in i* is to model the stakeholders' interests, concerns and constraints, and how these factors interact across multiple stakeholders. There is the strategic dependency model, where the actors' dependencies on each other are inspected, and the strategic rationale model, where an actor's internal goals, beliefs, tasks and abilities are examined. One question I had was about aggregating these various attributes for a particular kind of actor. It would be interesting to model how these attributes vary between people who share the same actor type. For instance, an organization might have many people employed in the role of a coder. The distribution of the coders' goals, beliefs, tasks and abilities across the whole group might provide insight into the priorities and importance of certain goals and tasks.

The biggest problem in modelling the intentionality of a stakeholder, which is the central focus of i*, is getting the stakeholders to list their intentions to begin with. How easy or difficult is that? Finding the intentions behind a system in effect allows the requirements engineer to help stakeholders find the solutions to their own problems. This itself might be easier said than done. In many cases, the stakeholders might arrive at a set of possible alternatives and then simply not know which of them is the optimal choice. At that point, understanding the whys behind an intended system might not be enough; choosing the right option among the alternatives then becomes a matter of trying out a selective few of them after doing some analysis. So my questions are: is the goal of this phase to find the right alternative based on the reasons alone? Is just identifying the reasons enough? Or should we force ourselves to stop at the reasons, leaving more than one alternative solution for later analysis?

The paper raises the importance of forcing the stakeholders to figure out a substantial chunk of the problem, instead of the requirements engineer doing most of the gathering. While this might not be completely effective, it might result in higher levels of stability in the requirements, and thus eventually in the system itself when it is finally built.

How or when do you know to shift over to the later phases of requirements engineering? Or is there a clear shift at all? In other words, is it an iterative process, where you understand some of the reasons, then figure out some of the details of how and what to do with the maximum possible degree of completeness and precision, and then return to the why questions if they arise or were left unanswered? Might this be done to keep a constant check on whether we are doing things for the right reasons?

On a final note, is this kind of analysis not the second half of a goal tree, where the why's and the how's are recognized?

 

 

Wed, 08 Feb 2012 17:44:00 -0800
Review on Pruning Dynamic Slices with Confidence
http://vijaykrishna.posterous.com/review-on-pruning-dynamic-slices-with-confide

Original Paper at http://dl.acm.org/citation.cfm?id=1134002

This paper presents a technique to reduce a dynamic slice derived from a dynamic dependence graph of a program, with the goal of shrinking the search space (the statements in the slice that need to be inspected) for finding the fault. The assumption the paper makes is that the faulty statement will be contained in the slice. However, the evaluation of the technique later reports situations where that is not the case, making this one of the limitations of the proposed technique; a workaround using relevant slicing is, however, proposed. The reason this is a limitation is that the pruning is done with the motivation of locating faults, and if the fault itself is missing from the slice, the pruning is not helpful. Nonetheless, it should be noted that the basic idea of tagging each statement with a confidence value, based on a property such as its relation to an incorrect output or a possible security problem within the code, is still valuable, especially if the idea is applied to reduce the search/inspection space for the user. Thus, the primary contribution of the paper is the method of pruning based on a very specific property of the program, such as a faulty/incorrect output. The fact that this can be used for fault localization is, in my opinion, the secondary contribution.

Assuming that the fault exists in the slice, the paper prunes statements by assigning each statement in the slice a confidence value between 0 and 1, with 1 being highly confident. High confidence indicates that the statement is not responsible for the incorrect output and thus can be removed, or 'pruned', from the dynamic slice with 'confidence'. The paper shows that removing just the statements with a confidence value of 1 results in a reduction ranging from 1.79 to 190.57 times with respect to the number of distinct statements in the regular slice. The paper also reports the effect of placing a threshold on the confidence values for inclusion in the pruned dynamic slice (PDS): as the threshold decreases, the size of the slice decreases, but so does the effectiveness of localizing faults with that slice.

The confidence values are computed in the following manner. A statement is assigned a confidence of 1 if it is involved in computing at least one of the correct output values but is not involved in computing any of the incorrect values. A statement is assigned 0 if it is involved in the computation of only incorrect values and none of the correct ones. The ambiguity arises for statements that are responsible for the computation of both correct and incorrect outputs. For these statements the following intuition is applied. If changing, in any manner, a value that a statement computes would change the final output from correct to incorrect, then we know that the original value produced at that statement was correct; there is a direct correlation between the output and the values produced by that statement. Had the output remained correct, i.e., unaffected, despite changing the values produced at a statement, then it is reasonable to say that there is little or no correlation between the values produced at that statement and the final set of correct outputs, and we cannot say with confidence that the values being produced at that statement are indeed correct. This idea is used to count the possible values at a given statement that could be changed without affecting the final correct outputs, i.e., without making them incorrect. For a high-confidence statement (confidence value of 1) no such alternative values exist; for a zero-confidence statement, all the values produced at that statement have no impact on the final correct outputs. The mathematical relation in the paper is built on this intuition. It is important to note that the confidence value indicates the confidence we have that a value produced at a statement is correct, not that the value is incorrect. It is not apparent whether low confidence indicates that incorrect values are being produced at a statement.
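
To make that intuition concrete, here is a minimal, hypothetical Java sketch of confidence-based pruning. The confidence formula, statement names, range sizes and threshold are my own simplifications for illustration, not the paper's exact definitions; the flags telling us which outputs a statement reaches and the counts of alternative values are assumed to come from a dynamic dependence graph.

import java.util.*;

public class ConfidencePruningSketch {
    // Simplified confidence: 1 if only correct outputs are reached, 0 if only
    // incorrect ones, otherwise a value that shrinks as more alternative values
    // would leave the correct outputs unchanged (a stand-in formula, not the paper's).
    static double confidence(boolean reachesCorrect, boolean reachesIncorrect,
                             int alternativeValues, int rangeSize) {
        if (reachesCorrect && !reachesIncorrect) return 1.0;
        if (!reachesCorrect && reachesIncorrect) return 0.0;
        return 1.0 - Math.log(alternativeValues + 1) / Math.log(rangeSize);
    }

    public static void main(String[] args) {
        double threshold = 1.0; // prune only statements we are fully confident about
        // hypothetical slice entries: {id, reachesCorrect, reachesIncorrect, alt, range}
        Object[][] slice = {
            {"s1", true,  false, 0,   256},
            {"s2", false, true,  200, 256},
            {"s3", true,  true,  3,   256},
        };
        for (Object[] s : slice) {
            double c = confidence((Boolean) s[1], (Boolean) s[2],
                                  (Integer) s[3], (Integer) s[4]);
            System.out.printf("%s confidence=%.2f -> %s%n", s[0], c,
                              c >= threshold ? "pruned" : "kept in PDS");
        }
    }
}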

This intuition draws my attention to the fact that the technique requires the program to produce an output. This, I feel, is a major limitation of the pruning process itself. The way the technique is described, if no output is produced, the concept of a correct/incorrect output does not exist and one cannot assign confidence values to the statements. The other big problem with this kind of reasoning is that if the very first output produced is incorrect, then the approach is inapplicable, and the paper notes this in its experimental section. On a different note, the paper states that this technique is limited to a single error. However, I am not quite certain that this is a limitation. DiGiuseppe and Jones [1] show in their study of fault interactions how a single fault tends to obfuscate other faults. So even if there are multiple faults, chances are that this, or any, fault localization technique will end up detecting only one fault at a time.

The paper also talks about using the confidence values to rank the statements in a dynamic slice and presenting them for user inspection. The only problem with providing users with such a ranked list of source code statements is that users will be looking at those lines without any context, because the lines are probably not directly related to each other, especially in the ranked order. Without looking into the source code, they will have no way of knowing whether one of the lines in that list is the fault or not. And this problem will only compound if the user has little or no idea about the source code.

Overall, I found this technique to be sound in its basis for reducing a search space in general, but limited when it comes to applying it for locating faults.

[1] N. DiGiuseppe and J. A. Jones. Fault Interaction and its Repercussions. In Proceedings of the 27th IEEE International Conference on Software Maintenance (ICSM). http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6080767&tag=1

Wed, 08 Feb 2012 17:36:00 -0800
Review on Cost Effective Dynamic Program Slicing
http://vijaykrishna.posterous.com/review-on-cost-effective-dynamic-program-slic

Original paper at http://dl.acm.org/citation.cfm?id=996855

This paper talks about a set of optimizations that can be applied while constructing dynamic dependence graphs for programs, thereby reducing the size and traversal times of these graphs. Since these aspects of a dynamic dependence graph drive the cost of computing slices, reducing them makes the process faster. The paper makes a very basic assumption about the basic blocks of a program and their associated 'timestamps' in the context of computing the complete dynamic dependence graph: each basic block is given a timestamp, which is then associated with all local dependences occurring in that block. My first doubt at this stage is about the definition of a basic block: is it a function, the body of a loop, or the individual bodies of a conditional statement? The paper does not clarify this. Assuming it is the standard definition [1], such a timestamping scheme might not be right in the context of concurrent programs, where variables are shared between basic blocks.
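
As a concrete reading of that assumption, here is a tiny, hypothetical Java sketch: every execution of a basic block draws one timestamp, and all the local (intra-block) dynamic dependences recorded during that execution share it. The block and statement names are made up for illustration; this is my interpretation, not the paper's implementation.

import java.util.*;

public class BlockTimestampSketch {
    record DynamicDependence(String defStmt, String useStmt, int timestamp) {}

    private static int clock = 0;
    private static final List<DynamicDependence> deps = new ArrayList<>();

    // Called once per execution of a basic block; every local dependence
    // observed in that execution reuses the same timestamp.
    static void executeBlock(String[][] localDefUsePairs) {
        int ts = clock++;
        for (String[] p : localDefUsePairs) {
            deps.add(new DynamicDependence(p[0], p[1], ts));
        }
    }

    public static void main(String[] args) {
        executeBlock(new String[][] {{"s1: x = 10", "s2: y = x + 1"},
                                     {"s2: y = x + 1", "s3: print(y)"}});
        executeBlock(new String[][] {{"s4: z = y * 2", "s5: print(z)"}});
        deps.forEach(System.out::println); // first two share timestamp 0, last one gets 1
    }
}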

On a different note, the paper observes that simply drawing edges between statements of code is not enough; accurate information about all the execution instances associated with a dependence should also be recorded, in the form of labels on the edges. This also leads to an understanding of the context of the program run. However, it is further noted that the number of such labels per edge depends on the extent of the program's execution itself; long-running programs are likely to accumulate large numbers of labels. The optimizations proposed in this paper aim to reduce the need to capture all those labels, and on particular occasions even the edges themselves. The reduction is achieved by the basic principle of edge sharing: if a set of edges/labels always occurs together, in all or in a given set of execution instances, then a common edge, or a common edge without labels, can be drawn to represent all of those dependences or execution instances. This raises the question of information loss. It should be noted that the information provided by such optimizations is still accurate, but it is abstracted. In a situation where you want to tag individual execution instances with specific information, such as being associated with passing or failing test runs, you have lost some information with the kind of abstraction these optimizations introduce. Of course this is a very specific situation, and the paper, I assume, is looking at more general situations where dynamic slices are required. It would have strengthened the paper had concrete examples been provided where such an optimized slice might be useful. However, I understand that the paper is grounded in theory and perhaps such a discussion is out of scope.

On the subject of the optimizations themselves, in my opinion they make efficient use of statically derived information about the program's statement dependences. The simplest category of optimizations comes from statically inferring dependences that are certain to occur; edges between statements and their unambiguous, sole definitions or control ancestors are instances of such dependences. These edges need not store label information, and their timestamp information is also a compile-time constant, as shown in the paper. The next set of optimizations involves transformations of the nodes themselves. When a node has two distinct definition-parents or control-ancestors, the node is duplicated to serve each parent or ancestor separately, which allows the edges of these transformed nodes to be inferred statically. At times, when two uses or control-dependent nodes have the same definition-parent or control-ancestor, those two nodes are linked together and the dependence edge is drawn from only one of them to the parent or ancestor, thus reducing the number of edges themselves. Furthermore, such transformations can also depend on the path taken between the statements with dependences: different (specialized) nodes can be created for specific paths, based on how the dependences can be identified along each path. The third and final set of transformations reduces redundant edges. This is done by merging common def-use edges between two different variables/objects, or common data and control dependence edges, into one. This reduces redundancy by eliminating edges. Again, a question of information loss comes up: you can no longer, for instance, distinguish between control and data dependences in certain places, and if a situation arises where such a distinction is necessary, these optimizations will not be helpful.
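
The following toy Java sketch illustrates the label-sharing intuition described above. It is my own illustration under stated assumptions, not the paper's data structures: an edge whose dependence can be inferred statically (for example, a sole reaching definition) stores no per-execution labels, while an ambiguous edge pays for one label per execution instance.

import java.util.*;

public class EdgeSharingSketch {
    static class DepEdge {
        final String def, use;
        final boolean staticallyInferable;              // e.g., sole reaching definition
        final List<Integer> instanceLabels = new ArrayList<>();

        DepEdge(String def, String use, boolean staticallyInferable) {
            this.def = def;
            this.use = use;
            this.staticallyInferable = staticallyInferable;
        }

        void recordInstance(int timestamp) {
            // Only ambiguous edges need to remember when they occurred.
            if (!staticallyInferable) instanceLabels.add(timestamp);
        }

        int labelCount() { return instanceLabels.size(); }
    }

    public static void main(String[] args) {
        DepEdge inferable = new DepEdge("x = 1", "y = x", true);   // unambiguous def-use
        DepEdge ambiguous = new DepEdge("x = ?", "z = x", false);  // several possible defs
        for (int t = 0; t < 1000; t++) {          // simulate 1000 executions of both deps
            inferable.recordInstance(t);
            ambiguous.recordInstance(t);
        }
        System.out.println("labels on inferable edge: " + inferable.labelCount());  // 0
        System.out.println("labels on ambiguous edge: " + ambiguous.labelCount());  // 1000
    }
}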

The evaluation of the optimizations shows the margin by which they reduce the size of the dependence graph and the time taken to compute slices from such optimized graphs. Furthermore, it shows how much each optimization technique contributes to the size reduction, which I found the most insightful. In particular, it shows that OPT1 (optimization 1), which involves inferring data dependences local to a block, was the most effective at reducing the sizes of the graphs. This, to me, is an important result. OPT1 was the most fail-safe optimization of all, in that it dropped labels only for those edges that are certain to occur no matter what, whereas the other optimizations were valid only if a set of preconditions was met or certain transformations were applied. Also, OPT1 did not resort to blatant abstraction that could result in information loss. On the whole, all the optimizations together reduced the graph size by a significant margin, and thus the time taken to compute the slices was obviously far lower. We should, however, question the information presented in these slices and their effectiveness. Also, might it be possible to apply just OPT1, even though it would not achieve complete optimization, if it reduces the graph enough for fast computation of the slice? It would have been interesting to see the slice computation times for each individual optimization separately.

[1] Frances E. Allen. 1970. Control Flow Analysis. SIGPLAN Notices 5, 7 (July 1970), 1-19. DOI=10.1145/390013.808479 http://doi.acm.org/10.1145/390013.808479

Wed, 25 Jan 2012 11:48:00 -0800
Beware of the Hound ... woof!
http://vijaykrishna.posterous.com/beware-of-the-hound-woof

This was one of the more interesting papers I read last weekend. Here is the link: http://dl.acm.org/citation.cfm?id=1542521. And here is what I understood of it:

This paper addresses the problem of memory leaks with a radically new approach compared to the papers we have read so far about memory leaks. Its approach is data-centric instead of code-centric, as was the case with Sleigh and SWAT. We have read one other data-centric approach, Cork; however, Cork addressed a more Java-specific issue in the world of memory leaks. This paper presents Hound, which aims to resolve memory leaks in C and C++. And it does so in the most fundamental manner possible, i.e., by revamping the strategy of allocating, or rather organizing, memory for objects in the heap. This is a very low-level approach, which for the purposes of this paper has been realized largely in the Linux environment. The abstract idea presented here seems portable; however, the authors would have done well to highlight how easy or difficult porting such an approach to other environments, like Java and Windows, would be. As a side note, I could not help but wonder how the C language can be referred to in a paper that discusses objects.

The allocation of memory is done by segregating objects first on the basis of allocation sites, and then on the basis of object age, which helps in measuring staleness. Segregation by allocation site follows the logic that objects allocated at the same site tend to follow the same behavior, which leads to the notion that a leaking site is the source of leaking objects and hence that it makes sense to group them together. This is aligned with the ordering logic in SWAT, which also uses the number of objects for a given allocation site while reporting leaks to the user. Each allocation site is associated with its own heap. In order to reduce memory overhead, a site is assigned a new heap only after the number of objects in each of its previous heaps reaches a threshold value, i.e., they are full. The object count is tracked by inserting a hash of the allocation site in each of its objects; the count is incremented upon an allocation and decremented upon a free. While objects from the same allocation site tend to exhibit similar behavior, this is not always the case. In order to handle anomalous cases where hot and old objects are generated from the same allocation site, segregation by object age is also done. For this, two lists, active and inactive, are maintained within the aging queue, which holds all the filled pages. The pages in the active list contain more recently accessed objects in FIFO order, while old objects are stored in the inactive list in least-recently-used order. This separation into active and inactive lists is done to reduce the performance overhead that would otherwise arise from the increasing number of page faults when objects are accessed from pages in the aging queue; such page faults occur because pages containing old objects are protected, so the pages in the active list are left unprotected. The age of an object is calculated as (current time - protection time). When an object is accessed from a page in the inactive list, the page is shifted to the head of the active list, the age of all the objects in that page is reset to 0, and the page is unprotected. My only concern with an approach where the ages of all the objects on a page are handled together is that, just because one of them was accessed, the rest might not necessarily be used again. This shows the importance of limiting the number of objects in a given page.
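
Here is a very rough Java sketch of that segregation scheme as I read it. It is only an illustration under assumptions (page capacity, list handling and names are invented), not Hound's allocator, which works at the level of virtual-memory pages and page protection in C/C++.

import java.util.*;

public class SegregatedAgingSketch {
    static final int PAGE_CAPACITY = 4;

    static class Page {
        final String allocationSite;
        final List<String> objects = new ArrayList<>();
        boolean isProtected = false;        // protected pages fault (and report) on access
        long protectionTime = 0;
        Page(String site) { this.allocationSite = site; }
        boolean isFull() { return objects.size() >= PAGE_CAPACITY; }
    }

    final Map<String, Page> openPagePerSite = new HashMap<>();
    final Deque<Page> activeList = new ArrayDeque<>();    // recently touched, unprotected
    final Deque<Page> inactiveList = new ArrayDeque<>();  // old, protected, LRU order

    // Objects are grouped by allocation site; a full page enters the aging queue.
    void allocate(String site, String obj) {
        Page p = openPagePerSite.computeIfAbsent(site, Page::new);
        p.objects.add(obj);
        if (p.isFull()) {
            activeList.addFirst(p);
            openPagePerSite.remove(site);
        }
    }

    // Accessing a protected (inactive) page pulls it back to the active list
    // and resets the age of everything on it.
    void access(Page p, long now) {
        if (p.isProtected) {
            inactiveList.remove(p);
            p.isProtected = false;
            p.protectionTime = now;
            activeList.addFirst(p);
        }
    }

    // Aging step: the least recently used active page becomes protected/inactive.
    void ageOneStep(long now) {
        Page victim = activeList.pollLast();
        if (victim != null) {
            victim.isProtected = true;
            victim.protectionTime = now;
            inactiveList.addFirst(victim);
        }
    }

    public static void main(String[] args) {
        SegregatedAgingSketch heap = new SegregatedAgingSketch();
        for (int i = 0; i < 8; i++) heap.allocate("siteA", "obj" + i);   // fills two pages
        heap.ageOneStep(1);
        System.out.println("active pages: " + heap.activeList.size()
                         + ", inactive pages: " + heap.inactiveList.size());  // 1 and 1
    }
}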

Now, the shifting from inactive to active is done on an as-needed basis, and hence can be viewed as lazy. However, Hound maintains a target size for the inactive list, which is approached gradually in order to avoid the sudden increase in minor page faults that might result from a large shift of pages from the unprotected active list to the protected inactive list. If the inactive list's size is significantly smaller than the target size, Hound moves as many as 8 pages from the active list to the inactive list. This is far more aggressive than the policy it adopts when shifting pages out of the inactive list, and it exhibits the conservative nature of the approach, which results in zero false positives but also leads to an increase in false negatives.

Under this memory management policy, the objects of a page are not reclaimed until the live count of the page is zero; even a single live object keeps the page from being reclaimed. This induces a severe memory overhead, which is mitigated by merging pages with live counts of 50% or less on top of each other in physical memory while keeping them separate in virtual memory. As explained in the paper, this is done in a cost-effective manner.

Because of these segregation strategies, while the staleness analysis is done at the page level, results are reported on a per-allocation-site basis. The sites themselves are ranked based on the number of objects from an allocation site and the sum of the drag of all the objects at that site, which is similar to the ordering approach taken in SWAT.

In terms of performance, Hound results in slightly higher overheads for both memory and runtime. While this is true of most techniques that deal with memory leaks, those techniques are still far more portable than Hound, as they address the problem at a higher level of abstraction. Hound, on the other hand, addresses the problem at a much lower level, and thus one would expect little or no overhead of any kind. On a different note, comparing the accuracy of the staleness computation with that of SWAT is of little value in my opinion. As the paper rightly points out, Hound and SWAT have different limitations and both do well within those limitations, as the results show. It is like comparing apples and oranges: both are fruit, but they are vastly different in terms of taste and the nutrition they provide. It would have been better to compare Hound with a more data-centric approach such as Cork.

Nonetheless, the comparison gave me an interesting idea. Since Hound tends to underestimate staleness and SWAT tends to overestimate it, it would not be a bad idea to look at the overlap of the results produced by the two; the overlap might give you a 100% accurate listing of the memory leaks. Having said that, between the two I would choose SWAT, simply because overestimating is better: it might point to certain design flaws in the memory management of the program under test. A conservative approach like Hound will never be able to do this.

Thu, 19 Jan 2012 00:48:00 -0800
Review on "Bell: Bit Encoding Online Memory Leak Detection"
http://vijaykrishna.posterous.com/review-on-bell-bit-encoding-online-memory-lea

Yet another review. This is the original paper: http://dl.acm.org/citation.cfm?id=1168866

Broadly, this paper looks at the problem of detecting memory leaks in programs and tries to solve it using a dynamic program analysis approach that is partially rooted in statistics. The paper is interested in memory leaks caused by useless objects, in other words, objects that are referenced by in-use components of the program but will themselves never be used again. This is especially relevant for managed languages such as Java and C#: while they resolve the problem of freeing unreferenced objects using garbage collection (GC), detecting whether an object will ever be used again is more difficult, because you are in effect trying to predict how (or if) the object will be used in future states of the execution. Also, the paper makes one very important assumption about memory leaks: they have no immediate symptoms. This resonates with the idea that predicting the future use of an object is difficult; it is analogous to believing something only after seeing it for some time. You believe that the object is useless from now on only if you have noticed it not being used over a reasonable period of time.

The paper's primary contribution is the Bit-Encoding Leak Location, or Bell, technique. It is an elegant solution for encoding per-object information using a single bit. In this paper it is used to associate an object's allocation and last-use information with a site, i.e., a location within the source code, at near-zero space overhead. While the technique is presented in the context of leak detection, it can potentially be used for other purposes as well, since it is a general-purpose encoding technique for storing object-related information. The paper's end goal, however, is to pinpoint the location, or site, of the memory leak, so that a programmer can examine that location and fix the leak itself. This is achieved by the paper's secondary contribution: a tool called Sleigh, which employs Bell to detect memory leaks.

Bell encodes an object's site in a single bit, which can be one of the free, unused bits in the header of the object. Using a simple encoding function, f(object, site), which takes the identifiers of the object and the site, a 0 or 1 is stored in the encoding bit (or site bit, since it stores information about the site of the object). An object matches a site if the object's encoding/site bit is the same as the value computed by f(object, site). The encoding bit itself can be set when the object is allocated or when it is used, depending on the requirement. The chosen encoding function should have two important properties. First, it should be easy to compute, since it will be computed at runtime and thus contributes to the performance overhead. Second, it should be unbiased, ensuring that an object matches a given site with 50% probability if it was not encoded with that site to begin with; if it was encoded with it, it will always match.

Decoding a site means finding out the number of objects that were encoded with the site, given the total number of objects that match the site. Decoding is done for a particular subset of objects: for each of these objects, f(object, site) is computed and compared with the site bit of the object. If they are the same, the object and the site match and the match count for the site, denoted msite, increases. If nsite is the number of objects that were actually encoded with the site, then msite = nsite + (n - nsite)/2, where n is the number of objects in question. This is simply because the objects encoded with the site will always match, and the remaining objects have a 50% probability of matching the site, owing to the second property of the encoding function mentioned above. From here, one can compute nsite = 2*msite - n.
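
The following Java sketch is a minimal simulation of this encode/decode arithmetic; it is my own simplification, not Bell's implementation. The encoding function here is just an assumed unbiased hash (a murmur-style bit mixer), and the object/site identifiers are synthetic. Note that decoding recovers only a count, nsite = 2*msite - n, never the individual objects.

import java.util.Random;

public class BellSketch {
    // Assumed unbiased, cheap-to-compute encoding function: mix object and site
    // ids and keep one bit (the mixing constants are from the murmur3 finalizer).
    static int f(long objectId, long siteId) {
        long h = objectId * 0x9E3779B97F4A7C15L ^ Long.rotateLeft(siteId, 17);
        h ^= h >>> 33; h *= 0xFF51AFD7ED558CCDL;
        h ^= h >>> 33; h *= 0xC4CEB9FE1A85EC53L;
        h ^= h >>> 33;
        return (int) (h & 1L);
    }

    public static void main(String[] args) {
        int n = 100_000, site = 42, otherSite = 7;
        int[] siteBit = new int[n];          // the single header bit per object
        Random rnd = new Random(1);
        int trulyEncoded = 0;
        for (int obj = 0; obj < n; obj++) {
            int allocatedAt = rnd.nextInt(10) == 0 ? site : otherSite; // ~10% at `site`
            if (allocatedAt == site) trulyEncoded++;
            siteBit[obj] = f(obj, allocatedAt);
        }
        // Decoding: count objects whose stored bit matches f(obj, site).
        int mSite = 0;
        for (int obj = 0; obj < n; obj++) {
            if (siteBit[obj] == f(obj, site)) mSite++;
        }
        int nSiteEstimate = 2 * mSite - n;
        System.out.println("actually encoded with site: " + trulyEncoded);
        System.out.println("estimate from decoding:     " + nSiteEstimate);
    }
}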

Such a decoding scheme does not give you, for a given site, the detailed list of objects that were encoded with it. However, it gives a statistical strength for the site, i.e., the number of objects encoded with it. Now, if the common property of all the objects associated with the site by the decoding technique is that they are stale, i.e., they have not been used for a very long time and are potential candidates for being useless, then, based on the statistical strength of that site, which is nothing but the number of stale objects encoded there, you can say that it is a strong source of a memory leak. This is what has been done with Sleigh: the candidate set of objects is simply the stale objects. Staleness is in effect decided by the staleness bits kept for every object, which are incremented logarithmically based on the number of times the GC is called, as explained in the paper. The counter saturates at 3, since only 2 bits are used, and that represents a state of being highly stale; 0, 1 and 2 represent zero, slight and moderate staleness. Sleigh maintains two bits in the object's header for encoding purposes: one encodes the object's allocation site, the other the object's last-use site. The encoding bits are maintained through instrumentation of the application; the decoding logic, however, is maintained and run separately.
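
As a small illustration of the 2-bit staleness counter described above, here is a hypothetical Java sketch. The exact logarithmic schedule is an assumption on my part (here, the counter value is roughly log2 of the number of GCs since the object's last use, saturating at 3); Sleigh's actual update rule may differ.

public class StalenessCounterSketch {
    private int bits = 0;             // 0 = not stale ... 3 = highly stale (2 bits)
    private int gcsSinceLastUse = 0;

    // Called whenever the garbage collector runs and the object was not used.
    void onGarbageCollection() {
        gcsSinceLastUse++;
        int log2 = 31 - Integer.numberOfLeadingZeros(gcsSinceLastUse); // floor(log2(x))
        bits = Math.min(3, log2);     // logarithmic growth, saturating at 3
    }

    void onUse() {                    // any use of the object resets its staleness
        gcsSinceLastUse = 0;
        bits = 0;
    }

    public static void main(String[] args) {
        StalenessCounterSketch c = new StalenessCounterSketch();
        for (int gc = 1; gc <= 20; gc++) c.onGarbageCollection();
        System.out.println("staleness after 20 unused GCs: " + c.bits);  // 3 (highly stale)
    }
}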

The evaluations with SPEC JBB2000 and Eclipse show promise, especially because the memory leaks were pinpointed very accurately, with a high probability of 99.9%, thanks to the mFalsePositive threshold for weeding out false positives and to nmin, the minimum number of objects that must be encoded with a site before one can claim, with 99.9% confidence, that the site is leaking memory. However, there were cases where memory leaks were missed, and furthermore, the testing performed on Sleigh was not extensive: it was based on just two candidate programs, and only specific leaks were identified. That being said, the biggest problem for this kind of technique based on Bell is the high value of nmin required for accurately predicting memory leak locations, which can only be reached after a reasonable amount of runtime. This goes back to the problem of believing something only after seeing it for some time. It is another reason that a large number of candidate programs will not be available for testing this technique: a candidate needs to run for a long period of time. This raises an important question: if situations conducive to this technique detecting leaks are not plentiful, is it a valid solution? Nonetheless, the idea of encoding per-object information is novel and should be employed in other applications.

As an afterthought, a good way to reduce the dependence on nmin would be to present a ranked list, ordered by the probability of each site being the source of a memory leak. While a high nmin ensures an accurate result, for shorter-running programs it is probably better to show candidate locations with reduced probability but with multiple options.

Thu, 19 Jan 2012 00:37:00 -0800
Review on "Cork: Dynamic Memory Leak Detection for Garbage-Collected Languages"
http://vijaykrishna.posterous.com/review-on-cork-dynamic-memory-leak-detection

Another one of my reviews, done as an assignment. This is the original paper: http://dl.acm.org/citation.cfm?id=1190224

Cork is a memory-leak-detection tool that targets leaks created by useless objects in managed programming languages such as Java and C#. The idea here is not to pinpoint the site of the memory leak, which in the case of Sleigh [1] was a source code line. Instead, Cork looks to identify the data structure responsible for the growth of the heap and thus the cause of a memory leak.

Cork does this by summarizing the live objects, or rather types, in the heap. This is done by piggybacking on the garbage collector, which essentially begins the process of garbage collection by scanning all the live objects starting from the roots. The piggybacking uses the type information block (TIB), or an analogous entity, which is usually required anyway for the implementation of managed programming languages; TIBs provide a rich source of type information about objects, which the compiler requires for proper functioning. During the scanning phase, a type points-from graph (TPFG) is created, in which the nodes are the different types in the heap. Each type node is annotated with the volume, or number of objects, of that particular type in the heap. The edges between the type nodes are points-from edges: when an edge is drawn from t2 to t1, it means that objects of type t2 are pointed to, or referenced, by objects of type t1. The edge itself is annotated with a weight denoting the number of objects of type t2 referenced by type t1.

Such a TPFG is created during the scanning process itself: each time an object is encountered, the corresponding nodes and the required edges are updated. This induces a very low time overhead, which makes the process an ideal candidate for production environments. When multiple such TPFGs have been collected, the differences between them are calculated and trends of growth are observed. To be sure of the non-decreasing nature of the growth, a ranking scheme for the type nodes is used, along with a ranking threshold: all candidate types that rank higher than the threshold are displayed. The ranking scheme described in the paper is the ratio ranking technique, which uses the ratios between the volumes in two consecutive TPFGs. Furthermore, the phenomenon of jitter, i.e., fluctuating volume patterns, is accounted for using a decay factor; the idea is to capture types that show an overall growth pattern despite fluctuating volumes.
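
Here is a rough Java sketch of a type points-from graph and the volume-ratio idea described above. It is my own simplification under assumptions (type names, sizes and the ratio test are invented for illustration), not Cork's implementation, which is built into the garbage collector's scan.

import java.util.*;

public class TpfgSketch {
    final Map<String, Long> typeVolume = new HashMap<>();       // bytes (or counts) per type
    final Map<String, Long> pointsFromWeight = new HashMap<>(); // "child<-parent" -> count

    // Called for every live object encountered during the GC scan.
    void visit(String type, long size, String referringType) {
        typeVolume.merge(type, size, Long::sum);
        if (referringType != null) {
            pointsFromWeight.merge(type + "<-" + referringType, 1L, Long::sum);
        }
    }

    // Ratio-ranking idea: report types whose volume grows across two snapshots.
    static List<String> growingTypes(TpfgSketch before, TpfgSketch after, double ratio) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : after.typeVolume.entrySet()) {
            long old = before.typeVolume.getOrDefault(e.getKey(), 1L);
            if ((double) e.getValue() / old >= ratio) out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        TpfgSketch snapshot1 = new TpfgSketch(), snapshot2 = new TpfgSketch();
        snapshot1.visit("Order", 64, "HashMap$Node");
        snapshot2.visit("Order", 64, "HashMap$Node");
        for (int i = 0; i < 500; i++) snapshot2.visit("Listener", 32, "EventBus");
        System.out.println(growingTypes(snapshot1, snapshot2, 2.0));  // [Listener]
    }
}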

Thereafter, a slice is constructed through the TPFG: starting at the candidate type node, all growing edges are traced until a non-growing type node with non-growing edges is encountered. This is accompanied by a site map for all the candidate types, which essentially shows the allocation sites in the code base, given as a combination of method name and source code line number or bytecode offset. The slice, along with the site map, gives a clear picture of the potential memory leaks together with their exact location in the code.

The evaluations of this technique look good and quite accurate. However, only specific examples have been shown; it would have made a sounder argument to show an extensive study on a range of candidate programs, or at least on a range of candidate memory leak bugs. Nonetheless, the examples shown in the evaluations are compelling, especially given the low memory and performance overheads. However, the evaluation does not cover one simple situation that comes to mind: what if the candidate program is required to have a growing heap size? While such a situation might be rare, the paper does not show whether something like that would produce a false positive. In other words, what if the requirements call for a growing heap, where all the objects in the heap are actually used? It is important to note that this technique relies on the premise that a growing type/object volume necessarily implies a memory leak, which may not always be true. And this is the source of my question regarding this paper: how can you be sure that a growing heap size is definitely an indicator of an increase in unused objects that are still being referenced? In other words, how does this technique ensure or verify that the objects responsible for the growing heap size are indeed never going to be used again? Unlike the Bell [1] technique, there is no mechanism in place to indicate the staleness of the objects.

[1] Michael D. Bond and Kathryn S. McKinley. 2006. Bell: bit-encoding online memory leak detection. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XII). ACM, New York, NY, USA, 61-72. DOI=10.1145/1168857.1168866 http://doi.acm.org/10.1145/1168857.1168866

Sat, 14 Jan 2012 18:27:00 -0800
Goal based RE!
http://vijaykrishna.posterous.com/goal-based-re

This is a recent paper I read on RE: Requirements Engineering in the Year 00: A Research Perspective (http://dl.acm.org/citation.cfm?id=337184).

And this was my response:

Overall, while reading this paper, one could feel a sense of pity that we (software folk) allowed programming paradigms to drive the paradigms of RE; I am talking of object-oriented techniques here. Clearly, it should have been the other way around. Goal-based approaches, including responsibility assignment and goal dependencies, seem a far more natural way of doing things than relying on scenarios. And when grounded in formal notions such as AND/OR relationships, they form the bridge to the implementation landscape, thus driving software architectures and eventually programming.

Sat, 14 Jan 2012 18:10:00 -0800
Vannevar Bush
http://vijaykrishna.posterous.com/vannevar-bush

An extract from a recent assignment that I had to do in my information retrieval class:

Dr. Bush, I believe, was a man of great understanding, vision, and balance. To be able to foresee the importance of technology and the sciences in ensuring peace was truly remarkable. He was quick to change from a wartime science "general" to a peacetime advocate of growth. This was the same man who advised President Truman on the early bombing of Japan, and who later turned his attention towards establishing a scientific outfit for progress, what we know today as the National Science Foundation, which is responsible for some of the best work in nearly every sphere of science and technology known to man, most of it used for peaceful purposes. His work on techniques for managing digital media, in order to help mankind understand itself better and thus grow at a quick pace, was another instance that showed he was not a war-hungry scientist but much more. He was a scientist first, beyond anything else, who analyzed the situation at hand and acted accordingly. He ensured the security of his nation during war with something like the Manhattan Project, and equally well during peace by sowing the seeds of the NSF.

Sat, 14 Jan 2012 18:04:00 -0800
Explaining the PPP paper in a simple manner
http://vijaykrishna.posterous.com/explaining-the-ppp-paper-in-a-simple-manner

So this was my latest homework, which drove me to my wits' end:

Give a presentation on the following paper: http://dl.acm.org/citation.cfm?id=1190268

And this was my presentation: Click here. (go to the link and do a ctrl+s)

Hope it helps you understand it quicker than it helped me.

Mon, 21 Nov 2011 14:23:49 -0800
History Slicing
http://vijaykrishna.posterous.com/history-slicing

History Slicing - Francisco Servant, James A. Jones. Proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering (ASE), Short Paper Track, Lawrence, Kansas, USA, November 2011, pp. 452–455.

This paper puts forth the concept of history slicing, a method that traces a part of the history of a subset of the code base that is of interest to the developer. This subset of history is called a history slice, which can then be used to understand the evolution of specific program features and design rationale, to carry out code forensics, and to generate code-based developer statistics.

This is done by building a history graph. Then, based on some selection criterion, the lines to be investigated are chosen, the graph is traversed, and the resulting slice is presented in an appropriate manner. The paper provides this bare-minimum base, upon which different methods for each of the described steps can be built. At every stage, the paper starts with a concept, explains an adaptation for implementation, and moves on to the next concept, thus giving a complete overview of the idea without dwelling deep on a specific direction full of specific details. Many might view this as a folly; I do not. This, in my opinion, is a good basic start to a novel subject.

However, the method the authors select for building such a graph is, in my view, too simplistic. Each node in the graph, which represents a line of code, is allowed to connect to only one node in the previous revision of the code and/or to one node in the next revision. This view seemingly ignores the idea of a single line of code being written as, or broken down into, more than one line, or vice versa. For example, int x = 10; can be split into the following two statements: int x; x = 10; (semicolons represent the end of a line).
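
To make the structure concrete, here is a toy Java sketch of that one-predecessor-per-line history graph and a backward traversal over it. The revision mappings and line numbers are hypothetical, and this is my own construction rather than the paper's algorithm; it also exhibits exactly the one-to-one limitation discussed above.

import java.util.*;

public class HistorySliceSketch {
    // predecessor.get(r) maps a line number in revision r to the single line it
    // came from in revision r-1; absent entries mean the line was newly added.
    static List<Integer> slice(List<Map<Integer, Integer>> predecessor, int lineInLastRev) {
        List<Integer> trace = new ArrayList<>();
        int line = lineInLastRev;
        for (int rev = predecessor.size() - 1; rev >= 0 && line != -1; rev--) {
            trace.add(line);                                       // this line, in revision rev
            line = predecessor.get(rev).getOrDefault(line, -1);    // at most ONE predecessor
        }
        Collections.reverse(trace);
        return trace;
    }

    public static void main(String[] args) {
        List<Map<Integer, Integer>> predecessor = List.of(
            Map.of(),           // revision 0: every line is newly added
            Map.of(12, 10),     // revision 1: line 12 evolved from revision 0's line 10
            Map.of(15, 12)      // revision 2: line 15 evolved from revision 1's line 12
        );
        // History slice of line 15 in the latest revision: [10, 12, 15]
        System.out.println(slice(predecessor, 15));
    }
}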

Moreover, the idea of mapping lines of code between two revisions is based on diff, a tool originally meant for comparing files containing English text, not code, and on the Levenshtein edit distance, which measures the difference between lines using character-level analysis. In all this, we miss out considerably on the semantic differences between lines of code. As noted by Hunt and Tichy [1], conflicts between two versions of code are generally identified using the notion of collocation, i.e., whether two changes occur at the same point in a file. They further give a more exact definition of conflicts based on the notion of semantic interference, i.e., whether a change in one revision affects the results of a change in another revision. Semantic interference may not always be collocated.

The authors need to reconsider the goals of history slicing, given that it is based on a syntactic comparison of lines of code. The current approach is good for informing developer statistics or for a line-by-line approach to code forensics. However, an approach based more on a semantic analysis of the code is required when it comes to understanding the design rationale or the evolution of the code base.

The evaluation of the method is based on a "manual vs. automatic" study. The results of the study, however, do not tell us about the differences in the history slices produced by the manual and automatic approaches. Did such differences, if present, have anything to do with the time taken? Was there a difference in approach between the two? Is it fair to say that developers will use semantic context when generating history slices, which is not what the proposed method for building history graphs does?

The paper opens up a novel concept for future study and exploration. However, it needs to revisit the problems that it wants to solve using the approach presented. Also, the authors might consider automating the calculation and use of a program slice as a selection criterion for creating a history slice.

[1] J. Hunt and W. Tichy. Extensible Language-Aware Merging. In Proceedings of the IEEE International Conference on Software Maintenance, pp. 511-520, 2002.

Wed, 16 Nov 2011 12:27:00 -0800
The 2nd Time Dimension of the Vee SDLC Model
http://vijaykrishna.posterous.com/the-2nd-time-dimension-of-the-vee-sdlc-model

So I was reading about the Vee model today as part of my readings for the SE class I am taking this term. The model was developed by NASA, and it made a lot of sense to me, with the feedback between the various stages of the development process across the two legs of the V. And then I got to the part where it spoke of time being represented on two axes: the x and the z.

That is when I stopped and opened my editor to jot down this post. Why would you want to represent the same quantity over two different dimensions, especially when they are representing the same thing?

The v- model, being three-dimensional, also has an axis that is normal to the plane, the z-axis. This represents components associated with multiple deliveries. Time is therefore represented by two axes: from left to right and into the plane.

- Ruparelia, N. B. (2010). Software development lifecycle models. ACM SIGSOFT Software Engineering Notes, 35(3), 8. doi:10.1145/1764810.1764814

So, as pointed out by Ruparelia above, the z-axis is there to accommodate software with multiple deliveries. One direct idea that comes to mind when you read this is that if the software is to be delivered multiple times, it would be reasonable to assume that the deliveries happen sequentially, and hence the same V can be repeated along the y-axis, giving a more streamlined and uniform picture of the history of deliveries of the software.

Such an axis would make sense if the same lifecycle were to be repeated in nearly the same time span, but in parallel, for many deliverables of the same product. Now, I don't know if such a thing is even possible, but to me it only makes sense then.

That being said, I am open to hearing other ideas and explanations for this second dimension of time.

Mon, 31 Oct 2011 09:25:34 -0700
Papers on Writing Research Papers in Software Engineering
http://vijaykrishna.posterous.com/papers-on-writing-research-papers-in-software

These are some really fundamental guides to writing quality research papers and doing good research in software engineering:
http://www.cs.cmu.edu/~Compose/ftp/shaw-fin-etaps.pdf

Mon, 31 Oct 2011 09:16:39 -0700
Review on Foundations for the Study of Software Architecture, Dewayne E. Perry and Alexander L. Wolf
http://vijaykrishna.posterous.com/review-on-foundations-for-the-study-of-softwa

This paper lays the foundations of software architecture in a very methodical manner. Well-established architectural fields are analyzed to make an intuitive case for software architecture. While one would expect this to be an exploratory paper based on its title, it actually follows the problem-solution-validation model [1] for writing good software engineering research papers. The authors first provide a context for the role of software architectures, then the problems they intend to solve using a formal model for software architecture, which is proposed in the paper. This model is validated using the popularly accepted architecture of a compiler as an example.

The paper presents software architecture as a bridge between system requirements and design, with the notion of Form in its {Elements, Form, Rationale} model, which goes beyond the common notion of just the Elements: a framework to effectively develop programs in a large system. This actually allows the requirements to be showcased in an organized manner via the architecture, thus assisting in refining and consolidating the requirements themselves, even though the paper doesn't state it.

The discussion of architectural styles, specifically how the architecture of one product can be taken up as the architectural style of another product, was fundamental. This allows extension of existing products, on top of their architecture or framework, towards better or newer products with different functionality, thus invoking the idea of extensibility; for example, the development of new tools and apps using the APIs provided by various social networking platforms.

While discussing the benefits of software architectures, the managerial basis for cost estimation and process management is not discussed substantially. A remote reference can be found in the description of the compiler's parallel architecture, where application-oriented properties were an indicator of the system's complexity and could thus be correlated with cost. No other substantial points were discussed.
________________________________

[1] Mary Shaw. Writing Good Software Engineering Research Papers (minitutorial).

Sat, 06 Aug 2011 08:03:41 -0700
Requirements Engineering
http://vijaykrishna.posterous.com/requirements-engineering

"In my debate team during undergrad i had a senior who told me that the issue with most debaters is not that they cannot speak to put forward their point. Their issue is that they do not know which point(s) to put forward."

Knowing what is required in order to achieve an overall goal is probably the most difficult thing to do in any area of work. Per IEEE 610 (1990), a requirement (in the context of software engineering) is a condition or capability that allows users to solve problems or achieve a goal.
Requirements engineering is the phase in which the effort is to understand the problems (requirements) to be solved (satisfied) in order for us to arrive at a solution (a working software system).
The 'user' here may refer to many different kinds of people:
a) the end user
b) the customer who pays the bill for the software
c) other stakeholders in the system.
Most requirements come from the end user, for whom the software is ultimately meant. Other requirements are stated by the various other stakeholders.
The word 'requirement' itself is a misnomer: it suggests something that is given and cannot change once stated. But the world of software engineering, where software has to evolve, does not allow such a luxury. There are many factors that change the requirements of any project, such as:
a) Time to market,
b) Cost,
c) Conflicts in quality standards,
d) Conflicts in stakeholders' needs, etc.
These factors compel a trade-off between the requirements that evolve and/or are in contention.

REQUIREMENT SPEC - The Document
a) The net requirements for any project are recorded in a document called the requirement spec.
b) A requirement specification is, in an ideal world, the end product of the requirements engineering phase.
c) It serves as the contract, formal or informal, between the client and the analyst, directing them towards a clear understanding of the problem at hand and thus a solution to the same.
d) A high-level design document:
It serves as the pivot around which the design of the system is created. There can be no sharp distinction between requirements engineering and design, as the two are deeply intertwined. The requirements are viewed as a high-level design of the system, based on which an overall design of the system is constructed. This design then consists of various components and the interfaces via which the components interact. The design is in turn used to refine the requirements themselves. Such a process is most readily observed when prototyping techniques are used to build software. The two phases inherently revolve around each other in agile methods of software development, where both are constantly evolving in the presence of a working system/solution.
Requirements engineering and design are presented distinctly, as consecutive stages of software development, only for the sake of convenience.
e) The req. spec. is finally used in the acceptance testing of the system, when the working system is pitted against it to validate and verify the system against the requirements.

REQ.ENGG. & ITS EXTENT 
Requirements engg is more than a simple analysis of the requirements of the system to be built. It is an iterative and co-operative process of: 
i) Analyzing a problem; 
ii) Recording the observations of such an analysis; 
iii) Checking/validating the understanding gained by the above two steps.

Furthermore, it also goes beyond the idea of just recording requirements. It encompasses the social and cognitive elements in the process of understanding the problem domain.

ELEMENTS OF REQ.ENGG.
Req. engg. has a few fundamental facets:
1) The most obvious of these is the outlining of end-user requirements. Without these, this phase is useless and incomplete.
2) That said, there are other functional requirements which emerge from stakeholders apart from the 'end user'. These are generally the managers and the administrators of the system. For instance, a manager would like to see how effective the system is at solving the problem for which it was created in the first place. An administrator may have to manage more functions, such as managing user privileges, maintaining proper logs of the system, etc. Such tasks are not required by a regular user of the system and hence do not figure in the core feature list of the system.
3) Non-functional requirements: these requirements emerge from how the system will interface and interact with its environment. The system will have to interface with existing systems in order to fit into an existing infrastructure of solutions. For instance, in an online shopping store, the product catalog will have to interface with existing checkout and order management systems in order to give the customer a complete shopping experience; the product catalog, while being a self-sufficient system in itself, alone cannot be effective when it comes to selling products online. The other set of requirements has to do with how the system interacts with the infrastructure on which it is deployed. Whether it will respond well to a certain DBMS or OS is an important consideration that needs to be accounted for. How will the system fare on various such technology stacks? What are the costs involved in maintaining such a system? How will the system scale as the data it manages accrues?

While the third set of requirements, the non-functional requirements, may not seem that important, it is essential that they are treated with great seriousness right from the very beginning. Sometimes this alone is the difference between successful and failed products.

GROUND REALITIES:
Apart from the above mentioned categories of requirements and their sources, there are other ground realities about developing software:
1) As we have seen time and time again, SOFTWARE EVOLVES because the REQUIREMENTS CHANGE. You cannot predefine a set of requirements and call it a day. Even in domains where the analyst is experienced, there can and will always be variations which need to be taken care of effectively. These variations are in essence subtle and minor and thus go unnoticed, but their impact on the system is much larger and many a time at a very fundamental level. If such subtle changes in the requirements, especially in a 'known' domain, are not taken into account, then you are asking for trouble.
2) No one these days is interested in telling you what they want by going into the technicalities of things and solutions. What they are interested in is the final solution or product. Thus, unlike before, where the customer/end-user and the analyst would discuss and outline the requirements, most requirements these days are derived from market forces. In simpler terms, the process of requirements engineering has become market-driven rather than customer-driven, more so than before anyway.
3) Also, you might decide to use COTS (commercial off-the-shelf) components for building the system, which might come with their own set of constraints which will ultimately affect the requirements.
4) There are differences of opinion among the various stakeholders as well, which lead to a negotiation of requirements between the stakeholders and the analyst.

The iteration cycle in requirements engineering: Elicitation -> Specification -> Validation -> Negotiation -> Elicitation... 

Thu, 04 Aug 2011 09:01:12 -0700 Rams' ten principles to "good design" http://vijaykrishna.posterous.com/rams-ten-principles-to-good-design http://vijaykrishna.posterous.com/rams-ten-principles-to-good-design
Good design:
Is innovative - Rams states that possibilities for innovation in design are unlikely to be exhausted since technological development is always offering new opportunities for innovative design. He also highlights that innovative design always develops in tandem with innovative technology and can never be an end in and of itself.
Makes a product useful - A product is bought to be used. It has to satisfy certain criteria, not only functional, but also psychological and aesthetic. Good design emphasises the usefulness of a product whilst disregarding anything that could possibly detract from it.
Is aesthetic - Only well-executed objects can be beautiful. The aesthetic quality of a product is integral to its usefulness because products used every day have an effect on people and their well-being.
Makes a product understandable - It clarifies the product’s structure. Better still, it can make the product clearly express its function by making use of the user's intuition. At best, it is self-explanatory.
Is unobtrusive - Products and their design should be both neutral and restrained, to leave room for the user’s self-expression. Products fulfilling a purpose are like tools and are neither decorative objects nor works of art.
Is honest - Honest design should not attempt to make a product seem more innovative, powerful or valuable than it really is. It should not attempt to manipulate the consumer with promises that cannot be kept.
Is long-lasting - It should avoid being fashionable and therefore never appears antiquated. Unlike fashionable design, it lasts many years – even when the trend may be in favor of disposable products.
Is thorough down to the last detail - Dieter Rams states that nothing must be arbitrary or left to chance in the design of a product since care and accuracy in the design process show respect towards the consumer.
Is environmentally friendly - Good design should make an important contribution to the preservation of the environment by conserving resources and minimizing physical and visual pollution throughout the lifecycle of the product.
Is as little design as possible - Dieter Rams makes the distinction between the common "Less is more" and his strongly advised "Less, but better", highlighting the fact that this approach focuses on the essential aspects, so that the products are not burdened with non-essentials. The desirable result is then purer and simpler.

Thu, 28 Jul 2011 16:09:26 -0700 Parameters of Project Control http://vijaykrishna.posterous.com/parameters-of-project-control http://vijaykrishna.posterous.com/parameters-of-project-control
Time: the time estimate largely depends upon the size of the project itself; the larger the project, the more time it will take to complete. The time frame can be reduced partially by increasing the work force allocated to the project. However, there will always be a trade-off between the number of people and time. Why should this matter, though? Simply because an increase in the number of people requires an increase in the management overhead of controlling all the people effectively. Also, after a point, the increase in the number of people starts to have a counter-effect on the time required to complete the project. It can almost be argued that increasing the work force on a project delays the project more often than not.
Organization: the organization of a team is of great importance for successful projects. If every member clearly knows what to do (and, more importantly, why he/she is to do it), then the positive results are plainly visible in the development of the project. In the event that this has not been conveyed clearly, there is no proper sharing of responsibility within the project team, which only induces delays and poor quality of work. It also leads the members of the team to set their own goals, which has its own obvious repercussions.
Quality: No one is interested in the technical know-how of any product. Ultimately, the product should work, i.e., it should end up doing what it is supposed to do, in the right manner and at the right times. This is where the concept of quality and quality assurance stems from. Quality is best achieved by ensuring high standards at every stage of the product's development; it should never be a one-time affair carried out once the implementation phase is at a close. This is how quality is incorporated into any product as a fundamental feature rather than an add-on. ("the quality requirements for software and for its development process are often in conflict."??) 
Money: A very fundamental dimension of project control. Expenses are largely decided by the labor force employed and are calculated in terms of man-hours. The labor force, in turn, is estimated on the basis of the size of the project. However, there are other factors that need to be taken into account while making cost estimations for software. Hardware and material resources are one set of things which require expenditure (these are small and very precisely calculable). Since the fundamental premise of expenditure is still labor, many labor-cutting methods have been employed. These include the use of tools and techniques for improved productivity, the use of existing code bases to build new products, etc. Such techniques, while they do cut labor costs, have costs and expenditures of their own. Thus, in a convoluted manner, software development slowly starts to become a cost-intensive proposition rather than a labor-intensive one. 
Information: This comes in the form of documentation, and not only technical or user documentation but also project documentation, which includes information pertaining to the current state of the project, the various changes agreed upon, and the decisions made.

Wed, 27 Jul 2011 06:06:00 -0700 What is Software Engineering? http://vijaykrishna.posterous.com/what-is-software-engineering http://vijaykrishna.posterous.com/what-is-software-engineering
1. It has to do with the development of large systems.
2. The central theme is to master complexity. This complexity comes from the existence of very many simple components, rather than from a few components (or a single one) based on a sophisticated algorithm.
3. You are trying to model reality. In most cases it (read reality) is an existing manual process.
3.1. Software evolves like any reality.
3.2. The software's efficiency is of great importance (as it models a reality).
3.3. Software has to support its users effectively. (this is highly implied!)
4. Co-operation is a key requirement between the parties and people developing software.
5. Members of one culture create artifacts for the members of another culture. E.g.: most software engineers are not bankers, yet they do create banking/eCommerce systems.
6. It is a balancing act between art and engineering.
REF:(6:31 PM 7/27/2011 Hans Van Vliet SE Principles and Practice)

Wed, 27 Jul 2011 04:20:00 -0700 Hardware v/s Software: Replacing Faulty Components http://vijaykrishna.posterous.com/hardware-vs-software-replacing-faulty-compone http://vijaykrishna.posterous.com/hardware-vs-software-replacing-faulty-compone
1. I stumbled upon an interesting difference between hardware and software today. Hardware errors are perceived to be random, i.e., when a hardware failure takes place, there is an implicit assumption that some component gave way, leading to the error or failure. Thus a simple replacement of that piece of hardware with an identical component does the trick, or so is the assumption. This latter assumption is true in most cases.
2. That being said, software works differently. The errors in software are not due to a random failure in its workings, simply because software, unlike hardware, is built on logical elements. Such faulty, error-inducing logic will replicate itself in all places where the software is used. Thus a simple replacement of the software with an identical copy is of no use. Rather, a specific modification of the logical workings of the software needs to be introduced for the software to function properly. 
3. Personal observation: This shows the contrast in the gap between the design and production elements when it comes to hardware and software. This gap is huge in the case of hardware: the design rarely has a direct impact on the workings of the hardware, or rather the workings of the hardware depend on more factors than the design itself. Whereas in the case of software, the design is so tightly coupled with the product itself that there does not seem to be any reason for failure other than design faults. This is assuming that the design was properly translated into the product in both cases. 
REF:(2:55 PM 7/27/2011 Hans Van Vliet SE Principles and Practice)

Mon, 18 Apr 2011 11:04:15 -0700 Makeovers http://vijaykrishna.posterous.com/makeovers http://vijaykrishna.posterous.com/makeovers
Roughly a month ago I was hard at work, racing against time and my own energies to integrate a whole new set of changes in the services with the UI. It was crazy, to put it mildly; it was one huge chunk and it was evolving every day. The whole transition to the new changes took a good five days to come to stable ground. After I was done, I left for a week-long break. I knew that the changes would continue, even from where I had left off. When I came back, however, changes had been made to virtually every aspect of how the application had been conceived to begin with. The changes resulted in a drastic makeover and I was in no state to even comprehend the new dynamics of the business that had been introduced into the software. I was truly lost.
Makeovers are a part of any developmental process. But the big question is whether they should be. At least by definition, they should not: a makeover is something that ought to come after the development phase is over, instead of being a part of it. In the world of software, where the daily work and the daily text take on the guise of programming languages, understanding the changes becomes even more difficult. One might argue that this is a common feature of any profession and is not specific to, or more pronounced in, the software world. But I tend to disagree. Not because I am a part of this industry of logic, but because, as a proponent of logic, I cannot help but notice that this industry is still young and is driven by a truly illogical set of rules and frameworks. The code that everyone writes is more art than clerical form-filling. Everyone has their own style and hand at coding, as I have discussed in a previous post. And since this is primarily an occupation of solving problems, where varied paths may be employed to solve the same problem, the diversity of the work in front of you increases, especially in terms of solutions. So, when you sit down to try and understand what has happened to an existing project, especially one you worked on, you will find it extremely difficult to comprehend, or rather accept, the new ideas and the general change that has come about in the system.
In most software development that I have seen taking place around me, makeovers come in the form of incremental changes taking place over an extended period of time. True makeovers, I think, should be a total change of a system from the ground up. That is a true harbinger of complexity in any system, and such makeovers only complicate the whole process of getting acquainted with new systems. I do not claim that what I have seen or heard of is the general nature of makeovers in any field or profession. But makeovers should, in my opinion, be undertaken when one attempts to make something simpler, faster and lighter. This implies a rewriting of the script, which may result in an initial failure of the system with its new face. Nonetheless, it should not be avoided. The price of getting better at something is imminent failure, but often that is also its reason in the first place.

Fri, 15 Apr 2011 17:20:29 -0700 An SQL puzzle http://vijaykrishna.posterous.com/an-sql-puzzle http://vijaykrishna.posterous.com/an-sql-puzzle
Check out this little snippet of SQL:
[sourcecode language="sql"]
SELECT PRAMvtID,
       PRAMvtRALoadDteTme,
       PRAMvtRAStatus,
       PRAMvtRAType,
       PRAMvtRAApprvlStatus,
       PRAMvtCompany,
       PRAMvtDeliveryNetwrk,
       PRAMvtMeasurePoint,
       PRAMvtSalesMonth,
       PRAMvtTicketNo
INTO   #complete_tkts
FROM   PAPRAMovement d
WHERE  PRAMvtID IN (SELECT PRAMvtID
                    FROM   PAPRAMovement dd
                    WHERE  dd.PRAMvtCompany = d.PRAMvtCompany
                      AND  dd.PRAMvtDeliveryNetwrk = d.PRAMvtDeliveryNetwrk
                      AND  dd.PRAMvtMeasurePoint = d.PRAMvtMeasurePoint
                      AND  dd.PRAMvtSalesMonth = d.PRAMvtSalesMonth
                      AND  dd.PRAMvtTicketNo = d.PRAMvtTicketNo
                      AND  dd.PRAMvtRAStatus = 'C')
[/sourcecode]
When my friend showed me this snippet, I was a little amused. I asked why the extra nesting was required when he was comparing against the same table. He might as well have gone with this:
[sourcecode language="sql" highlight="13"]
SELECT PRAMvtID,
       PRAMvtRALoadDteTme,
       PRAMvtRAStatus,
       PRAMvtRAType,
       PRAMvtRAApprvlStatus,
       PRAMvtCompany,
       PRAMvtDeliveryNetwrk,
       PRAMvtMeasurePoint,
       PRAMvtSalesMonth,
       PRAMvtTicketNo
INTO   #complete_tkts
FROM   PAPRAMovement
WHERE  PRAMvtRAStatus = 'C'
[/sourcecode]
But that's when he told me that the query had been designed that way for a reason, and he was not able to figure it out. That's when I actually sat up and started to pay attention. After five minutes of close inspection of the snippet, I realized that this was a brilliantly designed query to perform a grouping of the PRAMvtIDs in the following group order of the table columns:
  • PRAMvtCompany
  • PRAMvtDeliveryNetwrk
  • PRAMvtMeasurePoint
  • PRAMvtSalesMonth
  • PRAMvtTicketNo
Now one might ask: why not use a simple GROUP BY instead? The argument grows stronger when you realise that the grouping with this query will not be a perfect cascade. The grouping will take place in an arbitrary fashion, based on the order in which the above 5 values are encountered in the table row by row. What I am getting at is that the data will not be ordered, but it will be grouped.
So the question remains... why not the GROUP BY? For one, if this query were translated into LINQ, it would return a flat list, whereas using GroupBy would give you a keyed, dictionary-like collection of groups. So, I guess it is purely a matter of choice. It was an interesting query. It taught me yet again that most things in life should have a purpose for their existence.
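For what it is worth, here is a minimal sketch of what the GROUP BY formulation might look like. This is my own illustration, not part of the original query, and it assumes the same PAPRAMovement table and column names as above:
[sourcecode language="sql"]
-- Hypothetical sketch: one row per (company, network, measure point,
-- sales month, ticket) group that contains at least one completed ('C') row.
SELECT   PRAMvtCompany,
         PRAMvtDeliveryNetwrk,
         PRAMvtMeasurePoint,
         PRAMvtSalesMonth,
         PRAMvtTicketNo
FROM     PAPRAMovement
GROUP BY PRAMvtCompany,
         PRAMvtDeliveryNetwrk,
         PRAMvtMeasurePoint,
         PRAMvtSalesMonth,
         PRAMvtTicketNo
HAVING   SUM(CASE WHEN PRAMvtRAStatus = 'C' THEN 1 ELSE 0 END) > 0
[/sourcecode]
Note that this version collapses the data: you get one summary row per group, and the individual PRAMvtIDs and other movement columns are lost unless you aggregate them or join them back in. The nested formulation, in contrast, keeps every movement row intact, which is consistent with the list-versus-grouping distinction above.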
