Related Work

There are several ways this shape analysis problem can be approached, some of which are based on explicit programmer annotations [10], and others are based on abstracting the properties of data structures by means of ``path expressions'' [12], ``matrices'' [7] or graphs. We have focussed on graph-based methods as they are able to explicitly keep information about dynamic objects not pointed to by any pointer variable. In these graphs the nodes represent the ``storage chunks'' and edges are used to represent references between them. Some of these graph-based methods use just one graph to approximate all possible memory configurations for each sentence in the code [2,13,14,4], whereas other methods permit the existence of several graphs to represent the information more accurately [9,15]. In [5,6] we describe our compiler based on the use of a reduced set of reference shape graphs (RSRSG) which approximates the data structure at each program point.

In [3,4,5,6] there are sections in which we compare the methods implemented in our compiler with the above mentioned related works. However, we would like to state here the main differences between the work we presented in [5,6] and the Sagiv et all work [15], because, from our point of view, the Sagiv's group has reach the most relevant results in the area of shape analysis. The first main difference is that in our work we consider important properties, like the ``reference patterns'' and ``touch information'' among others (described next) which lead to a more precise description of the data structure used in the code. Second, we join similar reference shape graphs (RSG) to build a reduced set of RSG for each program point, while in [15] they keep all the graphs. We think that this may explain why their Three-Valued-Logic Analyzer (TVLA) run out of memory for simple codes like the singly linked list insert sort and bubble sort using the multiple structure approach [11]. And third, they recognize that their TVLA engine is only useful to analyze small programs and they have report experimental results for singly linked list small operations (insert, reverse, sort, etc). However they have not published experimental results successfully dealing with real C codes based on combination of complex data structures like doubly linked list pointing to trees or to other lists, etc. Summarizing, and to the best of our knowledge, our compiler is the only one able to accurately identify the data structure at each sentence of a real C code. The analyzed codes are based on complex data structures such as doubly linked lists, trees, and octrees among others, and combinations of them, such as a doubly linked list of pointers to trees where the leaves point to doubly linked lists, etc.

Rafael Asenjo Plaza
Last modified: Tue Feb 19 19:24:57 CET 2002