Our shape analysis methdo have been implemented in a compiler written in C which analyzes a C code to generate the RSRSG associated with each sentence of the code. As we said, the symbolic execution of each sentence over an RSRSG is going to generate a modified RSRSG as we have presented in Fig. 2. We can see that the new is just the set resulting from the union of all compatible 's that were obtained after the abstract interpretation of the sentence over all the graphs belonging to . Remember that the division, pruning, and compress operations are instanced during the abstract interpretation of each sentence.

Progressive analysis

As we have seen, the set of properties associated with a node allows the compiler to keep in separate nodes those memory locations with different properties. Obviously, the number of nodes in the RSRSGs depends on the number of properties and also on the range of values these properties can take. The higher the number of properties the better the accuracy in the memory configuration representation, but also the larger the RSRSGs and memory wastage.

Fortunately, not all the properties are needed to achieve a precise description of the data structure in all the codes. That is, simpler codes can be successfully analyzed as fewer properties are taken into account, and complex programs will need more compilation time and memory due to all the properties that have to be considered. Bearing this in mind, we have implemented the compiler to carry out a progressive analysis which starts with fewer constraints to summarize nodes, but, when necessary, these constraints are increased to reach a better approximation of the data structure used in the code.

More precisely, the compiler analysis comprises three levels: , , and , from less to more complexity as we explain next:

- : In this level the
`TOUCH`sets are not built nor taken into account and only the`C_SPATH0`condition is used. Therefore, the compatibility functions,`C_NODES_RSG`, and`C_NODES`, which control the summarization behavior are simplified to avoid the constraint`TOUCH``TOUCH`and to use`C_SPATH`. - : This level is based on the previous one but now using the
`C_SPATH1`. This way, in functions`C_NODES_RSG`and`C_NODES`, the`C_SPATH`condition is applied. - : This is the higher level in which all the properties
including the
`TOUCH`one are taken into account.

With this compiler we have analyzed several C codes: the one described in Sect. 2.1 (working example), the sparse Matrix by vector multiplication, the sparse Matrix by Matrix multiplication, the Sparse LU factorization, and the Barnes-Hut code. The first four codes were successfully analyzed in the first level of the compiler, . However, for the Barnes-Hut program the highest accuracy of the RSRSGs was obtained in the last level, , as we explain in Sect. 5.3. All these codes where analyzed by our compiler in a Pentium III 500 MHZ with 128 MB main memory. The time and memory required by the compiler are summarized in Table 1. The particular aspects of these codes are described next.

We first refer in this subsection to the code that generates,
traverses, and modifies the data structure presented in
Fig. 3 (a). A compact representation of the resulting
RSRSG for the last sentence of the code can be seen in
Fig. 3 (b). Although we do not show the code due
to space constraints, we have to mention that this code presents
an additional difficulty due to some tree permutations being
carried out during data structure modification. Summarizing, after
the compiler analyzes this code, the compact representation of the
resulting RSRSG for the last sentence of the program
(Fig. 3 (b)) accurately describes the data
structure depicted in Fig. 3 in the sense that:
(i) the compiler successfully detects the doubly linked list which
is acyclic by selectors or and whose elements point to
binary trees; (ii) as ` SHSEL`
=0, we can say that two different items of the
header list cannot point to the same tree; (iii) at the same time, as
no tree node (, and ) is shared, we can say that
different trees do not share items; and (iv) the same happens for the
doubly linked list pointed to by the tree leaves: all the lists are
independent, there are no two leaves pointing to the same list, and
these lists are acyclic by selectors or .

Besides this, our compiler has also analyzed three sparse codes which generate, traverse, and modify complex dynamic data structures. A more detailed description of these codes and the compiler results is in [3]. However, we just summarize here the functionality of these codes, for which the compiler generates very accurate RSRSGs that precisely describe the real data structure used in the codes.

**Sparse matrix-vector multiplication**. Here we deal with an irregular code which implements a sparse matrix by vector multiplication, where the vectors are doubly linked lists and the matrix is a doubly linked list with pointers to other doubly linked lists which store the matrix rows.**Sparse matrix-matrix multiplication**. This code implements a sparse matrix by matrix multiplication, . All the sparse matrices are also stored as a doubly linked list with pointers to sparse rows. Note that matrix A grows during the multiplication process.**Sparse LU factorization**. This code solves non-symmetric sparse linear systems by applying the LU factorization of the sparse matrix. Here, the sparse matrix is a list of pointers to sparse columns. However, this code is much more complex to analyze due to the partial pivoting and column permutation which takes place in the factorization in order to provide numerical stability and preserve the sparseness.

As we mentioned, all these codes are accurately analyzed in the compiler L1 level. However, in Table 1 we also present time and memory requirements of levels L2 and L3 to illustrate how increasing the number of properties of the compiler usually leads to higher compilation times and larger memory pools. For the Sparse LU factorization, our compiler runs out of memory in L2 and L3 in our 128 MB Pentium III and, therefore, only values for L1 level are provided. The exception in Table 1 is due to the Barnes-Hut N-body simulation which is described next.

Barnes-Hut N-body simulation

This code deserves more attention due to it being a good example in which the compiler has to sequentially carry out the three levels of compilation in the progressive analysis.

This code is based on the algorithm presented in [1] which is used in astrophysics. In fact, this application simulates the evolution of a system of bodies under the influence of gravitational forces. It is a classical gravitational N-body simulation, in which each body is modeled as a point mass. The simulation proceeds over time-steps, each step computing the net force on every body and thereby updating that body's position and other attributes. The data structure used in this code is based on a hierarchical octree representation of space in three dimensions.

In Fig. 5(a) we present a schematic view of the data structure used in this code. The bodies are stored by a single linked list pointed to by the pvar . The octree represents the several subdivisions of the 3D space. That is, the root of the tree represents the whole space, each one of its children represents a single subsquare of this space, and so on. This way, each leaf of the octree represents a subsquare which contains a single body and therefore points to this body stored in the list. Each octree node which is not a leaf has a pointer pointing to the first of its eight children which are linked by selector .

The three main steps in the algorithm are: (i) The creation of the octree and the pointers from the elements of the octree to the elements of the list; (ii) for each subsquare in the octree, compute the center of mass and total mass for all the particles it contains; and (iii) for each particle, traverse the tree to compute the forces on it.

All the traversals of the octree are carried out in the code in recursive calls. Due to the fact that our compiler is still not able to perform an interprocedural analysis, we have manually carried out the inline of the subroutine and the recursivity has been transformed into a loop. This loop uses a stack pointing to the nodes which are referenced during the octree traversal. This stack is also considered in Fig. 5 (a) and obtained in the corresponding RSRSG, Fig. 5 (b).

After the analysis of the code, the resulting RSRSG for the last
sentence of the code does not correspond with the real properties of
the data structure used in the code. The problem is that the summary
node , which represents the middle elements of the
list fulfill `SHSEL`(body) = true. This would imply that
there may be different leaves in the octree pointing to the same body
in the list. This inaccuracy is due to an imprecise analysis
during the generation of the list of children in the octree.

This inaccuracy is avoided moving on to the level, thanks to the
use of `C_SPATH1`. The resulting RSRSG can be seen in
Fig. 5(b), where the summary node fulfills `SHSEL`
= false, in line with the real data
structure. However, due to the stack we use to assist the octree
traversal, there is a problem that cannot be solved in : nodes
, , and are shared by selector from the
data structure. This would prevent the parallel traversal of the
octree and does not fit with the real data structure. The lack of
accuracy is now due to the fact that, during the traversal of the
octre, visited nodes are summarized with non-visited ones.

However, by switching to the compilation level this problem is
solved thanks to the `TOUCH` property which keeps in separated
summary nodes visited and non-visited nodes. Now for the RSRSG
associated with the first sentence after the data structure
generation and the one in the last sentence of the code, the `SHSEL` is false for all selectors. A subsequent analysis of the
code can state that the tree can be traversed and updated in
parallel.

However, regarding the Table 1, there is a paradoxical
behavior that deserves an explanation: and expend less
time and memory than . As was briefly described in the example
in Sect. 4.2, when `SHARED` and `SHSEL` are
false there are more nodes and links pruned during the abstract
interpretation and graph compression phase of the symbolic
execution of a sentence. In this code, for the and
levels, the ` SHSEL`
= false leads to more pruning and graph
simplifications counteracting the increase in complexity due to the
use of the `C_SPATH1` and `TOUCH` properties.

Rafael Asenjo Plaza Last modified: Tue Feb 19 18:45:57 CET 2002