Experimental results for dynamic data structures

Our shape analysis method has been implemented in a compiler, written in C, which analyzes C code and generates the RSRSG associated with each sentence of the code. As we said, the symbolic execution of each sentence over an RSRSG generates a modified RSRSG, as we have presented in Fig. 2. We can see that the new $RSRSG_{o}$ is just the set resulting from the union of all compatible $rsg$'s obtained after the abstract interpretation of the sentence over all the graphs belonging to $RSRSG_{i}$. Remember that the division, pruning, and compression operations are applied during the abstract interpretation of each sentence.


Progressive analysis

As we have seen, the set of properties associated with a node allows the compiler to keep in separate nodes those memory locations with different properties. Obviously, the number of nodes in the RSRSGs depends on the number of properties and also on the range of values these properties can take. The higher the number of properties, the better the accuracy of the memory configuration representation, but also the larger the RSRSGs and the higher the memory usage.

Fortunately, not all the properties are needed to achieve a precise description of the data structure in all the codes. That is, simpler codes can be successfully analyzed when fewer properties are taken into account, while complex programs will need more compilation time and memory because all the properties have to be considered. Bearing this in mind, we have implemented the compiler to carry out a progressive analysis which starts with fewer constraints to summarize nodes; when necessary, these constraints are increased to reach a better approximation of the data structure used in the code.
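The progressive strategy just outlined can be pictured as a simple driver loop that escalates through the analysis levels until the resulting RSRSG is precise enough. The following is a minimal sketch in C; the functions `analyze_at_level()` and `rsrsg_is_accurate()` are hypothetical stand-ins for the actual compiler phases, not its real API:

```c
/* Hedged sketch of the progressive-analysis driver. The two helper
   functions are illustrative stubs, not the actual compiler interface. */
#include <stdbool.h>

typedef enum { L1 = 1, L2 = 2, L3 = 3 } Level;

/* Stub: run the shape analysis with the property set of the given level
   and return a handle to the resulting RSRSG. */
static int analyze_at_level(Level lvl) { return (int)lvl; }

/* Stub: check whether the resulting RSRSG matches the expected sharing
   properties (e.g., SHSEL false where the real structure is unshared). */
static bool rsrsg_is_accurate(int rsrsg) { return rsrsg >= L2; }

/* Start at L1 with few constraints; escalate only when the RSRSG is
   still too conservative. */
Level progressive_analysis(void) {
    for (Level lvl = L1; lvl <= L3; lvl++) {
        int rsrsg = analyze_at_level(lvl);
        if (rsrsg_is_accurate(rsrsg))
            return lvl;
    }
    return L3; /* settle for the most precise level available */
}
```

With the stubs above, the driver stops as soon as a level yields an accurate RSRSG, which is why simple codes never pay the cost of the higher levels.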

More precisely, the compiler analysis comprises three levels, $L_1$, $L_2$, and $L_3$, from least to most complex.

With this compiler we have analyzed several C codes: the one described in Sect. 2.1 (working example), the sparse matrix by vector multiplication, the sparse matrix by matrix multiplication, the sparse LU factorization, and the Barnes-Hut code. The first four codes were successfully analyzed in the first level of the compiler, $L_1$. However, for the Barnes-Hut program the highest accuracy of the RSRSGs was obtained in the last level, $L_3$, as we explain in Sect. 5.3. All these codes were analyzed by our compiler on a Pentium III 500 MHz with 128 MB main memory. The time and memory required by the compiler are summarized in Table 1. The particular aspects of these codes are described next.

Table 1: Time and space required by the compiler to analyze several codes

             Working Ex.            S.Mat-Vec              S.Mat-Mat              S.LU fact.  Barnes-Hut
Level        L1 / L2 / L3           L1 / L2 / L3           L1 / L2 / L3           L1          L1 / L2 / L3
Time         0'09''/0'15''/0'16''   0'03''/0'05''/0'07''   0'51''/1'36''/1'57''   12'15''     12'45''/1'27''/2'33''
Space (MB)   2.11/2.78/3.02         1.37/1.85/2.17         8.13/11.45/12.68       99.46       37.82/8.82/8.94

Working example and sparse matrix codes

We first refer in this subsection to the code that generates, traverses, and modifies the data structure presented in Fig. 3(a). A compact representation of the resulting RSRSG for the last sentence of the code can be seen in Fig. 3(b). Although we do not show the code due to space constraints, we have to mention that it presents an additional difficulty because some tree permutations are carried out during the modification of the data structure. Summarizing, after the compiler analyzes this code, the compact representation of the resulting RSRSG for the last sentence of the program (Fig. 3(b)) accurately describes the data structure depicted in Fig. 3 in the sense that: (i) the compiler successfully detects the doubly linked list, which is acyclic by selectors $nxt$ or $prv$ and whose elements point to binary trees; (ii) since SHSEL$(n_4,tree)$ = 0, we can say that two different items of the header list cannot point to the same tree; (iii) at the same time, since no tree node ($n_4$, $n_5$, and $n_6$) is shared, we can say that different trees do not share items; and (iv) the same happens for the doubly linked lists pointed to by the tree leaves: all the lists are independent, no two leaves point to the same list, and these lists are acyclic by selectors $nxt$ or $prv$.
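The structure just described can be sketched with C declarations like the following. The selector names $nxt$, $prv$, and $tree$ come from the text; the remaining type and field names (e.g., `left`, `right`, `list`) are illustrative assumptions, not the actual code:

```c
/* Minimal sketch of the working-example structure of Fig. 3(a):
   a doubly linked header list whose elements point to binary trees,
   whose leaves in turn point to independent doubly linked lists.
   Only nxt, prv, and tree are selector names taken from the text;
   the rest are illustrative assumptions. */
struct list_node {                 /* doubly linked list at the leaves */
    struct list_node *nxt, *prv;
};

struct tree_node {                 /* binary tree hanging from the header */
    struct tree_node *left, *right; /* assumed child selectors */
    struct list_node *list;         /* each leaf points to its own list */
};

struct header_node {               /* doubly linked header list */
    struct header_node *nxt, *prv;
    struct tree_node   *tree;      /* SHSEL(n4, tree) = 0: no two header
                                      items point to the same tree */
};
```

Properties (i)-(iv) above amount to saying that every pointer field in this sketch reaches a region not reachable from any sibling node.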

Besides this, our compiler has also analyzed three sparse codes which generate, traverse, and modify complex dynamic data structures. A more detailed description of these codes and of the compiler results can be found in [3]; here we just summarize their functionality. For all of them the compiler generates very accurate RSRSGs that precisely describe the real data structures used in the codes.

As we mentioned, all these codes are accurately analyzed at the compiler's L1 level. However, in Table 1 we also present the time and memory requirements of levels L2 and L3 to illustrate how increasing the number of properties usually leads to higher compilation times and larger memory requirements. For the sparse LU factorization, our compiler runs out of memory at L2 and L3 on our 128 MB Pentium III and, therefore, only the L1 values are provided. The exception in Table 1 is the Barnes-Hut N-body simulation, which is described next.

Barnes-Hut N-body simulation

This code deserves more attention because it is a good example in which the compiler has to carry out, sequentially, the three levels of compilation of the progressive analysis.

This code is based on the algorithm presented in [1] which is used in astrophysics. In fact, this application simulates the evolution of a system of bodies under the influence of gravitational forces. It is a classical gravitational N-body simulation, in which each body is modeled as a point mass. The simulation proceeds over time-steps, each step computing the net force on every body and thereby updating that body's position and other attributes. The data structure used in this code is based on a hierarchical octree representation of space in three dimensions.

In Fig. 5(a) we present a schematic view of the data structure used in this code. The bodies are stored in a singly linked list pointed to by the pvar $Lbodies$. The octree represents the successive subdivisions of the 3D space. That is, the root of the tree represents the whole space, each of its children represents one subcube of this space, and so on. This way, each leaf of the octree represents a subcube which contains a single body and therefore points to this body in the $Lbodies$ list. Each octree node which is not a leaf has a pointer $child$ pointing to the first of its eight children, which are linked by the selector $next$.
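In C, this layout can be sketched as follows. The selector names $child$, $next$, and $body$ (and the list link $nxt$) come from the text and from Fig. 5; the per-node attributes (`mass`, `com`, `pos`) are illustrative assumptions:

```c
/* Sketch of the Barnes-Hut structures of Fig. 5(a): a singly linked
   body list (Lbodies) and an octree where an internal node points to
   the first of its up-to-eight children via `child`, the children are
   linked by `next`, and a leaf points to its body via `body`.
   Attribute fields beyond the selectors are illustrative assumptions. */
struct body {
    struct body *nxt;        /* next body in the Lbodies list */
    double mass;             /* assumed per-body attributes */
    double pos[3];
};

struct octnode {
    struct octnode *child;   /* first of up to eight children */
    struct octnode *next;    /* sibling link among the children */
    struct body    *body;    /* non-NULL only at leaves */
    double mass;             /* total mass of this subcube (step ii) */
    double com[3];           /* center of mass of this subcube */
};
```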

Figure 5: Barnes-Hut data structure and compacted RSRSG.
(a) & (b)

The three main steps in the algorithm are: (i) the creation of the octree and of the pointers from the elements of the octree to the elements of the $Lbodies$ list; (ii) for each subcube in the octree, the computation of the center of mass and total mass of all the particles it contains; and (iii) for each particle, a traversal of the tree to compute the forces on it.
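Step (ii) lends itself to a short illustration: a post-order pass that accumulates, for every subcube, the total mass of the particles it contains. This is a minimal sketch assuming a simplified node layout, not the actual Barnes-Hut code:

```c
/* Hedged sketch of step (ii): bottom-up accumulation of total mass.
   The node layout is a simplified, illustrative assumption. */
#include <stddef.h>

struct body { struct body *nxt; double mass; };

struct octnode {
    struct octnode *child, *next;  /* first child / sibling selectors */
    struct body *body;             /* non-NULL only at leaves */
    double mass;                   /* filled in by this pass */
};

/* Post-order traversal: a leaf takes its particle's mass, an internal
   node sums the masses of its children. */
double compute_mass(struct octnode *n) {
    if (n == NULL) return 0.0;
    if (n->body != NULL) {                 /* leaf: one particle */
        n->mass = n->body->mass;
    } else {                               /* internal: sum the children */
        n->mass = 0.0;
        for (struct octnode *c = n->child; c != NULL; c = c->next)
            n->mass += compute_mass(c);
    }
    return n->mass;
}
```

The real code also averages positions to obtain the center of mass; the recursion structure is the same.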

All the traversals of the octree are carried out by recursive calls in the code. Since our compiler is not yet able to perform an interprocedural analysis, we have manually inlined the subroutine and transformed the recursion into a loop. This loop uses a stack pointing to the nodes which are referenced during the octree traversal. This stack is also included in Fig. 5(a) and captured in the corresponding RSRSG, Fig. 5(b).
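The recursion-to-loop transformation can be sketched as follows: the stack is a linked list whose items point into the octree through the selector $node$, as in Fig. 5. The node layout and helper names here are illustrative assumptions, not the actual transformed code:

```c
/* Hedged sketch of the octree traversal after removing recursion:
   a loop over an explicit stack whose entries point to octree nodes.
   All names are illustrative. */
#include <stdlib.h>

struct octnode { struct octnode *child, *next; };

struct stack_item {
    struct stack_item *nxt;   /* stack kept as a linked list */
    struct octnode *node;     /* selector `node` into the octree */
};

static void push(struct stack_item **sp, struct octnode *n) {
    struct stack_item *it = malloc(sizeof *it);
    it->node = n; it->nxt = *sp; *sp = it;
}

static struct octnode *pop(struct stack_item **sp) {
    struct stack_item *it = *sp;
    struct octnode *n = it->node;
    *sp = it->nxt;
    free(it);
    return n;
}

/* Depth-first traversal without recursion; returns the node count. */
int traverse(struct octnode *root) {
    int visited = 0;
    struct stack_item *sp = NULL;
    if (root) push(&sp, root);
    while (sp) {
        struct octnode *n = pop(&sp);
        visited++;                          /* the node is "touched" here */
        for (struct octnode *c = n->child; c; c = c->next)
            push(&sp, c);                   /* children visited later */
    }
    return visited;
}
```

It is precisely this stack pointing into the octree that makes nodes appear shared at the $L_2$ level, as discussed below.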

After the $ L1$ analysis of the code, the resulting RSRSG for the last sentence of the code does not correspond with the real properties of the data structure used in the code. The problem is that the summary node $ n_6$, which represents the middle elements of the $ Lbodies$ list fulfill SHSEL(body) = true. This would imply that there may be different leaves in the octree pointing to the same body in the $ Lbodies$ list. This inaccuracy is due to an imprecise analysis during the generation of the list of children in the octree.

This inaccuracy is avoided by moving on to the $L_2$ level, thanks to the use of C_SPATH1. The resulting RSRSG can be seen in Fig. 5(b), where the summary node $n_6$ fulfills SHSEL$(n_6,body)$ = false, in line with the real data structure. However, due to the stack we use to assist the octree traversal, there is a problem that cannot be solved at $L_2$: nodes $n_2$, $n_3$, and $n_4$ are shared by the selector $node$ from the $Stack$ data structure. This would prevent the parallel traversal of the octree and does not fit the real data structure. The lack of accuracy is now due to the fact that, during the traversal of the octree, visited nodes are summarized with non-visited ones.

However, switching to the $L_3$ compilation level solves this problem thanks to the TOUCH property, which keeps visited and non-visited nodes in separate summary nodes. Now, both for the RSRSG associated with the first sentence after the data structure generation and for the one associated with the last sentence of the code, SHSEL is false for all selectors. A subsequent analysis of the code can therefore state that the tree can be traversed and updated in parallel.

Finally, regarding Table 1, there is a paradoxical behavior that deserves an explanation: for the Barnes-Hut code, $L_2$ and $L_3$ require less time and memory than $L_1$. As was briefly described in the example in Sect. 4.2, when SHARED and SHSEL are false, more nodes and links are pruned during the abstract interpretation and graph compression phases of the symbolic execution of a sentence. In this code, at the $L_2$ and $L_3$ levels, SHSEL$(n_6,body)$ = false leads to more pruning and graph simplification, counteracting the increase in complexity due to the use of the C_SPATH1 and TOUCH properties.

Rafael Asenjo Plaza
Last modified: Tue Feb 19 18:45:57 CET 2002