next up previous
Next: Conclusions and future work Up: Experimental results Previous: Working example and sparse


Barnes-Hut N-body simulation

This code deserves more attention due to it being a good example in which the compiler has to sequentially carry out the three levels of compilation in the progressive analysis.

This code is based on the algorithm presented in [1] which is used in astrophysics. In fact, this application simulates the evolution of a system of bodies under the influence of gravitational forces. It is a classical gravitational N-body simulation, in which each body is modeled as a point mass. The simulation proceeds over time-steps, each step computing the net force on every body and thereby updating that body's position and other attributes. The data structure used in this code is based on a hierarchical octree representation of space in three dimensions.

In Fig. 5(a) we present a schematic view of the data structure used in this code. The bodies are stored by a single linked list pointed to by the pvar $ Lbodies$. The octree represents the several subdivisions of the 3D space. That is, the root of the tree represents the whole space, each one of its children represents a single subsquare of this space, and so on. This way, each leaf of the octree represents a subsquare which contains a single body and therefore points to this body stored in the $ Lbodies$ list. Each octree node which is not a leaf has a pointer $ child$ pointing to the first of its eight children which are linked by selector $ next$.

Figure 5: Barnes-Hut data structure and compacted RSRSG.
\begin{figure*}\begin{center}
\begin{tabular}{cc}
\raisebox{2ex}[0cm][0cm]{\psfi...
...SG.eps},width=.48\linewidth}\\
(a) & (b)
\end{tabular}\end{center}\end{figure*}

The three main steps in the algorithm are: (i) The creation of the octree and the pointers from the elements of the octree to the elements of the $ Lbodies$ list; (ii) for each subsquare in the octree, compute the center of mass and total mass for all the particles it contains; and (iii) for each particle, traverse the tree to compute the forces on it.

All the traversals of the octree are carried out in the code in recursive calls. Due to the fact that our compiler is still not able to perform an interprocedural analysis, we have manually carried out the inline of the subroutine and the recursivity has been transformed into a loop. This loop uses a stack pointing to the nodes which are referenced during the octree traversal. This stack is also considered in Fig. 5 (a) and obtained in the corresponding RSRSG, Fig. 5 (b).

After the $ L1$ analysis of the code, the resulting RSRSG for the last sentence of the code does not correspond with the real properties of the data structure used in the code. The problem is that the summary node $ n_6$, which represents the middle elements of the $ Lbodies$ list fulfill SHSEL(body) = true. This would imply that there may be different leaves in the octree pointing to the same body in the $ Lbodies$ list. This inaccuracy is due to an imprecise analysis during the generation of the list of children in the octree.

This inaccuracy is avoided moving on to the $ L2$ level, thanks to the use of C_SPATH1. The resulting RSRSG can be seen in Fig. 5(b), where the summary node $ n_6$ fulfills SHSEL $ (n_6,body)$ = false, in line with the real data structure. However, due to the stack we use to assist the octree traversal, there is a problem that cannot be solved in $ L2$: nodes $ n_2$, $ n_3$, and $ n_4$ are shared by selector $ node$ from the $ Stack$ data structure. This would prevent the parallel traversal of the octree and does not fit with the real data structure. The lack of accuracy is now due to the fact that, during the traversal of the octre, visited nodes are summarized with non-visited ones.

However, by switching to the $ L3$ compilation level this problem is solved thanks to the TOUCH property which keeps in separated summary nodes visited and non-visited nodes. Now for the RSRSG associated with the first sentence after the data structure generation and the one in the last sentence of the code, the SHSEL is false for all selectors. A subsequent analysis of the code can state that the tree can be traversed and updated in parallel.

However, regarding the Table 1, there is a paradoxical behavior that deserves an explanation: $ L2$ and $ L3$ expend less time and memory than $ L1$. As was briefly described in the example in Sect. 4.2, when SHARED and SHSEL are false there are more nodes and links pruned during the abstract interpretation and graph compression phase of the symbolic execution of a sentence. In this code, for the $ L2$ and $ L3$ levels, the SHSEL $ (n_6,body)$ = false leads to more pruning and graph simplifications counteracting the increase in complexity due to the use of the C_SPATH1 and TOUCH properties.


next up previous
Next: Conclusions and future work Up: Experimental results Previous: Working example and sparse
Rafael Asenjo Plaza 2002-02-19