next up previous
Next: Working example Up: Progressive Shape Analysis for Previous: Motivation

Method overview

Basically, our method is based on approximating all possible memory configurations that can appear after the execution of a sentence in the code. Note that due to the control flow of the program, a sentence could be reached by following several paths in the control flow. Each ``control path'' has an associated memory configuration which is modified by each sentence in the path. Therefore, a single sentence in the code modifies all the memory configurations associated with all the control paths reaching this sentence. Each memory configuration is approximated by a graph we call Reference Shape Graphs (RSG). Thus, taking all this into account, we conclude that each sentence in the code will have a set of RSGs associated with it. This set of RSGs will describe the shape of the data structure after the execution of this sentence.

The calculation of this set of graphs is carried out by the symbolic execution of the program over the graphs. In this way, each program sentence transforms the graphs to reflect the changes in the memory configurations derived from the sentence execution.

The RSGs are graphs in which nodes represent memory locations which have similar reference patterns. Therefore, a single node can safely and accurately represent several memory locations (if they are similarly referenced) without losing their essential characteristics.

To determine whether or not two memory locations should be represented by a single node, each one is annotated with a set of properties. Now, two different memory locations will be ``summarized'' in a single node if they fulfill the same properties. Note that the node inherits the properties of the memory locations represented by this node. Besides this, two nodes can be also summarized if they represent ``summarizable'' memory locations. This way, a possibly unlimited memory configuration can be represented by a limited size RSG, because the number of different nodes is limited by the number of properties of each node. These properties are related to the reference pattern used to access the memory locations represented by the node. Hence the name Reference Shape Graph.

As we have said, all possible memory configurations which may arise after the execution of a sentence are approximated by a set of RSGs. We call this set the Reduced Set of Reference Shape Graphs (RSRSG), since not all the different RSGs arising in each sentence will be kept. On the contrary, several RSGs related to different memory configurations will be fused when they represent memory locations with similar reference patterns. As we will see, there are also several properties related to the RSGs, and two RSGs should share these properties to be joined. Therefore, besides the number of nodes in an RSG, the number of different RSGs associated with a sentence are limited too. This union of RSGs greatly reduces the number of RSGs and leads to a practicable analysis.

The symbolic execution of the code consists in the abstract interpretation of each sentence in the code. This abstract interpretation is carried out iteratively for each sentence until we reach a fixed point in which the resulting RSRSG associated with the sentence does not change any more [4]. This way, for each sentence that modifies dynamic structures, we have to define the abstract semantics which describe how these sentences modify the RSRSG. We consider six simple instructions that deal with pointers: $ x
= NULL$, $ x = malloc$, $ x = y$, $ x\rightarrow sel = NULL$, $ x\rightarrow sel = y$, and $ x = y\rightarrow sel$. More complex pointer instructions can be built upon these simple ones and temporal variables.

The output RSRSG resulting from the abstract interpretation of a sentence over an input RSRSG$ _i$ is generated by applying the abstract interpretation to each $ rsg_i \in $ RSRSG$ _i$. Due to space constraints we cannot formally describe the abstract semantics of each one of these sentences. However, we intuitively present the modifications involved in an RSG after the sentence symbolic execution:

From a higher perspective, the whole symbolic execution process can be seen by looking at Fig. 2. For each sentence in the code we have an input RSRSG$ _i$ and the corresponding output RSRSG$ _o$ resulting from the modifications which take place in the memory configurations after the sentence execution. As we said, the RSRSG$ _i$ comprises several $ rsg_{ij}$ to capture all the memory configurations associated with each path in the control flow graph. During the symbolic execution of the sentence all these $ rsg_{ij}$ are going to be updated. The first step comprises the graph division and pruning processes after which we obtain several $ rsg_{ijk}$. Then the abstract interpretation of the sentence takes place and usually the complexity of the RSGs grows. In order to counter this effect, the compiler carries out a compression of the graph phase in which each RSG is simplified by the summarization of compatible nodes in the RSG, to obtain the $ rsg^*_{ijk}$ graphs. This step is formally described in Sect. 3.2. Furthermore, some of the $ rsg^*_{ijk}$s can be fused into a single $ rsg_{ok}$ if they represent similar memory configurations. This operation greatly reduces the number of RSGs in the resulting RSRSG. This graph union operation is described in Sect. 4.3.

Figure 2: Schematic description of the symbolic execution of a sentence.

All the operations just enumerated in this section (RSG division, pruning, node summarization, compression of graphs, joining of compatible RSG, etc.) are described in the next two sections. In order to better illustrate these operations, we present a data structure example which will be referred to during the framework and operations description.

next up previous
Next: Working example Up: Progressive Shape Analysis for Previous: Motivation
Rafael Asenjo Plaza 2002-02-19