next up previous
Next: Reduced Set of RSGs Up: Reference Shape Graph Previous: Node properties


Compression of graphs

As we explained in Sect. 2 the code analysis implies symbolically executing each sentence applying the abstract semantics associated with this sentence. This symbolic execution modifies the RSGs included in the input RSRSG. This way, these graphs may contain new nodes representing new memory locations and/or the properties for certain nodes may have changed. In other words, after the symbolic execution of a sentence over an input RSRSG, the resulting RSRSG may contain RSGs with redundant information, which can be removed due to node summarization or compression of the RSG.

In order to do this, after the symbolic execution of a sentence, the method applies the COMPRESS function over the just modified RSGs. However, before explaining this COMPRESS function, we need to define the C_NODES_RSG one, which identifies the compatible nodes that will later be summarized. This Boolean function just has to check whether or not the first six properties previously described are the same for both nodes (as we said in the previous subsection, the CYCLELINKS property does not affect the compatibility of two nodes). This way

C_NODES_RSG $ (n_i, n_j) = true ~$ if  (TYPE$ (n_i)$ = TYPE $ (n_j)) ~ \wedge ~$(STRUCTURE$ (n_i)$ = STRUCTURE $ (n_j)) ~ \wedge ~$
(SHARED$ (n_i)$ = SHARED $ (n_j)) ~ \wedge ~$(SHSEL $ (n_i, sel_k)$ = SHSEL $ (n_j, sel_k) \forall sel_k \in S) ~ \wedge ~$(TOUCH$ (n_i)$ = TOUCH $ (n_j)) ~ \wedge ~$
(C_REFPAT $ (n_i, n_j) = 1) ~ \wedge ~$(C_SPATH $ (n_i,
n_j, m) = 1).$

There is a similar function which returns true when two memory locations are compatible. With this, we can finally define $ F_n$ as the function which maps all the compatible memory locations into the same node, which happens when they have the same TYPE, STRUCTURE, SHARED and TOUCH properties, and compatible REFPAT, and SPATH properties.

Now, compatible nodes of the same RSG are summarized by the function COMPRESS $ (rsg) = rsg_c$ which does not change the set of pvars, $ P$, nor the set of selectors, $ S$, in the new $ rsg_c$. However, the other sets involved in the new $ rsg_c$ composition are:

$ \bullet$ The set of nodes of the compressed RSG, $ N(rsg_c)$, will contain the nodes which cannot be summarized with any other plus the nodes resulting from the summarization of compatible nodes. We use the MERGE_COMP_NODES function to generate a summary node from a group of compatible nodes, as we will see later. Formally:

$N(rsg_c) = \{n ~\vert~ [n \in N(rsg) \wedge (\nexists n_i \in N(rsg)) ~
\wedge ~\text{\tt C\_NODES\_RSG}(n, n_i) = 1)]~ \vee$
$[n =
\text{\tt MERGE\_COMP\_NODES}(n_1,...,n_k), n_1,...,n_k \in N(rsg) ~
\wedge ~(\forall i=1..k-1, \text{\tt C\_NODES\_RSG}(n_i, n_{i+1}) =
1)]\}$

$ \bullet$ The new set of pvar references, $ PL(rsg_c)$, is basically the same set of the uncompressed RSG, $ PL(rsg)$, but which maps all the nodes into the new ones. This is done with the $ n_c=$ MAP_RSG$ (n)$ function which maps the old node $ n \in
N(rsg)$ into the new node $ n_c \in N(rsg_c)$:

$PL(rsg_c) = \{<pvar, \text{\tt MAP\_RSG}(n)>, \forall <pvar, n> \in PL(rsg)\}$

$ \bullet$ The same idea applies to the set of references for the compressed graph:

$NL(rsg_c) = \{<\text{\tt MAP\_RSG}(n_i), sel, \text{\tt MAP\_RSG}(n_j)>, \forall <n_i, sel, n_j> \in NL(rsg)\}$

Regarding the MERGE_COMP_NODES function used for the summarization of several compatible nodes, we define: MERGE_COMP_NODES $(n_1,...n_k) = \text{\tt MERGE\_NODES}(n_1,
\text{\tt MERGE\_NODES}(n_2, ..., \text{\tt MERGE\_NODES}(n_{k-1},
n_k)...))$

The MERGE_NODES$ (n_1, n_2)$ function takes care of the summarization of nodes $ n_1$ and $ n_2$ into node $ n$. The properties of this new node, $ n$, are set in order to preserve the description of the data structures represented by the original nodes. This way, MERGE_NODES $ (n_1, n_2) = n$, where:
TYPE$ (n)$ = TYPE$ (n_1)$ = TYPE$ (n_2)$ PosSELOUTset$ (n)$ = PosSELOUTset$ (n_1)$ $ \cup$ PosSELOUTset$ (n_2)$
SHARED$ (n)$ = SHARED$ (n_1)$ = SHARED$ (n_2)$ PosSELINset$ (n)$ = PosSELINset$ (n_1)$ $ \cup$ PosSELINset$ (n_2)$
TOUCH$ (n)$ = TOUCH$ (n_1)$ = TOUCH$ (n_2)$ SHSEL $ (n, sel_i)$ = SHSEL $ (n_1, sel_i)$ = SHSEL $ (n_2, sel_i) \forall sel_i \in S$
SELINset$ (n)$ = SELINset$ (n_1)$ $ \cap$ SELINset$ (n_2)$ SELOUTset$ (n)$ = SELOUTset$ (n_1)$ $ \cap$ SELOUTset$ (n_2)$

CYCLELINKS $ (n) = \{<sel_i, sel_j> \vert \\
\phantom{.}\qquad \qquad \left\{\begin{array}...
...dge \nexists n_k \in N(rsg), <n_1, sel_i, n_k> \in
NL(rsg) \end{array}\right . $

This means that the new node will have the same TYPE, SHARED, SHSEL, and TOUCH properties which actually should be the same in $ n_1$ and $ n_2$ to allow the summarization of both nodes. However, the new reference pattern information behaves conservatively. If $ sel_i$ is an input/output selector in both nodes, $ n_1$ and $ n_2$, then it will remain as an input/output selector in the resulting summarized node, $ n$. The same happens if $ sel_i$ is not an input/output selector in both nodes. However, if we are not sure about the $ sel_i$ selector in one of the original nodes, then the same uncertainty exists regarding the resulting summarized node.

Finally, regarding the CYCLELINKS sets, the resulting node $ n$ keeps the common cycle links sets from the original nodes, $ n_1$ and $ n_2$. In addition, a cycle link, $ <sel_i,
sel_j>$, from one of the nodes, for example, $ n_1$, is also included in the cycle link set of the node $ n$ if the first selector $ sel_i$ is not a link selector in the other node, $ n_2$.


next up previous
Next: Reduced Set of RSGs Up: Reference Shape Graph Previous: Node properties
Rafael Asenjo Plaza 2002-02-19