Package pointer implements Andersen's analysis, an inclusion-based
pointer analysis algorithm first described in (Andersen, 1994).
A pointer analysis relates every pointer expression in a whole program
to the set of memory locations to which it might point. This
information can be used to construct a call graph of the program that
precisely represents the destinations of dynamic function and method
calls. It can also be used to determine, for example, which pairs of
channel operations operate on the same channel.
The package allows the client to request a set of expressions of
interest for which the points-to information will be returned once the
analysis is complete. In addition, the client may request that a
callgraph is constructed. The example program in example_test.go
demonstrates both of these features. Clients should not request more
information than they need since it may increase the cost of the
analysis significantly.
Our algorithm is INCLUSION-BASED: the points-to sets for x and y will
be related by pts(y) ⊇ pts(x) if the program contains the statement
y = x.
It is FLOW-INSENSITIVE: it ignores all control flow constructs and the
order of statements in a program. It is therefore a "MAY ALIAS"
analysis: its facts are of the form "P may/may not point to L",
not "P must point to L".
It is FIELD-SENSITIVE: it builds separate points-to sets for distinct
fields, such as x and y in struct { x, y *int }.
It is mostly CONTEXT-INSENSITIVE: most functions are analyzed once,
so values can flow in at one call to the function and return out at
another. Only some smaller functions are analyzed with consideration
of their calling context.
It has a CONTEXT-SENSITIVE HEAP: objects are named by both allocation
site and context, so the objects returned by two distinct calls to f:
are distinguished up to the limits of the calling context.
It is a WHOLE PROGRAM analysis: it requires SSA-form IR for the
complete Go program and summaries for native code.
See the (Hind, PASTE'01) survey paper for an explanation of these terms.
The analysis is fully sound when invoked on pure Go programs that do not
use reflection or unsafe.Pointer conversions. In other words, if there
is any possible execution of the program in which pointer P may point to
object O, the analysis will report that fact.
By default, the "reflect" library is ignored by the analysis, as if all
its functions were no-ops, but if the client enables the Reflection flag,
the analysis will make a reasonable attempt to model the effects of
calls into this library. However, this comes at a significant
performance cost, and not all features of that library are yet
implemented. In addition, some simplifying approximations must be made
to ensure that the analysis terminates; for example, reflection can be
used to construct an infinite set of types and values of those types,
but the analysis arbitrarily bounds the depth of such types.
Most but not all reflection operations are supported.
In particular, addressable reflect.Values are not yet implemented, so
operations such as (reflect.Value).Set have no analytic effect.
The pointer analysis makes no attempt to understand aliasing between the
operand x and result y of an unsafe.Pointer conversion:
It is as if the conversion allocated an entirely new object:
The analysis cannot model the aliasing effects of functions written in
languages other than Go, such as runtime intrinsics in C or assembly, or
code accessed via cgo. The result is as if such functions are no-ops.
However, various important intrinsics are understood by the analysis,
along with built-ins such as append.
The analysis currently provides no way for users to specify the aliasing
effects of native code.
------------------------------------------------------------------------
The remaining documentation is intended for package maintainers and
pointer analysis specialists. Maintainers should have a solid
understanding of the referenced papers (especially those by H&L and PKH)
before making making significant changes.
The implementation is similar to that described in (Pearce et al,
PASTE'04). Unlike many algorithms which interleave constraint
generation and solving, constructing the callgraph as they go, this
implementation for the most part observes a phase ordering (generation
before solving), with only simple (copy) constraints being generated
during solving. (The exception is reflection, which creates various
constraints during solving as new types flow to reflect.Value
operations.) This improves the traction of presolver optimisations,
but imposes certain restrictions, e.g. potential context sensitivity
is limited since all variants must be created a priori.
A type is said to be "pointer-like" if it is a reference to an object.
Pointer-like types include pointers and also interfaces, maps, channels,
functions and slices.
We occasionally use C's x->f notation to distinguish the case where x
is a struct pointer from x.f where is a struct value.
Pointer analysis literature (and our comments) often uses the notation
dst=*src+offset to mean something different than what it means in Go.
It means: for each node index p in pts(src), the node index p+offset is
in pts(dst). Similarly *dst+offset=src is used for store constraints
and dst=src+offset for offset-address constraints.
Nodes are the key datastructure of the analysis, and have a dual role:
they represent both constraint variables (equivalence classes of
pointers) and members of points-to sets (things that can be pointed
at, i.e. "labels").
Nodes are naturally numbered. The numbering enables compact
representations of sets of nodes such as bitvectors (or BDDs); and the
ordering enables a very cheap way to group related nodes together. For
example, passing n parameters consists of generating n parallel
constraints from caller+i to callee+i for 0<=i<n.
The zero nodeid means "not a pointer". For simplicity, we generate flow
constraints even for non-pointer types such as int. The pointer
equivalence (PE) presolver optimization detects which variables cannot
point to anything; this includes not only all variables of non-pointer
types (such as int) but also variables of pointer-like types if they are
always nil, or are parameters to a function that is never called.
Each node represents a scalar part of a value or object.
Aggregate types (structs, tuples, arrays) are recursively flattened
out into a sequential list of scalar component types, and all the
elements of an array are represented by a single node. (The
flattening of a basic type is a list containing a single node.)
Nodes are connected into a graph with various kinds of labelled edges:
simple edges (or copy constraints) represent value flow. Complex
edges (load, store, etc) trigger the creation of new simple edges
during the solving phase.
Conceptually, an "object" is a contiguous sequence of nodes denoting
an addressable location: something that a pointer can point to. The
first node of an object has a non-nil obj field containing information
about the allocation: its size, context, and ssa.Value.
Objects include:
Many objects have no Go types. For example, the func, map and chan type
kinds in Go are all varieties of pointers, but their respective objects
are actual functions (executable code), maps (hash tables), and channels
(synchronized queues). Given the way we model interfaces, they too are
pointers to "tagged" objects with no Go type. And an *ssa.Global denotes
the address of a global variable, but the object for a Global is the
actual data. So, the types of an ssa.Value that creates an object is
"off by one indirection": a pointer to the object.
The individual nodes of an object are sometimes referred to as "labels".
For uniformity, all objects have a non-zero number of fields, even those
of the empty type struct{}. (All arrays are treated as if of length 1,
so there are no empty arrays. The empty tuple is never address-taken,
so is never an object.)
An tagged object has the following layout:
The T node's typ field is the dynamic type of the "payload": the value
v which follows, flattened out. The T node's obj has the otTagged
flag.
Tagged objects are needed when generalizing across types: interfaces,
reflect.Values, reflect.Types. Each of these three types is modelled
as a pointer that exclusively points to tagged objects.
Tagged objects may be indirect (obj.flags ⊇ {otIndirect}) meaning that
the value v is not of type T but *T; this is used only for
reflect.Values that represent lvalues. (These are not implemented yet.)
Variables of the following "scalar" types may be represented by a
single node: basic types, pointers, channels, maps, slices, 'func'
pointers, interfaces.
Pointers
Basic types (bool, string, numbers, unsafe.Pointer)
Channels
Maps
Slices
Functions
Interfaces
reflect.Value
reflect.Type
Aggregate types:
Aggregate types are treated as if all directly contained
aggregates are recursively flattened out.
Structs
Arrays
Tuples (T, ...)
FUNCTION CALLS
We implement Hash-Value Numbering (HVN), a pre-solver constraint
optimization described in Hardekopf & Lin, SAS'07. This is documented
in more detail in hvn.go. We intend to add its cousins HR and HU in
future.
The solver is currently a naive Andersen-style implementation; it does
not perform online cycle detection, though we plan to add solver
optimisations such as Hybrid- and Lazy- Cycle Detection from (Hardekopf
& Lin, PLDI'07).
It uses difference propagation (Pearce et al, SQC'04) to avoid
redundant re-triggering of closure rules for values already seen.
Points-to sets are represented using sparse bit vectors (similar to
those used in LLVM and gcc), which are more space- and time-efficient
than sets based on Go's built-in map type or dense bit vectors.
Nodes are permuted prior to solving so that object nodes (which may
appear in points-to sets) are lower numbered than non-object (var)
nodes. This improves the density of the set over which the PTSs
range, and thus the efficiency of the representation.
Partly thanks to avoiding map iteration, the execution of the solver is
100% deterministic, a great help during debugging.
Andersen, L. O. 1994. Program analysis and specialization for the C
programming language. Ph.D. dissertation. DIKU, University of
Copenhagen.
David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Efficient
field-sensitive pointer analysis for C. In Proceedings of the 5th ACM
SIGPLAN-SIGSOFT workshop on Program analysis for software tools and
engineering (PASTE '04). ACM, New York, NY, USA, 37-42.
http://doi.acm.org/10.1145/996821.996835
David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Online
Cycle Detection and Difference Propagation: Applications to Pointer
Analysis. Software Quality Control 12, 4 (December 2004), 311-337.
http://dx.doi.org/10.1023/B:SQJO.0000039791.93071.a2
David Grove and Craig Chambers. 2001. A framework for call graph
construction algorithms. ACM Trans. Program. Lang. Syst. 23, 6
(November 2001), 685-746.
http://doi.acm.org/10.1145/506315.506316
Ben Hardekopf and Calvin Lin. 2007. The ant and the grasshopper: fast
and accurate pointer analysis for millions of lines of code. In
Proceedings of the 2007 ACM SIGPLAN conference on Programming language
design and implementation (PLDI '07). ACM, New York, NY, USA, 290-299.
http://doi.acm.org/10.1145/1250734.1250767
Ben Hardekopf and Calvin Lin. 2007. Exploiting pointer and location
equivalence to optimize pointer analysis. In Proceedings of the 14th
international conference on Static Analysis (SAS'07), Hanne Riis
Nielson and Gilberto Filé (Eds.). Springer-Verlag, Berlin, Heidelberg,
265-280.
Atanas Rountev and Satish Chandra. 2000. Off-line variable substitution
for scaling points-to analysis. In Proceedings of the ACM SIGPLAN 2000
conference on Programming language design and implementation (PLDI '00).
ACM, New York, NY, USA, 47-56. DOI=10.1145/349299.349310
http://doi.acm.org/10.1145/349299.349310
This program demonstrates how to use the pointer analysis to
obtain a conservative call-graph of a Go program.
It also shows how to compute the points-to set of a variable,
in this case, (C).f's ch parameter.