Security News
GitHub Removes Malicious Pull Requests Targeting Open Source Repositories
GitHub removed 27 malicious pull requests attempting to inject harmful code across multiple open source repositories, in another round of low-effort attacks.
Redhawk is a code navigation system built on the idea of a language agnostic parse tree. It currently supports C & Python.
Code navigation systems are few and far. They are either too tied to a language, or are very heuristic in nature --- using regex based parsers. Redhawk attempts to acheive the best of both worlds. It uses standard, robust, parsers each of the languages, and converts the resulting AST to a language agnostic AST, or LAST.
The resulting LAST can be queried by using either Selectors (similar to JQuery), or an xpath like syntax. A Typical use of Redhawk is as shown below::
$ redhawk query '*/DefineFunction' file1.py file2.c
Redhawk is currently under heavy development. The code can be found on
github
_.
Redhawk currently requires python 2.6 or 2.7.
An introductory set of videos, have been uploaded to Youtube
_.
A Vim plugin
_ released in version 1.1.5, for query, and replace (using an
editable quickfix list).
From Version 1.1.2 onwards, Redhawk supports parallel querying using the parallel-python (pp) module. This speeds up Redhawk's querying on large codebases. Querying for closures anywhere in Django (~2200 files) can now be done in ~20 seconds on a celeron netbook.
(or what's coming up)
Allow users to effectively find and thereby navigate code in an editor-independent manner.
Better documentation for API usage, and a long list of examples, with examples scripts using the Selector API.
Allow cross-language analysis in the future, thereby benefitting projects in multiple languages.
Expose the LAST in a simple manner via the Redhawk API for other tools. These tools could involve indenting code, suggesting completions, or static analysis.
Eventually allow editing of the LAST, and thereby powerful refactoring.
Runtime Dependencies:
pycparser
_ is required to parse C code into ASTs. This
in-turn depends on Python-PLY (python-ply
on debian-ubuntu).Optional but highly recommended Dependencies:
pp
_ - Parallel Python is required for running queries in parallel. This
speeds up queries by more than 2x. This is highly recommended if you are
going to query large projects. The whole of Django can be queried in less
than 20 seconds, by using parallel python (passing -p
to the query
command).
Python Graphviz
_ is required for generating pretty AST graphs. This
package is an optional dependency, but highly recommended. This package goes by the name
python-pygraphviz
on Ubuntu, and depends on graphviz
, and dot
. (Pip
seems to have a hard time install pygraphviz. Either easy_install
or
installing from your distribution's package manager should work).
Development (Compile-time) Dependencies:
Python YAML
_ is required for generating the AST classes in node.py
form a simple configuration file. This goes by the name python-yaml on
debian/ubuntu.
nosetests
_ is required for running the test suite.
Development Cycle:
Use bin/start_simple_bash_with_redhawk_in_pythonpath.sh
Run nosetests from redhawk/ as root. Ensure tests pass.
Make awesome changes!
Submit a pull request to the project on github.
pip
is the recommended tool to install Redhawk. It goes by python-pip
on
debian/ubuntu and pip
_ on the Python Package Index. The command::
$ sudo pip install redhawk
should install redhawk, along with its dependency - pycparser.
It is however recommended that you install the other packages also::
$ sudo easy_install pygraphviz
$ sudo pip install nose 'PyYAML>=3.09' 'nose>=0.11'
or by using your distribution's package manager. On Ubuntu/Debian (Ubuntu Lucid seems to have new enough packages)::
$ sudo aptitude install python-pygraphviz python-yaml python-nose
_build_tables.py
in the pycparser directory, to pre-generate the lex
and yacc tables. This will enable quicker parsing of C files. If pycparser was installed for all users, thenlextab.py
and yacctab.py
must be changed
to allow all users to read (755).You can find more about this here
_.
Redhawk can be used from either the redhawk
executable, or via the redhawk
API.
redhawk
program.import redhawk
redhawk
executableThe redhawk
program supports eight commands:
========= ======================================================= Command Purpose ========= ======================================================= add Add files to an AST index. init Create an EMPTY AST index. listfiles List all the files in the AST index. prompt Drop into a python prompt with helpful functions for exploring the parse tree. query Query for a pattern in a list of files, or in the index. remove Remove files from the AST index. show Show (visualize) a file either as text, or as an image. where Print the location of the current redhawk index (if there is one). ========= =======================================================
The simplest way to run redhawk
is to simply use a query
command on a file
(or directory). The query
command as described above takes an xpath-like
query, and a list of files (or directories), and searches for matches.
In the case that the set of files is large and is to be repeatedly queried, a
redhawk
Language Agnostic Tree (LAST) database can be created using the
redhawk init
command. Files in the project can be added to the database
using the redhawk add
command.
The show
command helps visualise the internal LAST structure used. The
command::
$ redhawk show file.c
will show the LAST of file.c
in a lisp/scheme like (sexp) syntax. A more
descriptive helpful visualisation can be obtained using the -i
(or -e
)
flags, which show graphs (generated using graphviz
using the
python-graphviz
module). This requires the pygraphviz module, an optional
though recommended, dependency. The command::
$ redhawk show file.c -i
shows a graph using the default image python libraries.
The prompt
command drops you into a prompt for exploring and querying the
LAST. This enables the use of selectors, a very powerful method for finding
what you want. For more information on selectors, see::
$ pydoc redhawk.common.selector
for detailed documentation.
The query
command supports an XPATH-like language for querying. We describe
examples below. In querying for a particular construct, the name of that Node
in the LAST has to be known. (Thorough documentation about this is coming up.
For now, one can refer to the node
_ and types
_ yaml configuration files on
github.) [1]_
For the examples below, we shall use the counter.py
_ file. It is to be noted
that the same queries will work with other languages also (only C is supported
for now).::
1 def CounterClosure(init=0):
2 value = [init]
3 def Inc():
4 value[0] += 1
5 return value[0]
6 return Inc
7
8 class CounterClass:
9 def __init__(self, init=0):
10 self.value = init
11
12 def Bump(self):
13 self.value += 1
14 return self.value
15
16 def CounterIter(init = 0):
17 while True:
18 init += 1
19 yield init
20
21 if __name__ == '__main__':
22 c1 = CounterClosure()
23 c2 = CounterClass()
24 c3 = CounterIter()
25 assert(c1() == c2.Bump() == c3.next())
26 assert(c1() == c2.Bump() == c3.next())
27 assert(c1() == c2.Bump() == c3.next())
28
Try redhawk show
on the above file, to get a feel of its structure. You can
view the graphviz generated graph at imgur
_.
Example 1:
Let us find all functions at the module level in counter.py
::
$ redhawk query 'DefineFunction' counter.py
This gives us::
counter.py:16:def CounterIter(init = 0):
counter.py:1:def CounterClosure(init=0):
NOTE:
The results are not necessarily in a sorted order, with respect to
line number. This does not hamper the use of Redhawk for searching and
navigation. (The results will always be guaranteed to be sorted with respect to the
files). On the plus side, this makes Redhawk a little bit faster. If order is
required, a simple invocation of the unix sort
program should fix this.
The above query would work on a C program as well. Running the same query
on stats.c
_ gives us::
stats.c:17:float Variance(float *p, int len) stats.c:5:float Mean(float *p, int len) stats.c:34:int main()
Example 2:
Let us find all functions one level below the module level in counter.py
::
$ redhawk query '*/DefineFunction' counter.py
This gives us::
counter.py:9:def __init__(self, init=0):
counter.py:3:def Inc():
counter.py:12:def Bump(self):
Example 3: Let us find all functions anywhere in the program.::
$ redhawk query '**/DefineFunction' counter.py
This gives us::
counter.py:9:def __init__(self, init=0):
counter.py:16:def CounterIter(init = 0):
counter.py:3:def Inc():
counter.py:1:def CounterClosure(init=0):
counter.py:12:def Bump(self):
Example 4: Suppose we wanted to find all closures in the file. We could do this via::
$ redhawk query '**/DefineFunction/**/DefineFunction' counter.py
This gives us::
counter.py:3:def Inc():
Example 5:
Let us find all functions whose name starts with 'Counter'. Looking at the
node
yaml configuration tells us that DefineFunction
has an argument called
name. Now we simply need to test whether the first 7 letters of the name are
"Counter"::
$ redhawk query '**/DefineFunction@{n.name[:7] == "Counter"}' counter.py
This gives us:
counter.py:16:def CounterIter(init = 0):
counter.py:1:def CounterClosure(init=0):
The @{..}
represents a python lambda function, with the default variable n.
Thus, it is another way of providing arbitrary functions to match with. [2]_
To remind the reader that all these queries are langauge agnostic, running the
above command, but instead search for all functions that have the letter e
in
the them, in the stats.c
_ file.::
$ redhawk query '**/DefineFunction@{n.name.find("e") != -1}' stats.c
gives us::
stats.c:17:float Variance(float *p, int len)
stats.c:5:float Mean(float *p, int len)
Example 7:
Find all assignments where init is involved. Looking again at the node
configuration file, we realise that we are looking for Assignment
Nodes, which
have a ReferVariable
descendent, whose name is 'init'::
$ redhawk query '**/Assignment/**/ReferVariable@[name="init"]' counter.py
This gives us::
counter.py:2:value = [init]
counter.py:18:init += 1
counter.py:10:self.value = init
Note the @[..]
syntax similar to XPATH, for referring to an attribute.
Example 8:
What if we wanted assignments were init was being set, and not referred to? We
would use a code block to look at the lvalue
of the Assignment
.::
$ redhawk query '**/Assignment@{n.lvalue.name == "init"}' counter.py
This gives us::
counter.py:18:init += 1
Example 9:
Let us find all Function calls that start with 'Counter'. Looking again at the
node
_ yaml configuration, we see that we want to find 'CallFunction's, where
the function object has a name starting with "Counter". [3]_ ::
$ redhawk query '**/CallFunction@{n.function.name[:7] == "Counter"}' counter.py
This gives us::
counter.py:24:c3 = CounterIter()
counter.py:22:c1 = CounterClosure()
counter.py:23:c2 = CounterClass()
Example 10
Let us find all Function definitions whose first argument is self
[4]_::
$ redhawk query '**/DefineFunction/FunctionArguments/@[name="self"][0]' counter.py
This gives us::
counter.py:12: def Bump(self):
counter.py:9: def __init__(self, init=0):
The last [0]
is square brackets, indicates the position of that node with
respect to its parent.
Example 11
Let us find all Function definitions whose last argument is self
. The
following query is WRONG::
$ redhawk query '**/DefineFunction/FunctionArguments/@[name="self"][-1]' counter.py
The above query gives us no output. Why? Looking at the node
_ configuration
file, we see that, FunctionArguments
has three children --- arguments
,
var_arguments
, kwd_arguments
, the latter two of which are None
everywhere in the file as no variable or keyword arguments are used. Thus, the
children of FunctionArguments
everywhere in the counter.py
file takes the
form [[..], None, None]
.
What we really want, is the last element of the first element, the arguments
list. This can be expressed as follows [4]_::
$ redhawk query '**/DefineFunction/FunctionArguments/@[name="self"][0, -1]' counter.py
This gives us::
counter.py:12: def Bump(self):
In hindsight, the query in the previous example could have also been expressed as::
$ redhawk query '**/DefineFunction/FunctionArguments/@[name="self"][0, 0]' counter.py
Note: For convenience's sake, even [0, -1, 0]
, or [0, -1, 0, 0, .. , 0]
is
defined to return the same result. Read the 'Position Syntax' section in the
documentation of redhawk.common.xpath
for more information.
An abstract grammar of the query language can be found via::
$ pydoc redhawk.common.xpath
Much more is possible, using the Selector API.
The redhawk
package can also be used as an API by importing
redhawk.common.selector
and related packages. Some of the useful packages
are already imported for the user in redhawk prompt
and are a good place to
start things at.
Example 1: Suppose in the above file we wanted to find all generators, i.e, function definitions, which had a yield as a descendent. We shall see how easy, and logical this query becomes using selectors.
We first go into a redhawk prompt::
$ redhawk prompt counter.py
We are greeted with a help banner::
Built in Variables:
trees - contains the parse trees of the files passed in the command line
Built in Functions:
ConvertFileToAst - Converts a file into a language agnostic AST.
ConvertCodeToAst - Converts a code snippet into a language agnostic AST.
Help - Displays this prompt.
ShowASTAsImage - Shows the AST as a graph using dot.
Built in Modules:
S - redhawk.common.selector
X - redhawk.common.xpath
F - redhawk.common.format_position
To view this again, use the Help() function.
In the prompt, we define our selectors. (See pydoc redhawk.common.selector
for what selectors are, and how they can be composed)::
In [1]: function_def = S.S(node_type='DefineFunction')
In [2]: yield_stmt = S.S(node_type='Yield')
In [3]: reqd_selector = function_def.HasDescendant(yield_stmt)
We then apply the selector on the file. The asts of the files passed are in
the trees argument
. Since this file was the first, it is in trees[0]
::
In [4]: results = list(reqd_selector(trees))
In [5]: results[0]
gives us::
Out[5]: DefineFunction
This is indeed the function we wanted. Just to be sure, we use the
F.PrintContextInFile
function to print the context of the tree.::
In [6]: F.PrintContextInFile(results[0], context=6)
counter.py:10: self.value = init
counter.py:11:
counter.py:12: def Bump(self):
counter.py:13: self.value += 1
counter.py:14: return self.value
counter.py:15:
counter.py:16: > def CounterIter(init = 0):
counter.py:17: while True:
counter.py:18: init += 1
counter.py:19: yield init
counter.py:20:
counter.py:21: if __name__ == '__main__':
counter.py:22: c1 = CounterClosure()
It is easy to see from this example that selectors are highly composable, and thus are very powerful. It is hoped that using selectors becomes a natural way to write powerful custom scripts, for querying code.
Redhawk is distributed under the terms of the 2-clause BSD license. You are free to use it for commercial or non-commercial projects with little or no restriction. For a complete text of the license see the LICENSE.txt file in the source distribution.
v1.2.3
v1.2.2
Support for running Redhawk via Python 3. Great thanks to bkabrda@ and ncoghlan@ for their changes!
Compatibility for new PyCParser changes.
v1.2.1
v1.2.0
v1.1.6
Major internal refactoring involving get_ast.py
Prompt command accepts directories, and can be told not to use IPython.
A new selector function called Apply to make prompt usage easier.
Bug fixes wrt IPython shell and error handling.
v1.1.5
Vim plugin
_ released.
Patch to FormatPosition to not strip lines when context = 0.
v1.1.4
v1.1.3
Bugs in the README's RST syntax fixed.
v1.1.2
Redhawk can now use parallel python (on the same machine), to perform queries on codebases. This speeds up Redhawk (almost) proportionally to the number of cores you have on your computer. Redhawk can now query for closures in Django in just ~20 seconds.
Friendlier usage strings and help messages.
v1.1.1
Python2.7 compatibility: ast.parse (Thanks to Nafai77)
Profiled, performance improvements by 15% by shifting to deque, and caching flattened children.
Provided a bin/start_simple_bash_with_redhawk_in_pythonpath.sh to enter a temporary shell with redhawk in PYTHONPATH (for devs).
v1.1.0
Fast enough to work on Django - Querying DefineClass anywhere in the codebase (~2300 python files), takes just 45 seconds on a celeron netbook. Thats 19ms per file!
Uses the shelve module instead of the pickle module, to decrease read and write times for the redhawk database.
Redhawk supports three new commands - listfiles
, remove
, where
The query
, and show
, commands take an extra argument -s
, to decide if
new trees should be added to the database.
Skip a file if there is a parser error.
.. [1] ast_gen.py
_ generates node.py
_ and types.py
_ using these YAML configuration files.
.. [2] In fact the portion inside the @{..}
is just appended to a 'lambda n:' and eval
-ed to get a function.
.. [3] Note that 'CallFunction's do not directly have a name. This is because the function object, unlike that of a function definition, can be a value. It is possible to do (f.g[x])(y), and such.
.. [4] These queries actually finds us the argument, and not the function itself. But this shouldn't matter when we have the definition on the same line.
.. _Vim plugin: http://www.vim.org/scripts/script.php?script_id=3586 .. _imgur: http://imgur.com/CBHCX .. _counter.py: https://github.com/spranesh/Redhawk/tree/master/redhawk/test/files/examples/counter.py .. _stats.c: https://github.com/spranesh/Redhawk/tree/master/redhawk/test/files/examples/stats.c .. _ast_gen.py: https://github.com/spranesh/Redhawk/blob/master/redhawk/common/_ast_gen.py .. _node.py: https://github.com/spranesh/Redhawk/blob/master/redhawk/common/node.py .. _types.py: https://github.com/spranesh/Redhawk/blob/master/redhawk/common/types.py .. _node: https://github.com/spranesh/Redhawk/blob/master/redhawk/common/_node_cfg.yaml .. _types: https://github.com/spranesh/Redhawk/blob/master/redhawk/common/_types_cfg.yaml .. _here: http://pycparser.googlecode.com/hg/README.html#installation-process .. _pip: http://pypi.python.org/pypi/pip .. _github: http://www.github.com/spranesh/Redhawk .. _Python Graphviz: http://networkx.lanl.gov/pygraphviz/ .. _pycparser: http://code.google.com/p/pycparser/ .. _pp: http://pypi.python.org/pypi/pp .. _Python YAML: http://www.pyyaml.org .. _nosetests: http://somethingaboutorange.com/mrl/projects/nose/1.0.0/ .. _Youtube: http://www.youtube.com/watch?v=azaXpahrxww
FAQs
An AST based navigation system.
We found that redhawk demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
GitHub removed 27 malicious pull requests attempting to inject harmful code across multiple open source repositories, in another round of low-effort attacks.
Security News
RubyGems.org has added a new "maintainer" role that allows for publishing new versions of gems. This new permission type is aimed at improving security for gem owners and the service overall.
Security News
Node.js will be enforcing stricter semver-major PR policies a month before major releases to enhance stability and ensure reliable release candidates.