Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

csquare-cast

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

csquare-cast

  • 0.2.2
  • Rubygems
  • Socket score

Version published
Maintainers
1
Created
Source

= CAST

http://cast.rubyforge.org

== Description

CAST parses C code into an abstract syntax tree (AST), lets you break it, then vomit it out as code. The parser does C99.

This fork supports Ruby 1.9.3, gemspec, and requires Hoe. The Rubyforge page above is documentation for the original version, but most things should be the same.

== Installation

gem install csquare-cast

== Library Overview

Everything is in the module C.

  • There's the parser (Parser).
  • There's the tree (Node and its subclasses).
  • That's it.

=== Usage

You call Parser#parse, and it gives you a tree of Node objects. Watch:

require 'cast/cast'

# create a parser
parser = C::Parser.new

# (optional) set some settings...
parser.pos.filename = "toy.c"      # used for error messages
parser.type_names << 'LinkedList'  # treat these words as types

# gimme a tree
ugly_c_code = open("toy.c"){|f| f.read}
tree = parser.parse(ugly_c_code)

# what's the tree look like?
p tree

If there's a parse error, #parse raises a ParseError (which has a nice error message in #message).

== The Parser

Here's a quiz: what does "a * b;" do?

I bet you said "why you l4m3r n00b, that's a statement that multiplies a by b and throws away the answer -- now go take your meaningless snippetage to your computing 101 class and let me finish hurting this JavaTM programmer." Well, you'd be both mean and wrong. It was, of course, a trick question. I didn't say if any of a and b are types! If only a is a type, it's actually a declaration. And if b is a type, it's a syntax error.

So, the parser's gonna need to know which identifiers are type names. This is one of the bits of state that a Parser keeps. Here's the complete list (um, of two):

  • #type_names -- a Set of Strings.
  • #pos -- the Node::Pos this parser will start parsing at.

A Node::Pos has three read-write atts: #filename, #line_num, #col_num. Default is nil, 1, 0.

== The Nodes

There are a couple of Node classes:

  • Node
    • TranslationUnit
    • Declaration
    • Declarator
    • FunctionDef
    • Parameter
    • Enumerator
    • MemberInit
    • Member
    • Statement
      • Block
      • If
      • Switch
      • While
      • For
      • Goto
      • Continue
      • Break
      • Return
      • ExpressionStatement
    • Label
      • PlainLabel
      • Default
      • Case
  • Node
    • Expression
      • Comma
      • Conditional
      • Variable
      • UnaryExpression
        • PostfixExpression
          • Index
          • Call
          • Dot
          • Arrow
          • PostInc
          • PostDec
        • PrefixExpression
          • Cast
          • Address
          • Dereference
          • Sizeof
          • Plus
          • Minus
          • PreInc
          • PreDec
          • BitNot
          • Not
  • Node
    • Expression
      • BinaryExpression
        • Add
        • Subtract
        • Multiply
        • Divide
        • Mod
        • Equal
        • NotEqual
        • Less
        • More
        • LessOrEqual
        • MoreOrEqual
        • BitAnd
        • BitOr
        • BitXor
        • ShiftLeft
        • ShiftRight
        • And
        • Or
  • Node
    • Expression
      • AssignmentExpression
        • Assign
        • MultiplyAssign
        • DivideAssign
        • ModAssign
        • AddAssign
        • SubtractAssign
        • ShiftLeftAssign
        • ShiftRightAssign
        • BitAndAssign
        • BitXorAssign
        • BitOrAssign
      • Literal
        • StringLiteral
        • CharLiteral
        • CompoundLiteral
        • IntLiteral
        • FloatLiteral
  • Node
    • Type
      • IndirectType
        • Pointer
        • Array
        • Function
      • DirectType
        • Struct
        • Union
        • Enum
        • CustomType
        • PrimitiveType
          • Void
          • Int
          • Float
          • Char
          • Bool
          • Complex
          • Imaginary
    • NodeList
      • NodeArray
      • NodeChain

The bold ones are abstract.

The last 2 (the NodeLists) represent lists of nodes. Methodwise, they try to behave like normal ruby ::Arrays. Implementationwise, a NodeChain is a doubly linked list, whereas a NodeArray is an array. NodeChains may be more efficient when adding things at the beginning of a LARGE list.

=== Attributes

Each Node object has:

  • #parent -- return the parent in the tree (a Node or nil).
  • #pos, #pos= -- the position in the source file (a Node::Pos).
  • #to_s -- return the code for the tree (a String).
  • #inspect -- return a pretty string for inspection. Try it.
  • #match?(str), #=~(str) -- return true iff str parses as a Node equal to this one.
  • #detach -- remove this node from the tree (parent becomes nil) and return it.
  • #detached?, #attached? -- return true if parent is nil or non-nil respectively.
  • #replace_with(node) -- replace this node with node in the tree.
  • #swap_with(node) -- exchange this node with node in their trees.
  • #insert_prev(*nodes), #insert_next(*nodes) -- insert nodes before this node in the parent list. Parent must be a NodeList! Useful for adding statements before a node in a block, for example.
  • #Foo? -- (where Foo is a module name) return self.is_a?(Foo).

The #Foo? method is a convienience for a common need. Example:

## make a tree
ast = C::Parser.new.parse(code_string)

## print all global variables
ast.entities.each do |node|
  node.Declaration? or next
  node.declarators.each do |decl|
    unless decl.type.Function?
      puts "#{decl.name}: #{decl.type}"
    end
  end
end

If you're a duck-typing purist, then sorry for the cardiac arrest you're now experiencing. CAST does pay attention to the class of Node objects for quite a few things. This is the cleanest way to distinguish, e.g., an Add from a Subtract (which both have the same methods but represent very different things). It just seems impractical (and unnecessary) to allow duck typing in this situation.

The #=~ method lets you do:

if declarator.type =~ 'const int *'
  puts "Ooh, a const int pointer!"
end

This is not the same as

declarator.type.to_s == 'const int *'

That'd require you to guess how to_s formats its strings (most notably, the whitespace).

=== Fields and children

Each concrete Node class has a member for each bit of important C stuff it pertains to. I know you peeked at the big list below, so you know the kind of thing I mean.

But these aren't defined as attrs as you normally do in Ruby -- they're fields. If a node has a field foo, it means there's a setter #foo= and getter #foo. (A field foo? means the setter is #foo= and the getter is #foo?.) Some fields are even more special: child fields. These form the tree structure, and always have a Node or nil value.

Why divulge these bizarre internal secrets? Because these Node methods are defined in terms of fields and children:

  • #==, #eql? -- Equality is checked recursively. That is, all fields (including children) must be equal.
  • #dup, #clone -- Children are copied recursively (other fields and attrs as normal).

Then there's the tree-twiddling methods, which only ever yield/return/affect (non-nil) children.

  • #next, #prev -- return the next/prev sibling.
  • #list_next, #list_prev -- like #next/#prev, but also requires the parent to be NodeList. I'll be honest; I don't remember why I added these methods. They may well suddenly disappear.
  • #each, #reverse_each -- Yield all (non-nil) children. Node includes Enumerable, so you get the free chocolates too.
  • #depth_first, #reverse_depth_first -- Walk the tree in that order, yielding two args (event, node) at each node. event is :down on the way down, :up on the way up. If the block throws :prune, it won't descend any further.
  • #preorder, #reverse_preorder, #postorder, #reverse_postorder -- Walk the tree depth first, yielding nodes in the given order. For the preorders, if the block throws :prune, it won't descend any further.
  • #node_after(child), #node_before(child) -- return the node before/after child (same as child.next).
  • #remove_node(child) -- remove child from this node (same as child.detach).
  • #replace_node(child, new_child) -- replace child with yeah you guessed it (same as child.replace_with(newchild)).

If you're walking the tree looking for nodes to move around, don't forget that modifying the tree during traversal is a criminal offence.

And now, the episode you've been waiting for: THE FIELD LIST! (Cue music and dim lights.)

Notes about the table:

  • If no default is listed, it is false if the field name ends in a '?', nil otherwise.
  • nil is always allowed for a child field.
  • There are some conventions we follow religiously to help you:
    • Field names that end in '?' are always true-or-false.
    • NodeList-valued fields always default to an empty NodeArray or NodeChain, so you can tack things on there with <<, without worrying about needing to create the list first.
    • A field is Node-valued if and only if it is a child field.
    • The rows go yellow, green, yellow, green, ... .
ClassFieldChild?Type or possible valuesDefaultComments
TranslationUnitentitiesYNodeListNodeChain[] He lives at the root of a parsed file.
Declarationstorage:typedef, :extern, :static, :auto, :register There are also methods to query the storage more humanely:
  • #typedef? -- true iff storage == :typedef
  • #extern? -- true iff storage == :extern
  • #static? -- true iff storage == :static
  • #auto? -- true iff storage == :auto
  • #register? -- true iff storage == :register
typeYDirectType
declaratorsYNodeListNodeArray[]
inline?true, false
Declaratorindirect_typeYIndirectType What on earth is a "declarator", you ask? Consider "int i, *ip;". This is a Declaration with two Declarators:
    Declaration
        type: Int
        declarators:
            - Declarator
                name: "i"
            - Declarator
                indirect_type: Pointer
                name: "ip"
The indirect_type of the ip Declarator is a Pointer to nil. "'Pointer to nil' my foot -- I want the type of the stupid variable!" Here:
  • #type -- return the type, the whole type, and nothing but the type. This is a clone; modifying it won't modify the tree.
So calling #type on the ip Declarator gives:
    Pointer
        type: Int
nameString
initYExpression
num_bitsYInteger
FunctionDefstorage:extern, :static Just like Declaration, there's also:
  • #extern? -- return true iff storage == :extern
  • #static? -- return true iff storage == :static
There's also a pseudo-field:
  • #prototype? -- same as !no_prototype?
  • #prototype=(val) -- same as no_prototype = !val
no_prototype? means that no prototype was given. That means parameter types weren't given in the parens, but in the "old-style" declaration list. Example:
int main(argc, argv)
    int argc;
    char **argv;
{
    return 0;
}
int main(int argc, char **argv) {
    return 0;
}
No prototype. Prototype.
Everyone tells you to use prototypes. That's because no type checking is done when calling a function declared without a prototype.
inline?true, false
typeYType
nameString
defYBlockBlock.new
no_prototype?true, false
Parameterregister?true, false Used in Functions.
typeYType
nameString
EnumeratornameString Used in Enums.
valYExpression
MemberInitmemberYNodeList of Member-or-Expression Used in CompoundLiterals.
initYExpression
MembernameString Used in MemberInits.
BlocklabelsYNodeList of LabelNodeArray[]
stmtsYNodeList of StatementNodeArray[]
IflabelsYNodeList of LabelNodeArray[]
condYExpression
thenYStatement
elseYStatement
SwitchlabelsYNodeList of LabelNodeArray[]
condYExpression
stmtYStatement
WhilelabelsYNodeList of LabelNodeArray[] do? means it's a do-while loop.
do?true, false
condYExpression
stmtYStatement
ForlabelsYNodeList of LabelNodeArray[]
initYExpression or Declaration
condYExpression
iterYExpression
stmtYStatement
GotolabelsYNodeList of LabelNodeArray[]
targetString
ContinuelabelsYNodeList of LabelNodeArray[]
BreaklabelsYNodeList of LabelNodeArray[]
ReturnlabelsYNodeList of LabelNodeArray[]
exprYExpression
ExpressionStatementlabelsYNodeList of LabelNodeArray[]
exprYExpression
PlainLabelnameString
Default
CaseexprYExpression
CommaexprsYNodeList of Expression
ConditionalcondYExpression
thenYExpression
elseYExpression
VariablenameString
IndexexprYExpression
indexYExpression
CallexprYExpression
argsYNodeList of Expression-or-Type
DotexprYExpression
memberYString
ArrowexprYExpression
memberYString
PostIncexprYExpression
PostDecexprYExpression
CasttypeYType
exprYExpression
AddressexprYExpression
DereferenceexprYExpression
SizeofexprYType or Expression
PositiveexprYExpression
NegativeexprYExpression
PreIncexprYExpression
PreDecexprYExpression
BitNotexprYExpression
NotexprYExpression
Addexpr1YExpression
expr2YExpression
Subtractexpr1YExpression
expr2YExpression
Multiplyexpr1YExpression
expr2YExpression
Divideexpr1YExpression
expr2YExpression
Modexpr1YExpression
expr2YExpression
Equalexpr1YExpression
expr2YExpression
NotEqualexpr1YExpression
expr2YExpression
Lessexpr1YExpression
expr2YExpression
Moreexpr1YExpression
expr2YExpression
LessOrEqualexpr1YExpression
expr2YExpression
MoreOrEqualexpr1YExpression
expr2YExpression
BitAndexpr1YExpression
expr2YExpression
BitOrexpr1YExpression
expr2YExpression
BitXorexpr1YExpression
expr2YExpression
ShiftLeftexpr1YExpression
expr2YExpression
ShiftRightexpr1YExpression
expr2YExpression
Andexpr1YExpression
expr2YExpression
Orexpr1YExpression
expr2YExpression
AssignlvalYExpression
rvalYExpression
MultiplyAssignlvalYExpression
rvalYExpression
DivideAssignlvalYExpression
rvalYExpression
ModAssignlvalYExpression
rvalYExpression
AddAssignlvalYExpression
rvalYExpression
SubtractAssignlvalYExpression
rvalYExpression
ShiftLeftAssignlvalYExpression
rvalYExpression
ShiftRightAssignlvalYExpression
rvalYExpression
BitAndAssignlvalYExpression
rvalYExpression
BitXorAssignlvalYExpression
rvalYExpression
BitOrAssignlvalYExpression
rvalYExpression
StringLiteralvalString The String in val is the literal string entered. "\n" isn't converted to a newline, for instance.
CharLiteralvalString The String in val is the literal string entered. '\n' isn't converted to a newline, for instance.
CompoundLiteraltypeYType

Here's an example. (struct S){1, .x = 2, .y [3] .z = 4} parses as:

CompoundLiteral
    type: Struct
        name: "S"
    member_inits:
        - MemberInit
            init: IntLiteral
                val: 1
        - MemberInit
            member:
                - Member
                    name: "x"
            init: IntLiteral
                val: 2
        - MemberInit
            member:
                - Member
                    name: "y"
                - IntLiteral
                    val: 3
                - Member
                    name: "z"
            init: IntLiteral
                val: 4
"That's legal syntax!?" Yep. Look it up.
member_initsYNodeList of MemberInitNodeArray[]
IntLiteralvalInteger

Also:

  • #dec? -- return true iff format == :dec
  • #hex? -- return true iff format == :hex
  • #oct? -- return true iff format == :oct
format:dec, :hex, :oct:dec
FloatLiteralvalFloat
Pointerconst?true, false
restrict?true, false
volatile?true, false
typeYType
Arrayconst?true, false
restrict?true, false
volatile?true, false
typeYType
lengthYExpression
Functionconst?true, false
restrict?true, false
volatile?true, false
typeYType
paramsYNodeList of ParameterNodeArray[]
var_args?true, false
Structconst?true, false
restrict?true, false
volatile?true, false
nameString
membersYNodeList of MemberNodeArray[]
Unionconst?true, false
restrict?true, false
volatile?true, false
nameString
membersYNodeList of MemberNodeArray[]
Enumconst?true, false
restrict?true, false
volatile?true, false
nameString
membersYNodeList of Enumerator
CustomTypeconst?true, false This is for typedef'd names.
restrict?true, false
volatile?true, false
nameString
Voidconst?true, false const void!? Yes, think about: const void *.
restrict?true, false
volatile?true, false
Intconst?true, false longness sounds silly, so here are some less silly methods:
  • #short? -- return true iff longness == -1
  • #plain? -- return true iff longness == 0
  • #long? -- return true iff longness == 1
  • #long_long? -- return true iff longness == 2
Oh, and look, a pseudo-field:
  • #signed? -- same as !unsigned?
  • #signed=(val) -- same as unsigned = !val
restrict?true, false
volatile?true, false
longness-1, 0, 1, 20
unsigned?true, false
Floatconst?true, false Less silly-sounding longness substitutes:
  • #plain? -- return true iff longness == 0
  • #double? -- return true iff longness == 1
  • #long_double? -- return true iff longness == 2
restrict?true, false
volatile?true, false
longness0, 1, 20
Charconst?true, false Also:
  • #signed? -- return true iff signed == true
  • #unsigned? -- return true iff signed == false
  • #plain? -- return true iff signed == nil
Yes, C99 says that char, signed char, and unsigned char are 3 distinct types (unlike with int -- go figure). Like Martian chalk and Venusian cheese: completely different, but you can fit 'em each in one byte. Mmm, Martian chalk...
restrict?true, false
volatile?true, false
signedtrue, false, nil
Boolconst?true, false This is the rarely seen _Bool type.
restrict?true, false
volatile?true, false
Complexconst?true, false

This is the rarely seen _Complex type.

  • #plain? -- return true iff longness == 0
  • #double? -- return true iff longness == 1
  • #long_double? -- return true iff longness == 2
restrict?true, false
volatile?true, false
longness0, 1, 20
Imaginaryconst?true, false

This is the rarely seen _Imaginary type.

  • #plain? -- return true iff longness == 0
  • #double? -- return true iff longness == 1
  • #long_double? -- return true iff longness == 2
restrict?true, false
volatile?true, false
longness0, 1, 20
BlockExpressionblockYBlockBlock.new Only if the block_expressions extension is enabled. See "Extensions" section below.

=== Node Construction

Wanna make a Node? Take your pick:

  • #new(field1, field2, ...) -- fields are in the order listed above.
  • #new(:field1 => val, :field2 => val, ...) -- field order doesn't matter this way.
  • #new_at(pos, *args) -- set the pos to that given, and pass args to #new.

They're for losers, though. What you really want to do is make Nodes by parsing C code. Each class -- even the abstract classes like Statement -- has a .parse method:

function_def = C::FunctionDef.parse <<EOS
  void frobnicate(int karma) {
    use_waffle_iron();
  }
stmt = C::Statement.parse('while (not_looking) paint_car();')

Need to tell it to treat WaffleIron as a type name? All those parse methods use C.default_parser:

C.default_parser.type_names << 'WaffleIron'
type = C::Type.parse('WaffleIron')

Alternatively, you could've given parse your own parser:

parser = C::Parser.new
parser.type_names << 'WaffleIron'
type = C::Type.parse('WaffleIron', parser)

In fact, there's also C.parse(str, parser=nil), which is an alias for C::TranslationUnit.parse(str, parser).

ast = C.parse(STDIN)

Yes, all that talk in the intro about doing parser = C::Parser.new; parser.parse(...) was actually all a charade to make you type more. I so own you.

== Extensions

CAST has developed extensions! To the C99 grammar, I mean.

  • Types are allowed as function arguments. This lets you parse macros like va_arg().

  • Blocks in parentheses are allowed as expressions ((a gcc extension)[http://gcc.gnu.org/onlinedocs/gcc-4.2.1/gcc/Statement-Exprs.html#Statement-Exprs]). You need to call #enable_block_expressions on the parser first. They pop out as BlockExpression nodes.

    C.default_parser.enable_block_expressions node = C.parse 'char *brag(void) { return ({"I'm tricky!";}); }' node.entities[0].def.stmts.last.expr.class # => C::BlockExpression

== Open Issues

  • Is it okay to bastardize the =~ operator like that?
  • Should binary operators have a list of expressions rather than just 2? That'd allow to_s to format the strings better in some cases and make it consistent with Comma.
  • At the moment CAST chokes on preprocessor #-lines. Ruby makes it trivial to filter these out before passing the string to Parser#parse, and this makes it obvious when you forget to run a file through the preprocessor (which you need to do to get type names at least once). Is this stupid? Ideally we should probably have a builtin preprocessor and use that.

{Vote now}[mailto:george.ogata@gmail.com]

== To Do

  • Stop embarrasing yourself and write the parser in C.
  • Make it -wd friendly.
  • Fix the "TODO" bits in c.y. These are for rarely used C constructs, like the declaration int arr[*];.
  • Add a comment node. Might make it useful for doc extraction. Anyone want this? Not all comments will be caught, though. Comments can appear between any two tokens, and I don't really want to add comment fields to every node. They'd probably just appear between toplevel entities (in TranslationUnit#entities) and between statements (in Block#stmts).
  • Make it rdoc-able.

If any of these affect you greatly, {kick me}[mailto:george.ogata@gmail.com] to make it happen faster.

== Contact

I'm not really sure what people are going to try to use this for. If there's some functionality you think would make a good addition, or think I've made a mess of this poor puppy, give me a yell.

You can spam me at george.ogata@gmail.com. It'd help if you prefixed the subject with "[cast] " so I can easily distinguish CAST spam from fake Rolex spam.

== License

Ruby License

FAQs

Package last updated on 09 May 2012

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc