= CAST
== Description
CAST parses C code into an abstract syntax tree (AST), lets you break it, then vomit it out as code. The parser does C99.
This fork supports Ruby 1.9.3, adds a gemspec, and requires Hoe. The Rubyforge page above documents the original version, but most things should be the same.
== Installation
  gem install csquare-cast
== Library Overview
Everything is in the module C.
=== Usage
You call Parser#parse, and it gives you a tree of Node objects. Watch:
  require 'cast/cast'

  # create a parser
  parser = C::Parser.new

  # (optional) set some settings...
  parser.pos.filename = "toy.c"        # used for error messages
  parser.type_names << 'LinkedList'    # treat these words as types

  # gimme a tree
  ugly_c_code = File.read("toy.c")
  tree = parser.parse(ugly_c_code)

  # what's the tree look like?
  p tree
If there's a parse error, #parse raises a ParseError (which has a nice error message in #message).
== The Parser
Here's a quiz: what does "a * b;" do?
I bet you said "why you l4m3r n00b, that's a statement that multiplies a by b and throws away the answer -- now go take your meaningless snippetage to your computing 101 class and let me finish hurting this Java(TM) programmer." Well, you'd be both mean and wrong. It was, of course, a trick question. I didn't say whether a or b is a type! If only a is a type, it's actually a declaration. And if b is a type, it's a syntax error.
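So, the parser's gonna need to know which identifiers are type names. This is one of the bits of state that a Parser keeps. Here's the complete list (um, of two), both of which appear in the usage example above:
* #type_names -- the identifiers the parser treats as type names.
* #pos -- the Node::Pos the parser uses for error messages.
A Node::Pos has three read-write attributes: #filename, #line_num, #col_num. The defaults are nil, 1, 0.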
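To make the quiz concrete, here is a toy sketch in plain Ruby. It is not how CAST's parser works internally; classify_star_statement is a made-up helper that only handles the exact shape "x * y;" and applies the two rules stated above.

```ruby
require 'set'

# Hypothetical helper, not part of CAST: classify "x * y;" given the
# set of identifiers currently known to be type names.
def classify_star_statement(source, type_names)
  match = source.match(/\A\s*(\w+)\s*\*\s*(\w+)\s*;\s*\z/)
  raise ArgumentError, "expected something like 'a * b;'" unless match

  left, right = match[1], match[2]
  if type_names.include?(right)
    :syntax_error   # a type name on the right-hand side cannot parse
  elsif type_names.include?(left)
    :declaration    # declares `right` as a pointer to `left`
  else
    :expression     # multiplies and throws away the answer
  end
end

puts classify_star_statement('a * b;', Set.new)    # expression
puts classify_star_statement('a * b;', Set['a'])   # declaration
puts classify_star_statement('a * b;', Set['b'])   # syntax_error
```

The same text gives three different answers depending only on the set passed in, which is why the parser has to carry that set around as state.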
== The Nodes
There are a couple of Node classes:
The bold ones are abstract.
The last 2 (the NodeLists) represent lists of nodes. Methodwise, they try to behave like normal ruby ::Arrays. Implementationwise, a NodeChain is a doubly linked list, whereas a NodeArray is an array. NodeChains may be more efficient when adding things at the beginning of a LARGE list.
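For the curious, a doubly linked list can be sketched in a few lines of plain Ruby. ToyChain below is a hypothetical stand-in, not CAST's NodeChain; it only shows why adding at the front touches a constant number of links no matter how long the list is.

```ruby
# Toy doubly linked list. ToyChain and Link are hypothetical names,
# not CAST's NodeChain implementation.
class ToyChain
  include Enumerable

  Link = Struct.new(:value, :prev_link, :next_link)

  attr_reader :length

  def initialize
    @first = @last = nil
    @length = 0
  end

  # Add at the front: only the old head and the head pointer change.
  def unshift(value)
    link = Link.new(value, nil, @first)
    @first.prev_link = link if @first
    @first = link
    @last ||= link
    @length += 1
    self
  end

  # Add at the back: only the old tail and the tail pointer change.
  def push(value)
    link = Link.new(value, @last, nil)
    @last.next_link = link if @last
    @last = link
    @first ||= link
    @length += 1
    self
  end

  # Walk from head to tail; Enumerable builds map, select, to_a on this.
  def each
    return to_enum(:each) unless block_given?
    link = @first
    while link
      yield link.value
      link = link.next_link
    end
    self
  end
end

chain = ToyChain.new
chain.push(:b).push(:c).unshift(:a)
p chain.to_a   # [:a, :b, :c]
```

Including Enumerable is what lets a structure like this feel like an Array from the outside, which is the behaviour described above.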
=== Attributes
Each Node object has:
The #Foo? method is a convenience for a common need. Example:
  ## make a tree
  ast = C::Parser.new.parse(code_string)

  ## print all global variables
  ast.entities.each do |node|
    node.Declaration? or next
    node.declarators.each do |decl|
      unless decl.type.Function?
        puts "#{decl.name}: #{decl.type}"
      end
    end
  end
If you're a duck-typing purist, then sorry for the cardiac arrest you're now experiencing. CAST does pay attention to the class of Node objects for quite a few things. This is the cleanest way to distinguish, e.g., an Add from a Subtract (which both have the same methods but represent very different things). It just seems impractical (and unnecessary) to allow duck typing in this situation.
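If you're wondering how a method like #Declaration? could exist at all, here is one way to get that behaviour in plain Ruby. This is a guess at the general idea, not CAST's source; ToyNode and the two subclasses are invented for the example.

```ruby
# Hypothetical miniature hierarchy, not CAST's classes.
class ToyNode; end
class Declaration < ToyNode; end
class FunctionDef < ToyNode; end

# Give every ToyNode a "ClassName?" predicate for each concrete class.
[Declaration, FunctionDef].each do |klass|
  ToyNode.send(:define_method, "#{klass.name}?") { is_a?(klass) }
end

nodes = [Declaration.new, FunctionDef.new, Declaration.new]
declarations = nodes.select { |node| node.Declaration? }
puts declarations.length   # 2
```

Because the predicates live on the base class, every node answers every question, so `node.FunctionDef?` on a Declaration returns false rather than raising NoMethodError.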
The #=~ method lets you do:
  if declarator.type =~ 'const int *'
    puts "Ooh, a const int pointer!"
  end
This is not the same as
  declarator.type.to_s == 'const int *'
That'd require you to guess how to_s formats its strings (most notably, the whitespace).
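A quick plain-Ruby illustration of why raw string comparison is brittle. The helpers type_tokens and same_type_spelling? are made up for this sketch and say nothing about how CAST implements #=~; they just compare token by token so that spacing stops mattering.

```ruby
# Hypothetical helpers, not CAST API: split a C type string into
# tokens so that whitespace differences disappear.
def type_tokens(source)
  source.scan(/\w+|\*|\[|\]|\(|\)|,/)
end

def same_type_spelling?(a, b)
  type_tokens(a) == type_tokens(b)
end

puts 'const int *' == 'const int*'                      # false
puts same_type_spelling?('const int *', 'const int*')   # true
```

This toy still treats 'int const *' and 'const int *' as different, so it is only a demonstration of the whitespace problem, not a substitute for comparing real type nodes.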
=== Fields and children
Each concrete Node class has a member for each bit of important C stuff it pertains to. I know you peeked at the big list below, so you know the kind of thing I mean.
But these aren't defined as attrs as you normally do in Ruby -- they're fields. If a node has a field foo, it means there's a setter #foo= and getter #foo. (A field foo? means the setter is #foo= and the getter is #foo?.) Some fields are even more special: child fields. These form the tree structure, and always have a Node or nil value.
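Here's a rough plain-Ruby sketch of what a field declaration along those lines might look like. ToyNode.field, ToyParameter, and the :child option are all invented for illustration and are not CAST's actual implementation; the point is only the naming rule (getter keeps the "?", setter drops it) and the idea that children are the node-valued fields.

```ruby
# Hypothetical mini-DSL, not CAST's real code.
class ToyNode
  # Per-class list of declared fields.
  def self.fields
    @fields ||= []
  end

  # Declare a field. For a name like :register? the getter is
  # #register? and the setter is #register=.
  def self.field(name, options = {})
    getter = name.to_s
    base   = getter.chomp('?')
    ivar   = "@#{base}"
    fields << { :name => getter, :child => options.fetch(:child, false) }
    define_method(getter) { instance_variable_get(ivar) }
    define_method("#{base}=") { |value| instance_variable_set(ivar, value) }
  end

  # Values of the child fields, skipping any that are nil.
  def children
    child_fields = self.class.fields.select { |f| f[:child] }
    child_fields.map { |f| send(f[:name]) }.compact
  end
end

class ToyParameter < ToyNode
  field :register?
  field :type, :child => true
  field :name
end

param = ToyParameter.new
param.register = true
param.name = 'karma'
puts param.register?         # true
puts param.children.length   # 0, because type is still nil
```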
Why divulge these bizarre internal secrets? Because these Node methods are defined in terms of fields and children:
Then there's the tree-twiddling methods, which only ever yield/return/affect (non-nil) children.
If you're walking the tree looking for nodes to move around, don't forget that modifying the tree during traversal is a criminal offence.
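The usual way to stay on the right side of the law is to walk first and mutate afterwards. The sketch below uses a plain Array of strings in place of a real tree, so nothing in it is CAST API; the two-pass shape is the point.

```ruby
# A plain Array stands in for a list of statement nodes here.
stmts = ['a = 1;', 'debug();', 'b = 2;', 'debug();']

# Pass 1: walk without touching the list, remembering what to remove.
doomed = stmts.each_index.select { |i| stmts[i] == 'debug();' }

# Pass 2: the walk is over, so mutation is safe. Delete from the back
# so the earlier indices stay valid.
doomed.reverse_each { |i| stmts.delete_at(i) }

p stmts   # ["a = 1;", "b = 2;"]
```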
And now, the episode you've been waiting for: THE FIELD LIST! (Cue music and dim lights.)
Notes about the table:
Class | Field | Child? | Type or possible values | Default | Comments
TranslationUnit | entities | Y | NodeList | NodeChain[] | He lives at the root of a parsed file.
Declaration | storage | | :typedef, :extern, :static, :auto, :register | | There are also methods to query the storage more humanely:
Declaration | type | Y | DirectType | |
Declaration | declarators | Y | NodeList | NodeArray[] |
Declaration | inline? | | true, false | |
Declarator | indirect_type | Y | IndirectType | | What on earth is a "declarator", you ask? Consider "int i, *ip;". This is a Declaration with two Declarators:

    Declaration
        type: Int
        declarators:
            - Declarator
                name: "i"
            - Declarator
                indirect_type: Pointer
                name: "ip"

    The indirect_type of the ip Declarator is a Pointer to nil. "'Pointer to nil' my foot -- I want the type of the stupid variable!" Here:

    Pointer
        type: Int

Declarator | name | | String | |
Declarator | init | Y | Expression | |
Declarator | num_bits | Y | Integer | |
FunctionDef | storage | | :extern, :static | | Just like Declaration, there's also:
FunctionDef | inline? | | true, false | |
FunctionDef | type | Y | Type | |
FunctionDef | name | | String | |
FunctionDef | def | Y | Block | Block.new |
FunctionDef | no_prototype? | | true, false | |
Parameter | register? | | true, false | | Used in Functions.
Parameter | type | Y | Type | |
Parameter | name | | String | |
Enumerator | name | | String | | Used in Enums.
Enumerator | val | Y | Expression | |
MemberInit | member | Y | NodeList of Member-or-Expression | | Used in CompoundLiterals.
MemberInit | init | Y | Expression | |
Member | name | | String | | Used in MemberInits.
Block | labels | Y | NodeList of Label | NodeArray[] |
Block | stmts | Y | NodeList of Statement | NodeArray[] |
If | labels | Y | NodeList of Label | NodeArray[] |
If | cond | Y | Expression | |
If | then | Y | Statement | |
If | else | Y | Statement | |
Switch | labels | Y | NodeList of Label | NodeArray[] |
Switch | cond | Y | Expression | |
Switch | stmt | Y | Statement | |
While | labels | Y | NodeList of Label | NodeArray[] | do? means it's a do-while loop.
While | do? | | true, false | |
While | cond | Y | Expression | |
While | stmt | Y | Statement | |
For | labels | Y | NodeList of Label | NodeArray[] |
For | init | Y | Expression or Declaration | |
For | cond | Y | Expression | |
For | iter | Y | Expression | |
For | stmt | Y | Statement | |
Goto | labels | Y | NodeList of Label | NodeArray[] |
Goto | target | | String | |
Continue | labels | Y | NodeList of Label | NodeArray[] |
Break | labels | Y | NodeList of Label | NodeArray[] |
Return | labels | Y | NodeList of Label | NodeArray[] |
Return | expr | Y | Expression | |
ExpressionStatement | labels | Y | NodeList of Label | NodeArray[] |
ExpressionStatement | expr | Y | Expression | |
PlainLabel | name | | String | |
Default | | | | |
Case | expr | Y | Expression | |
Comma | exprs | Y | NodeList of Expression | |
Conditional | cond | Y | Expression | |
Conditional | then | Y | Expression | |
Conditional | else | Y | Expression | |
Variable | name | | String | |
Index | expr | Y | Expression | |
Index | index | Y | Expression | |
Call | expr | Y | Expression | |
Call | args | Y | NodeList of Expression-or-Type | |
Dot | expr | Y | Expression | |
Dot | member | Y | String | |
Arrow | expr | Y | Expression | |
Arrow | member | Y | String | |
PostInc | expr | Y | Expression | |
PostDec | expr | Y | Expression | |
Cast | type | Y | Type | |
Cast | expr | Y | Expression | |
Address | expr | Y | Expression | |
Dereference | expr | Y | Expression | |
Sizeof | expr | Y | Type or Expression | |
Positive | expr | Y | Expression | |
Negative | expr | Y | Expression | |
PreInc | expr | Y | Expression | |
PreDec | expr | Y | Expression | |
BitNot | expr | Y | Expression | |
Not | expr | Y | Expression | |
Add | expr1 | Y | Expression | |
Add | expr2 | Y | Expression | |
Subtract | expr1 | Y | Expression | |
Subtract | expr2 | Y | Expression | |
Multiply | expr1 | Y | Expression | |
Multiply | expr2 | Y | Expression | |
Divide | expr1 | Y | Expression | |
Divide | expr2 | Y | Expression | |
Mod | expr1 | Y | Expression | |
Mod | expr2 | Y | Expression | |
Equal | expr1 | Y | Expression | |
Equal | expr2 | Y | Expression | |
NotEqual | expr1 | Y | Expression | |
NotEqual | expr2 | Y | Expression | |
Less | expr1 | Y | Expression | |
Less | expr2 | Y | Expression | |
More | expr1 | Y | Expression | |
More | expr2 | Y | Expression | |
LessOrEqual | expr1 | Y | Expression | |
LessOrEqual | expr2 | Y | Expression | |
MoreOrEqual | expr1 | Y | Expression | |
MoreOrEqual | expr2 | Y | Expression | |
BitAnd | expr1 | Y | Expression | |
BitAnd | expr2 | Y | Expression | |
BitOr | expr1 | Y | Expression | |
BitOr | expr2 | Y | Expression | |
BitXor | expr1 | Y | Expression | |
BitXor | expr2 | Y | Expression | |
ShiftLeft | expr1 | Y | Expression | |
ShiftLeft | expr2 | Y | Expression | |
ShiftRight | expr1 | Y | Expression | |
ShiftRight | expr2 | Y | Expression | |
And | expr1 | Y | Expression | |
And | expr2 | Y | Expression | |
Or | expr1 | Y | Expression | |
Or | expr2 | Y | Expression | |
Assign | lval | Y | Expression | |
Assign | rval | Y | Expression | |
MultiplyAssign | lval | Y | Expression | |
MultiplyAssign | rval | Y | Expression | |
DivideAssign | lval | Y | Expression | |
DivideAssign | rval | Y | Expression | |
ModAssign | lval | Y | Expression | |
ModAssign | rval | Y | Expression | |
AddAssign | lval | Y | Expression | |
AddAssign | rval | Y | Expression | |
SubtractAssign | lval | Y | Expression | |
SubtractAssign | rval | Y | Expression | |
ShiftLeftAssign | lval | Y | Expression | |
ShiftLeftAssign | rval | Y | Expression | |
ShiftRightAssign | lval | Y | Expression | |
ShiftRightAssign | rval | Y | Expression | |
BitAndAssign | lval | Y | Expression | |
BitAndAssign | rval | Y | Expression | |
BitXorAssign | lval | Y | Expression | |
BitXorAssign | rval | Y | Expression | |
BitOrAssign | lval | Y | Expression | |
BitOrAssign | rval | Y | Expression | |
StringLiteral | val | | String | | The String in val is the literal string entered. "\n" isn't converted to a newline, for instance.
CharLiteral | val | | String | | The String in val is the literal string entered. '\n' isn't converted to a newline, for instance.
CompoundLiteral | type | Y | Type | | Here's an example. (struct S){1, .x = 2, .y [3] .z = 4} parses as:

    CompoundLiteral
        type: Struct
            name: "S"
        member_inits:
            - MemberInit
                init: IntLiteral
                    val: 1
            - MemberInit
                member:
                    - Member
                        name: "x"
                init: IntLiteral
                    val: 2
            - MemberInit
                member:
                    - Member
                        name: "y"
                    - IntLiteral
                        val: 3
                    - Member
                        name: "z"
                init: IntLiteral
                    val: 4

    "That's legal syntax!?" Yep. Look it up.

CompoundLiteral | member_inits | Y | NodeList of MemberInit | NodeArray[] |
IntLiteral | val | | Integer | | Also:
IntLiteral | format | | :dec, :hex, :oct | :dec |
FloatLiteral | val | | Float | |
Pointer | const? | | true, false | |
Pointer | restrict? | | true, false | |
Pointer | volatile? | | true, false | |
Pointer | type | Y | Type | |
Array | const? | | true, false | |
Array | restrict? | | true, false | |
Array | volatile? | | true, false | |
Array | type | Y | Type | |
Array | length | Y | Expression | |
Function | const? | | true, false | |
Function | restrict? | | true, false | |
Function | volatile? | | true, false | |
Function | type | Y | Type | |
Function | params | Y | NodeList of Parameter | NodeArray[] |
Function | var_args? | | true, false | |
Struct | const? | | true, false | |
Struct | restrict? | | true, false | |
Struct | volatile? | | true, false | |
Struct | name | | String | |
Struct | members | Y | NodeList of Member | NodeArray[] |
Union | const? | | true, false | |
Union | restrict? | | true, false | |
Union | volatile? | | true, false | |
Union | name | | String | |
Union | members | Y | NodeList of Member | NodeArray[] |
Enum | const? | | true, false | |
Enum | restrict? | | true, false | |
Enum | volatile? | | true, false | |
Enum | name | | String | |
Enum | members | Y | NodeList of Enumerator | |
CustomType | const? | | true, false | | This is for typedef'd names.
CustomType | restrict? | | true, false | |
CustomType | volatile? | | true, false | |
CustomType | name | | String | |
Void | const? | | true, false | | const void!? Yes, think about: const void *.
Void | restrict? | | true, false | |
Void | volatile? | | true, false | |
Int | const? | | true, false | | longness sounds silly, so here are some less silly methods:
Int | restrict? | | true, false | |
Int | volatile? | | true, false | |
Int | longness | | -1, 0, 1, 2 | 0 |
Int | unsigned? | | true, false | |
Float | const? | | true, false | | Less silly-sounding longness substitutes:
Float | restrict? | | true, false | |
Float | volatile? | | true, false | |
Float | longness | | 0, 1, 2 | 0 |
Char | const? | | true, false | | Also:
Char | restrict? | | true, false | |
Char | volatile? | | true, false | |
Char | signed | | true, false, nil | |
Bool | const? | | true, false | | This is the rarely seen _Bool type.
Bool | restrict? | | true, false | |
Bool | volatile? | | true, false | |
Complex | const? | | true, false | | This is the rarely seen _Complex type.
Complex | restrict? | | true, false | |
Complex | volatile? | | true, false | |
Complex | longness | | 0, 1, 2 | 0 |
Imaginary | const? | | true, false | | This is the rarely seen _Imaginary type.
Imaginary | restrict? | | true, false | |
Imaginary | volatile? | | true, false | |
Imaginary | longness | | 0, 1, 2 | 0 |
BlockExpression | block | Y | Block | Block.new | Only if the block_expressions extension is enabled. See "Extensions" section below.
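One reading of the Int longness values, sketched in plain Ruby. The table gives only the numbers, so the mapping to C spellings below (-1 short, 0 int, 1 long, 2 long long) is an assumption, and int_spelling is a made-up helper rather than anything CAST provides.

```ruby
# Assumed mapping from Int longness to its C spelling. The numeric
# values come from the table; the names attached to them are a guess.
INT_SPELLINGS = {
  -1 => 'short',
  0  => 'int',
  1  => 'long',
  2  => 'long long'
}.freeze

# Hypothetical helper, not CAST API.
def int_spelling(longness, unsigned = false)
  base = INT_SPELLINGS.fetch(longness) do
    raise ArgumentError, "longness must be -1, 0, 1 or 2, got #{longness.inspect}"
  end
  unsigned ? "unsigned #{base}" : base
end

puts int_spelling(0)         # int
puts int_spelling(2, true)   # unsigned long long
```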
=== Node Construction
Wanna make a Node? Take your pick:
They're for losers, though. What you really want to do is make Nodes by parsing C code. Each class -- even the abstract classes like Statement -- has a .parse method:
  function_def = C::FunctionDef.parse <<-EOS
    void frobnicate(int karma) {
      use_waffle_iron();
    }
  EOS

  stmt = C::Statement.parse('while (not_looking) paint_car();')
Need to tell it to treat WaffleIron as a type name? All those parse methods use C.default_parser:
  C.default_parser.type_names << 'WaffleIron'
  type = C::Type.parse('WaffleIron')
Alternatively, you could've given parse your own parser:
  parser = C::Parser.new
  parser.type_names << 'WaffleIron'
  type = C::Type.parse('WaffleIron', parser)
In fact, there's also C.parse(str, parser=nil), which is an alias for C::TranslationUnit.parse(str, parser).
  ast = C.parse(STDIN.read)
Yes, all that talk in the intro about doing parser = C::Parser.new; parser.parse(...) was actually all a charade to make you type more. I so own you.
== Extensions
CAST has developed extensions! To the C99 grammar, I mean.
Types are allowed as function arguments. This lets you parse macros like va_arg().
Blocks in parentheses are allowed as expressions ({a gcc extension}[http://gcc.gnu.org/onlinedocs/gcc-4.2.1/gcc/Statement-Exprs.html#Statement-Exprs]). You need to call #enable_block_expressions on the parser first. They pop out as BlockExpression nodes.
  C.default_parser.enable_block_expressions
  node = C.parse %q[char *brag(void) { return ({"I'm tricky!";}); }]
  node.entities[0].def.stmts.last.expr.class # => C::BlockExpression
== Open Issues
{Vote now}[mailto:george.ogata@gmail.com]
== To Do
If any of these affect you greatly, {kick me}[mailto:george.ogata@gmail.com] to make it happen faster.
== Contact
I'm not really sure what people are going to try to use this for. If there's some functionality you think would make a good addition, or think I've made a mess of this poor puppy, give me a yell.
You can spam me at george.ogata@gmail.com. It'd help if you prefixed the subject with "[cast] " so I can easily distinguish CAST spam from fake Rolex spam.
== License
Ruby License