arrow-js-ffi
Interpret Arrow memory across the WebAssembly boundary without serialization.
Arrow is a high-performance memory layout for analytical programs. Since Arrow's memory layout is defined to be the same in every implementation, programs that use Arrow in WebAssembly are using the exact same layout that Arrow JS implements! This means we can use plain ArrayBuffers to move highly structured data back and forth to WebAssembly memory, entirely avoiding serialization.
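The following minimal sketch shows that idea. It uses only the standard WebAssembly JS API, not arrow-js-ffi itself. A typed array constructed over a WebAssembly.Memory buffer reads the Wasm bytes in place, while slice() makes a JS-owned copy. The offset 1024 and the values are made up for illustration; in practice the pointer would come from your Wasm module.

```typescript
// Zero-copy vs. copy over WebAssembly memory, using only the standard JS API.
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page

// Pretend the Wasm side wrote four float64 values at a made-up byte offset.
// (Float64Array views need a byte offset that is a multiple of 8.)
const ptr = 1024;
const count = 4;
new Float64Array(memory.buffer, ptr, count).set([1.5, 2.5, 3.5, 4.5]);

// Zero-copy: a view that shares the same bytes; nothing is deserialized.
const view = new Float64Array(memory.buffer, ptr, count);

// Copy: slice() allocates a new, JS-owned ArrayBuffer with the same contents.
const copy = view.slice();

// Simulate a later write from the Wasm side.
new Float64Array(memory.buffer, ptr, count)[0] = 100;

console.log(view[0]); // 100 (the view sees the write)
console.log(copy[0]); // 1.5 (the copy does not)
```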
I wrote an interactive blog post that goes into more detail on why this is useful and how this library implements Arrow's C Data Interface in JavaScript.
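This package exports functions that parse Arrow C Data Interface structs into Arrow JS objects. The two core functions are parseField, which parses an ArrowSchema struct into an arrow.Field, and parseVector, which parses an ArrowArray struct into an arrow.Vector. The package also exports parseSchema, parseData, parseRecordBatch, and parseTable, each documented below.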
parseField
Parse an ArrowSchema C FFI struct into an arrow.Field instance. You will need this Field later: its type is the dataType argument to parseVector, described below.
buffer (ArrayBuffer): The buffer property of the WebAssembly.Memory instance to read from.
ptr (number): The numeric pointer in buffer where the C struct is located.
const WASM_MEMORY: WebAssembly.Memory = ...
const field = parseField(WASM_MEMORY.buffer, fieldPtr);
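To show what parseField reads, here is a hypothetical sketch that decodes the top-level members of an ArrowSchema struct. It is not arrow-js-ffi's implementation.

It assumes a wasm32 target, where pointers are 4 bytes and int64_t is 8-byte aligned. That gives these offsets:
- format at 0
- name at 4
- metadata at 8
- flags at 16
- n_children at 24
- children at 32
- dictionary at 36
- release at 40
- private_data at 44

The names readCString and readSchemaHeader are illustrative only.

```typescript
// Hypothetical helpers, not part of arrow-js-ffi. Assumes wasm32 offsets and
// little-endian memory (WebAssembly is always little-endian).
interface SchemaHeader {
  format: string;
  name: string | null;
  flags: bigint;
  nChildren: number;
  childrenPtr: number;
  dictionaryPtr: number;
  released: boolean;
}

// Read a NUL-terminated UTF-8 string starting at ptr.
function readCString(buffer: ArrayBuffer, ptr: number): string {
  const bytes = new Uint8Array(buffer);
  let end = ptr;
  while (bytes[end] !== 0) end++;
  return new TextDecoder().decode(bytes.subarray(ptr, end));
}

function readSchemaHeader(buffer: ArrayBuffer, ptr: number): SchemaHeader {
  const dv = new DataView(buffer);
  const formatPtr = dv.getUint32(ptr, true);   // const char* format
  const namePtr = dv.getUint32(ptr + 4, true); // const char* name (may be NULL)
  // The metadata pointer at ptr + 8 is skipped in this sketch.
  const flags = dv.getBigInt64(ptr + 16, true);                // int64_t flags
  const nChildren = Number(dv.getBigInt64(ptr + 24, true));    // int64_t n_children
  const childrenPtr = dv.getUint32(ptr + 32, true);            // struct ArrowSchema** children
  const dictionaryPtr = dv.getUint32(ptr + 36, true);          // struct ArrowSchema* dictionary
  const releasePtr = dv.getUint32(ptr + 40, true);             // release callback (a table index on wasm32)
  return {
    format: readCString(buffer, formatPtr),
    name: namePtr === 0 ? null : readCString(buffer, namePtr),
    flags,
    nChildren,
    childrenPtr,
    dictionaryPtr,
    // Per the C Data Interface, a NULL release callback marks a released struct.
    released: releasePtr === 0,
  };
}
```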
parseSchema
Parse an ArrowSchema C FFI struct into an arrow.Schema instance. Note that the underlying field must be of Struct type. In essence, a single Struct field stands in for a Schema: the Struct's child fields become the schema's fields.
buffer (ArrayBuffer): The buffer property of the WebAssembly.Memory instance to read from.
ptr (number): The numeric pointer in buffer where the C struct is located.
const WASM_MEMORY: WebAssembly.Memory = ...
const schema = parseSchema(WASM_MEMORY.buffer, fieldPtr);
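To illustrate the Struct-as-Schema convention, the following hypothetical sketch walks a Struct ArrowSchema's children and collects their names. Those names are roughly what a Schema's fields carry.

It assumes the wasm32 layout (n_children at offset 24, children at 32, name at 4). It also relies on "+s" being the C Data Interface format string for Struct. structChildNames is not part of arrow-js-ffi.

```typescript
// Hypothetical sketch, not arrow-js-ffi's code. Assumes wasm32 and little-endian memory.
function readCString(buffer: ArrayBuffer, ptr: number): string {
  const bytes = new Uint8Array(buffer);
  let end = ptr;
  while (bytes[end] !== 0) end++;
  return new TextDecoder().decode(bytes.subarray(ptr, end));
}

function structChildNames(buffer: ArrayBuffer, ptr: number): string[] {
  const dv = new DataView(buffer);
  const format = readCString(buffer, dv.getUint32(ptr, true));
  // "+s" is the C Data Interface format string for a Struct type.
  if (format !== "+s") {
    throw new Error(`expected a Struct field, got format "${format}"`);
  }
  const nChildren = Number(dv.getBigInt64(ptr + 24, true)); // int64_t n_children
  const childrenPtr = dv.getUint32(ptr + 32, true);         // struct ArrowSchema** children
  const names: string[] = [];
  for (let i = 0; i < nChildren; i++) {
    // Each entry of the children array is a 4-byte pointer to a child ArrowSchema.
    const childPtr = dv.getUint32(childrenPtr + 4 * i, true);
    const namePtr = dv.getUint32(childPtr + 4, true); // child's const char* name
    names.push(namePtr === 0 ? "" : readCString(buffer, namePtr));
  }
  return names;
}
```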
parseData
Parse an ArrowArray C FFI struct into an arrow.Data instance. Multiple Data instances can be combined into a single arrow.Vector.
buffer (ArrayBuffer): The buffer property of the WebAssembly.Memory instance to read from.
ptr (number): The numeric pointer in buffer where the C struct is located.
dataType (arrow.DataType): The type of the data to parse. Get this from field.type on the result of parseField.
copy (boolean, default: true): If true, data is copied across the Wasm boundary, so you can free the original on the Wasm side afterward. If false, the resulting arrow.Data objects are views on Wasm memory. Use this with care: the views become invalid if that region of Wasm memory changes.
const WASM_MEMORY: WebAssembly.Memory = ...
const copiedData = parseData(WASM_MEMORY.buffer, arrayPtr, field.type);
// Make zero-copy views instead of copying array contents
const viewedData = parseData(WASM_MEMORY.buffer, arrayPtr, field.type, false);
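parseData and parseVector start from an ArrowArray struct. Below is a hypothetical sketch of reading its header on wasm32.

The offsets are an assumption derived from the C struct definition under the wasm32 ABI:
- length at 0
- null_count at 8
- offset at 16
- n_buffers at 24
- n_children at 32
- buffers at 40
- children at 44
- dictionary at 48
- release at 52

readArrayHeader is illustrative, not the library's API.

```typescript
// Hypothetical sketch, not arrow-js-ffi's API: read the header of an ArrowArray
// struct, assuming wasm32 offsets and little-endian memory.
interface ArrayHeader {
  length: number;
  nullCount: number; // -1 means "not computed" per the C Data Interface
  offset: number;
  bufferPtrs: number[]; // a validity-bitmap pointer may be 0 (NULL) when there are no nulls
  nChildren: number;
  childrenPtr: number;
  dictionaryPtr: number;
  released: boolean;
}

function readArrayHeader(buffer: ArrayBuffer, ptr: number): ArrayHeader {
  const dv = new DataView(buffer);
  // int64_t fields are converted to number; values above 2^53 would lose precision.
  const i64 = (off: number): number => Number(dv.getBigInt64(ptr + off, true));
  const nBuffers = i64(24);                        // int64_t n_buffers
  const buffersPtr = dv.getUint32(ptr + 40, true); // const void** buffers
  const bufferPtrs: number[] = [];
  for (let i = 0; i < nBuffers; i++) {
    // Each entry is a 4-byte pointer on wasm32.
    bufferPtrs.push(dv.getUint32(buffersPtr + 4 * i, true));
  }
  return {
    length: i64(0),     // int64_t length
    nullCount: i64(8),  // int64_t null_count
    offset: i64(16),    // int64_t offset
    bufferPtrs,
    nChildren: i64(32), // int64_t n_children
    childrenPtr: dv.getUint32(ptr + 44, true),    // struct ArrowArray** children
    dictionaryPtr: dv.getUint32(ptr + 48, true),  // struct ArrowArray* dictionary
    released: dv.getUint32(ptr + 52, true) === 0, // NULL release callback = released
  };
}
```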
parseVector
Parse an ArrowArray C FFI struct into an arrow.Vector instance. Multiple Vector instances can be combined as the columns of an arrow.Table.
buffer (ArrayBuffer): The buffer property of the WebAssembly.Memory instance to read from.
ptr (number): The numeric pointer in buffer where the C struct is located.
dataType (arrow.DataType): The type of the vector to parse. Get this from field.type on the result of parseField.
copy (boolean, default: true): If true, data is copied across the Wasm boundary, so you can free the original on the Wasm side afterward. If false, the resulting arrow.Vector objects are views on Wasm memory. Use this with care: the views become invalid if that region of Wasm memory changes.
const WASM_MEMORY: WebAssembly.Memory = ...
const copiedVector = parseVector(WASM_MEMORY.buffer, arrayPtr, field.type);
// Make zero-copy views instead of copying array contents
const viewedVector = parseVector(WASM_MEMORY.buffer, arrayPtr, field.type, false);
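For a primitive type such as Int32, the ArrowArray's buffers are a validity bitmap followed by the values. The hypothetical function below shows how those two buffers and the array's offset combine into JS values. It is not arrow-js-ffi's code.

It assumes little-endian memory. It also treats a validity pointer of 0 as "no nulls", which the C Data Interface allows when null_count is 0.

```typescript
// Hypothetical sketch: materialize a nullable Int32 column from a primitive
// ArrowArray's buffers: buffers[0] = validity bitmap, buffers[1] = int32 values.
function decodeInt32Column(
  buffer: ArrayBuffer,
  validityPtr: number, // 0 (NULL) means every slot is valid
  valuesPtr: number,
  length: number,
  offset: number,
): (number | null)[] {
  const bytes = new Uint8Array(buffer);
  const dv = new DataView(buffer); // DataView avoids typed-array alignment requirements
  const out: (number | null)[] = [];
  for (let i = 0; i < length; i++) {
    const j = offset + i; // logical slot i lives at physical slot offset + i
    // Arrow bitmaps are least-significant-bit first: slot j is bit (j % 8) of byte floor(j / 8).
    const valid =
      validityPtr === 0 || ((bytes[validityPtr + (j >> 3)] >> (j & 7)) & 1) === 1;
    out.push(valid ? dv.getInt32(valuesPtr + 4 * j, true) : null);
  }
  return out;
}
```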
parseRecordBatch
Parse an ArrowArray C FFI struct plus an ArrowSchema C FFI struct into an arrow.RecordBatch instance. Note that the underlying array and field must be of Struct type. In essence, a single Struct array stands in for a RecordBatch: its children become the batch's columns.
buffer (ArrayBuffer): The buffer property of the WebAssembly.Memory instance to read from.
arrayPtr (number): The numeric pointer in buffer where the ArrowArray C struct is located.
schemaPtr (number): The numeric pointer in buffer where the ArrowSchema C struct is located.
copy (boolean, default: true): If true, data is copied across the Wasm boundary, so you can free the original on the Wasm side afterward. If false, the record batch's vectors are views on Wasm memory. Use this with care: the views become invalid if that region of Wasm memory changes.
const WASM_MEMORY: WebAssembly.Memory = ...
const copiedRecordBatch = parseRecordBatch(
  WASM_MEMORY.buffer,
  arrayPtr,
  schemaPtr
);
// Pass `false` to view arrays across the boundary instead of creating copies.
const viewedRecordBatch = parseRecordBatch(
  WASM_MEMORY.buffer,
  arrayPtr,
  schemaPtr,
  false
);
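A record batch arrives as a Struct array paired with a Struct schema, so each column is found by pairing the i-th child of each. Below is a hypothetical sketch of that pairing on wasm32.

The assumed offsets are:
- schema: count at 24, children at 32
- array: count at 32, children at 44

recordBatchColumns and ColumnRef are illustrative names, not library exports.

```typescript
// Hypothetical sketch, not arrow-js-ffi's code. Assumes wasm32 and little-endian memory.
interface ColumnRef {
  name: string;
  arrayPtr: number; // pointer to the child ArrowArray for this column
  length: number;
}

function readCString(buffer: ArrayBuffer, ptr: number): string {
  const bytes = new Uint8Array(buffer);
  let end = ptr;
  while (bytes[end] !== 0) end++;
  return new TextDecoder().decode(bytes.subarray(ptr, end));
}

function recordBatchColumns(buffer: ArrayBuffer, arrayPtr: number, schemaPtr: number): ColumnRef[] {
  const dv = new DataView(buffer);
  const nFields = Number(dv.getBigInt64(schemaPtr + 24, true)); // ArrowSchema.n_children
  const nColumns = Number(dv.getBigInt64(arrayPtr + 32, true)); // ArrowArray.n_children
  if (nFields !== nColumns) throw new Error("schema and array disagree on column count");
  const fieldPtrs = dv.getUint32(schemaPtr + 32, true); // ArrowSchema.children
  const columnPtrs = dv.getUint32(arrayPtr + 44, true); // ArrowArray.children
  const columns: ColumnRef[] = [];
  for (let i = 0; i < nColumns; i++) {
    const childSchema = dv.getUint32(fieldPtrs + 4 * i, true);
    const childArray = dv.getUint32(columnPtrs + 4 * i, true);
    const namePtr = dv.getUint32(childSchema + 4, true); // child field's name
    columns.push({
      name: namePtr === 0 ? "" : readCString(buffer, namePtr),
      arrayPtr: childArray,
      length: Number(dv.getBigInt64(childArray, true)), // child ArrowArray.length
    });
  }
  return columns;
}
```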
parseTable
Parse an Arrow table from WebAssembly memory into an Arrow JS Table.
This expects an array of pointers to ArrowArray C FFI structs, one per record batch, plus a pointer to a single ArrowSchema C FFI struct. Note that each array and the field must be of Struct type. In essence, each Struct array stands in for one RecordBatch.
buffer (ArrayBuffer): The buffer property of the WebAssembly.Memory instance to read from.
arrayPtrs (number[]): An array of numeric pointers into buffer, each locating the ArrowArray C struct for one record batch.
schemaPtr (number): The numeric pointer in buffer where the ArrowSchema C struct is located.
copy (boolean, default: true): If true, data is copied across the Wasm boundary, so you can free the original on the Wasm side afterward. If false, the table's vectors are views on Wasm memory. Use this with care: the views become invalid if that region of Wasm memory changes.
const WASM_MEMORY: WebAssembly.Memory = ...
const table = parseTable(
  WASM_MEMORY.buffer,
  arrayPtrs,
  schemaPtr,
  true
);
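How you obtain arrayPtrs depends on your Wasm module. Suppose, hypothetically, that your module exposes the batches as a contiguous array of 32-bit pointers plus a count. That layout is an assumption about your own exporter, not something arrow-js-ffi requires. A small helper like readPointerList (not part of the library) can turn it into the number[] that parseTable takes.

```typescript
// Hypothetical helper, not part of arrow-js-ffi: read `count` wasm32 pointers
// (4-byte little-endian unsigned integers) starting at listPtr.
function readPointerList(buffer: ArrayBuffer, listPtr: number, count: number): number[] {
  const dv = new DataView(buffer); // DataView works for any alignment
  const ptrs: number[] = [];
  for (let i = 0; i < count; i++) {
    ptrs.push(dv.getUint32(listPtr + 4 * i, true));
  }
  return ptrs;
}
```

You would then call parseTable(WASM_MEMORY.buffer, readPointerList(WASM_MEMORY.buffer, listPtr, nBatches), schemaPtr, true). Here listPtr and nBatches stand for values your own module provides.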
TL;DR: As of version 0.4, arrow-js-ffi does not release any WebAssembly resources. You must free Arrow resources yourself when you're done with them to avoid memory leaks. This library does not currently provide helpers to deallocate that memory; instead, look for free methods exposed by Emscripten or wasm-bindgen.
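When copying, one way to follow this advice is to free the Wasm-side data in a finally block, so it is released even if parsing throws. The helper below is a hypothetical sketch:
- parse would wrap a call such as parseTable with copy set to true.
- free would wrap whatever your toolchain exposes, for example a wasm-bindgen object's free() method.

It is only appropriate with copy=true, since freeing would invalidate views.

```typescript
// Hypothetical helper, not part of arrow-js-ffi. Only use it when parsing with
// copy = true: with copy = false the result would still point into the memory
// that `free` releases.
function parseThenFree<T>(parse: () => T, free: () => void): T {
  try {
    // Copy the Arrow data into JS-owned memory.
    return parse();
  } finally {
    // Runs whether parse() returned or threw, so the Wasm allocation is not leaked.
    free();
  }
}

// Example usage; `wasmTable` is a placeholder for your own wasm-bindgen object:
// const table = parseThenFree(
//   () => parseTable(WASM_MEMORY.buffer, arrayPtrs, schemaPtr, true),
//   () => wasmTable.free(),
// );
```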
Memory management between WebAssembly's own memory and JavaScript memory can be tricky. The Arrow C Data Interface includes prescriptions for memory management, but those recommendations are designed for situations where two programs share the same memory space. Applying them to WebAssembly-JavaScript interop is imperfect because WebAssembly memory is sandboxed in a separate memory space.
The C Data Interface instructs consumers to call the release callback, which deallocates the referenced memory. In our case, however, we can't always call the release callback, because views on the referenced Arrow data could outlive the data itself. Even when a user passes copy=true, so that the data is copied into JS memory, it's unclear whether to release the underlying resources, because the user may still want to do something with their table data on the Wasm side.
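The hazard with views can be reproduced with the standard WebAssembly API alone. In this sketch the offset 64 and the values are made up. Growing the memory detaches the old ArrayBuffer, so a view created before the growth reports a length of 0, while a copy keeps its contents. This is one concrete way a region of Wasm memory can change out from under a copy=false view.

```typescript
// Views vs. copies when WebAssembly memory grows, using only the standard JS API.
const memory = new WebAssembly.Memory({ initial: 1 });
const ptr = 64; // made-up location of an Int32 values buffer
new Int32Array(memory.buffer, ptr, 3).set([7, 8, 9]);

const view = new Int32Array(memory.buffer, ptr, 3); // analogous to copy = false
const copy = view.slice();                          // analogous to copy = true

// Growing the memory (as a Wasm allocator may do when it needs more space)
// detaches the old ArrayBuffer and replaces memory.buffer with a larger one.
memory.grow(1);

console.log(view.length);     // 0: the old view can no longer see any data
console.log(copy.join(","));  // 7,8,9: the copy is unaffected
// The bytes are still in Wasm memory; a new view over the new buffer finds them.
console.log(new Int32Array(memory.buffer, ptr, 3)[2]); // 9
```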
A future release of arrow-js-ffi may include standalone functions to release Arrow data, which users can call manually once they know they're done with the data. Even then, freeing the underlying array will not free any wrapper structs allocated by Emscripten or wasm-bindgen, and calling the free method on those structs afterward would cause a double free.
If you have thoughts on memory management, open an issue!
Most of the unsupported types should be pretty straightforward to implement; they just need some testing.
[0.4.2] - 2024-04-18
Uint32Array type in parseTable.
apache-arrow is marked as an external library in Rollup.
Zero-copy reading of Arrow data from WebAssembly