apache-arrow

Apache Arrow columnar in-memory format

4.0.0
Source
npm

Version published: 4 years ago

Weekly downloads: 211K; increased by 12.13%

Maintainers: 5

Created: 7 years ago

What is apache-arrow?

The apache-arrow npm package provides a cross-language development platform for in-memory data. It is designed to improve the performance and efficiency of data processing and analytics by using a columnar memory format. This package is particularly useful for handling large datasets and performing complex data manipulations.

What are apache-arrow's main functionalities?

Reading and Writing Arrow Files

This feature allows you to read and write Arrow files, which are efficient for storing and transferring large datasets. The code sample demonstrates how to read an Arrow file into a table and how to write a new table to an Arrow file.

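const arrow = require('apache-arrow');
const fs = require('fs');

// Reading an Arrow file (IPC format) into a Table
const arrowFile = fs.readFileSync('data.arrow');
const table = arrow.Table.from([arrowFile]);
console.log(table.toString());

// Writing an Arrow file.
// Table.new takes column vectors plus column names, not an array of row objects.
const newTable = arrow.Table.new(
  [arrow.Utf8Vector.from(['Alice', 'Bob']), arrow.Int32Vector.from([30, 25])],
  ['name', 'age']
);
// serialize() returns the table's Arrow IPC bytes as a Uint8Array
const arrowBuffer = newTable.serialize();
fs.writeFileSync('newData.arrow', arrowBuffer);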

DataFrame Operations

This feature provides DataFrame-like operations, such as creating tables, selecting columns, and filtering rows. The code sample shows how to create a DataFrame, select a column, and filter rows based on a condition.

const arrow = require('apache-arrow');

// Creating a DataFrame (Table.new accepts an object of named vectors)
const df = arrow.Table.new({
  name: arrow.Utf8Vector.from(['Alice', 'Bob']),
  age: arrow.Int32Vector.from([30, 25])
});

// Selecting a column
const names = df.getColumn('name');
console.log(names.toArray());

// Filtering rows: filter() takes a predicate expression, not a callback
const filtered = df.filter(arrow.predicate.col('age').gt(25));
console.log(filtered.count()); // number of rows that match

Interoperability with Other Languages

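Apache Arrow supports interoperability with other languages like Python, R, and Java. The code sample creates a table in JavaScript, serializes it to the Arrow IPC format, and writes it to a file that Python can read with the pyarrow library.

const arrow = require('apache-arrow');
const fs = require('fs');

// Create a table in JavaScript
const table = arrow.Table.new(
  [arrow.Utf8Vector.from(['Alice', 'Bob']), arrow.Int32Vector.from([30, 25])],
  ['name', 'age']
);
// serialize() defaults to the Arrow IPC streaming format
const arrowBuffer = table.serialize();

// Hand the bytes to Python through a file (the file name is arbitrary)
fs.writeFileSync('table.arrows', arrowBuffer);

// Python side, run separately with pyarrow installed:
//   import pyarrow as pa
//   with pa.OSFile('table.arrows', 'rb') as source:
//       py_table = pa.ipc.open_stream(source).read_all()
//   print(py_table)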

Apache Arrow in JS

Arrow is a set of technologies that enable big data systems to process and transfer data quickly.

Install `apache-arrow` from NPM

`npm install apache-arrow` or `yarn add apache-arrow`

(read about how we package apache-arrow below)

Powering Columnar In-Memory Analytics

Apache Arrow is a columnar memory layout specification for encoding vectors and table-like containers of flat and nested data. The Arrow spec aligns columnar data in memory to minimize cache misses and take advantage of the latest SIMD (single instruction, multiple data) and GPU operations on modern processors.
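
As a rough illustration of what "columnar" means here, the sketch below contrasts a row-oriented array of objects with one contiguous buffer per field. It uses plain typed arrays rather than the apache-arrow API. The helper names (`toColumn`, `getValue`, `sumColumn`) and the sample coordinates are hypothetical. The validity bitmap follows the Arrow convention of one bit per slot, least-significant bit first. Real Arrow buffers also carry padding and alignment, which this sketch omits.

```javascript
// Illustrative only: plain typed arrays, not the apache-arrow API.
// Row-oriented layout: one object per record.
const rows = [
  { lat: 35.39, lon: -97.6 },
  { lat: null, lon: -98.47 },
  { lat: 29.53, lon: -98.47 },
];

// Column-oriented layout: one contiguous buffer per field, plus a
// validity bitmap (1 bit per row, least-significant bit first) for nulls.
function toColumn(records, field) {
  const values = new Float64Array(records.length);
  const validity = new Uint8Array(Math.ceil(records.length / 8));
  records.forEach((record, i) => {
    const value = record[field];
    if (value !== null && value !== undefined) {
      values[i] = value;
      validity[i >> 3] |= 1 << (i & 7); // mark row i as non-null
    }
  });
  return { values, validity, length: records.length };
}

// Read slot i, honoring the validity bitmap.
function getValue(column, i) {
  const isValid = (column.validity[i >> 3] & (1 << (i & 7))) !== 0;
  return isValid ? column.values[i] : null;
}

// Scanning one field walks a single contiguous buffer.
function sumColumn(column) {
  let total = 0;
  for (let i = 0; i < column.length; i++) {
    const value = getValue(column, i);
    if (value !== null) total += value;
  }
  return total;
}

const lat = toColumn(rows, 'lat');
const lon = toColumn(rows, 'lon');
console.log(getValue(lat, 1)); // null
console.log(lat.validity[0].toString(2)); // 101 (rows 0 and 2 are valid)
console.log(sumColumn(lon));
```

Keeping each field in its own buffer is what lets a scan over a single column stay within contiguous memory, which is the cache-friendliness the paragraph above refers to.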

Apache Arrow is the emerging standard for large in-memory columnar data (Spark, Pandas, Drill, Graphistry, ...). By standardizing on a common binary interchange format, big data systems can reduce the costs and friction associated with cross-system communication.

Get Started

Check out our API documentation to learn more about how to use Apache Arrow's JS implementation. You can also learn by example by checking out some of the following resources:

Observable: Introduction to Apache Arrow
Observable: Manipulating flat arrays arrow-style
Observable: Rich columnar data tables - Dictionary-encoded strings, 64bit ints, and nested structs
/js/test/unit - Unit tests for Table and Vector

Cookbook

Get a table from an Arrow file on disk (in IPC format)

import { readFileSync } from 'fs';
import { Table } from 'apache-arrow';

const arrow = readFileSync('simple.arrow');
const table = Table.from([arrow]);

console.log(table.toString());

/*
 foo,  bar,  baz
   1,    1,   aa
null, null, null
   3, null, null
   4,    4,  bbb
   5,    5, cccc
*/

Create a Table when the Arrow file is split across buffers

import { readFileSync } from 'fs';
import { Table } from 'apache-arrow';

const table = Table.from([
    'latlong/schema.arrow',
    'latlong/records.arrow'
].map((file) => readFileSync(file)));

console.log(table.toString());

/*
        origin_lat,         origin_lon
35.393089294433594,  -97.6007308959961
35.393089294433594,  -97.6007308959961
35.393089294433594,  -97.6007308959961
29.533695220947266, -98.46977996826172
29.533695220947266, -98.46977996826172
*/

Create a Table from JavaScript arrays

import {
  Table,
  FloatVector,
  DateVector
} from 'apache-arrow';

const LENGTH = 2000;

const rainAmounts = Float32Array.from(
  { length: LENGTH },
  () => Number((Math.random() * 20).toFixed(1)));

const rainDates = Array.from(
  { length: LENGTH },
  (_, i) => new Date(Date.now() - 1000 * 60 * 60 * 24 * i));

const rainfall = Table.new(
  [FloatVector.from(rainAmounts), DateVector.from(rainDates)],
  ['precipitation', 'date']
);

Load data with `fetch`

import { Table } from "apache-arrow";

const table = await Table.from(fetch("/simple.arrow"));
console.log(table.toString());

Columns look like JS Arrays

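import assert from 'assert';
import { readFileSync } from 'fs';
import { Table } from 'apache-arrow';

// Wrap readFileSync so map() does not pass the array index as its options argument
const table = Table.from([
    'latlong/schema.arrow',
    'latlong/records.arrow'
].map((file) => readFileSync(file)));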

const column = table.getColumn('origin_lat');

// Copy the data into a TypedArray
const typed = column.toArray();
assert(typed instanceof Float32Array);

for (let i = -1, n = column.length; ++i < n;) {
    assert(column.get(i) === typed[i]);
}

Usage with MapD Core

import MapD from 'rxjs-mapd';
import { Table } from 'apache-arrow';

const port = 9091;
const host = `localhost`;
const db = `mapd`;
const user = `mapd`;
const password = `HyperInteractive`;

MapD.open(host, port)
  .connect(db, user, password)
  .flatMap((session) =>
    // queryDF returns Arrow buffers
    session.queryDF(`
      SELECT origin_city
      FROM flights
      WHERE dest_city ILIKE 'dallas'
      LIMIT 5`
    ).disconnect()
  )
  .map(([schema, records]) =>
    // Create Arrow Table from results
    Table.from([schema, records]))
  .map((table) =>
    // Stringify the table to CSV with row numbers
    table.toString({ index: true }))
  .subscribe((csvStr) =>
    console.log(csvStr));
/*
Index,   origin_city
    0, Oklahoma City
    1, Oklahoma City
    2, Oklahoma City
    3,   San Antonio
    4,   San Antonio
*/

Getting involved

See DEVELOP.md

Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved:

Join the mailing list: send an email to dev-subscribe@arrow.apache.org. Share your ideas and use cases for the project.
Follow our activity on JIRA
Learn the format
Contribute code to one of the reference implementations

We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the github.com/apache/arrow repository.

If you are looking for some ideas on what to contribute, check out the JIRA issues for the Apache Arrow project. Comment on the issue and/or contact dev@arrow.apache.org with your questions and ideas.

If you’d like to report a bug but don’t have time to fix it, you can still post it on JIRA, or email the mailing list at dev@arrow.apache.org.

Packaging

apache-arrow is written in TypeScript, but the project is compiled to multiple JS versions and common module formats.

The base apache-arrow package includes all the compilation targets for convenience, but if you're conscientious about your node_modules footprint, we got you.

The targets are also published under the @apache-arrow namespace:

npm install apache-arrow # <-- combined es2015/UMD + esnext/CommonJS/ESModules/UMD
npm install @apache-arrow/ts # standalone TypeScript package
npm install @apache-arrow/es5-cjs # standalone es5/CommonJS package
npm install @apache-arrow/es5-esm # standalone es5/ESModules package
npm install @apache-arrow/es5-umd # standalone es5/UMD package
npm install @apache-arrow/es2015-cjs # standalone es2015/CommonJS package
npm install @apache-arrow/es2015-esm # standalone es2015/ESModules package
npm install @apache-arrow/es2015-umd # standalone es2015/UMD package
npm install @apache-arrow/esnext-cjs # standalone esNext/CommonJS package
npm install @apache-arrow/esnext-esm # standalone esNext/ESModules package
npm install @apache-arrow/esnext-umd # standalone esNext/UMD package

Why we package like this

The JS community is a diverse group with a varied list of target environments and tool chains. Publishing multiple packages accommodates projects of all stripes.

If you think we missed a compilation target and it's a blocker for adoption, please open an issue.

People

Full list of broader Apache Arrow committers.

Brian Hulette, committer
Paul Taylor, Graphistry, Inc., committer

Powered By Apache Arrow in JS

Full list of broader Apache Arrow projects & organizations.

Open Source Projects

Apache Arrow -- Parent project for Powering Columnar In-Memory Analytics, including affiliated open source projects
rxjs-mapd -- A MapD Core node-driver that returns query results as Arrow columns
Perspective -- Perspective is a streaming data visualization engine by J.P. Morgan for JavaScript for building real-time & user-configurable analytics entirely in the browser.
Falcon -- A visualization tool for linked interactions across multiple aggregate visualizations of millions or billions of records.

Companies & Organizations

CCRi -- Commonwealth Computer Research Inc, or CCRi, is a Central Virginia based data science and software engineering company
GOAI -- GPU Open Analytics Initiative standardizes on Arrow as part of creating common data frameworks that enable developers and statistical researchers to accelerate data science on GPUs
Graphistry, Inc. -- An end-to-end GPU accelerated visual investigation platform used by teams for security, anti-fraud, and related investigations. Graphistry uses Arrow in its NodeJS GPU backend and client libraries, and is an early contributing member to GOAI and Arrow[JS] working to bring these technologies to the enterprise.

License

Apache 2.0

Apache Arrow 4.0.0 (2021-04-26)

Bug Fixes

ARROW-4784 - [C++][CI] Re-enable flaky mingw tests.
ARROW-6818 - [DOC] Remove reference to Apache Drill design docs
ARROW-7288 - [C++][Parquet] Don't use regular expression to parse application version
ARROW-7830 - [C++][Parquet] Use Arrow version number for parquet
ARROW-9451 - [Python] Refuse implicit cast of str to unsigned integer
ARROW-9634 - [C++][Python] Restore non-UTC time zones when reading Parquet file that was previously Arrow
ARROW-9878 - [Python] Document caveats of to_pandas(self_destruct=True)
ARROW-10038 - [C++] Spawn thread pool threads lazily
ARROW-10056 - [C++] Increase flatbuffers max_tables parameter in order to read wide tables
ARROW-10364 - [Dev][Archery] Add support for semver 2.13.0
ARROW-10370 - [Python] Clean-up filesystem handling in write_dataset
ARROW-10403 - [C++] Implement unique kernel for non-uniform chunked dictionary arrays
ARROW-10405 - [C++] IsIn kernel should be able to lookup dictionary in string
ARROW-10457 - [CI] Fix Spark integration tests with branch-3.0
ARROW-10489 - [C++] Add Intel C++ compiler options for different warning levels
ARROW-10514 - [C++][Parquet] Make the column name the same for both output formats of parquet reader
ARROW-10953 - [R] Validate when creating Table with schema
ARROW-11066 - [FlightRPC][Java] Make zero-copy writes a configurable option
ARROW-11066 - [FlightRPC][Java] Revert "fix zero-copy optimization"
ARROW-11066 - [Java][FlightRPC] fix zero-copy optimization
ARROW-11066 - Revert "ARROW-11066: [Java][FlightRPC] fix zero-copy opt…
ARROW-11066 - [Java][FlightRPC] fix zero-copy optimization
ARROW-11134 - [CI][C++] Always run tests on Travis-CI
ARROW-11147 - [CI][Python] Remove pandas=0.25.3 pin for dask-latest
ARROW-11180 - [Developer] cmake-format pre-commit hook doesn't run
ARROW-11192 - [Documentation] Describe opening Visual Studio so it inherits a working env
ARROW-11223 - [Java] Fix: BaseVariableWidthVector/BaseLargeVariableWidthVector setNull() and getBufferSizeFor() trigger offset buffer overflow
ARROW-11235 - [Python] Fix test failure inside non-default S3 region
ARROW-11239 - [Rust] Fixed equality with offsets and nulls
ARROW-11269 - [Rust][Parquet] Preserve timezone in int96 reader
ARROW-11277 - [C++] Workaround macOS 10.11: don't default construct consts
ARROW-11299 - [Python] Fix invalid-offsetof warnings
ARROW-11303 - [Release][C++] Enable mimalloc in the windows verification script
ARROW-11305 - Skip first argument (which is the program name) in parquet-rowcount binary
ARROW-11311 - [Rust] Fixed unset_bit
ARROW-11313 - [Rust] Fixed size_hint
ARROW-11315 - [Packaging][APT][arm64] Add missing gir1.2 files
ARROW-11320 - [C++] Try to strengthen temporary dir creation
ARROW-11322 - [Rust] Re-opening memory module as public
ARROW-11323 - [Rust][DataFusion] Allow sort queries to return no results
ARROW-11328 - [R] Collecting zero columns from a dataset returns entire dataset
ARROW-11334 - [Python][CI] Fix failing pandas nightly tests
ARROW-11337 - [C++] Compilation error with ThreadSanitizer
ARROW-11357 - [Rust] Fix out-of-bounds reads in take and other undefined behavior
ARROW-11376 - [C++] ThreadedTaskGroup failure with Thread Sanitizer enabled
ARROW-11379 - [C++][Dataset] Better formatting for timestamp scalars
ARROW-11387 - [Rust] fix build for conditional compilation of features 'simd + avx512'
ARROW-11391 - [C++] Allow writing more than 2 GB to HDFS
ARROW-11394 - [Rust] Tests for Slice & Concat
ARROW-11400 - [Python] Ensure pickling Dataset with dictionary partitions works
ARROW-11403 - [Developer] archery benchmark list: unexpected keyword 'benchmark_filter'
ARROW-11412 - [Python][Dataset] Disallow logical operators for Expression
ARROW-11412 - [Python] Improve Expression docs
ARROW-11427 - [C++] On Windows, only use AVX512 when enabled by the OS
ARROW-11448 - [C++] Fix tdigest build failure on Windows with Visual Studio
ARROW-11451 - [C++] Fix gcc-4.8 build errors
ARROW-11452 - [Rust] Fix issue with Parquet Arrow reader not following type path
ARROW-11461 - [Go][Flight] Some cleanup for flight, Fix Schema bytes
ARROW-11464 - [Python] Fix parquet.read_pandas to support all keywords of read_table
ARROW-11470 - [C++] Detect overflow on computation of tensor strides
ARROW-11472 - [Python][CI] Remove temporary pin of numpy in kartothek integration build
ARROW-11472 - [Python][CI] Temporary pin numpy on kartothek integration builds
ARROW-11480 - [Python] Test filtering on INT96 timestamps
ARROW-11483 - [C++] Write integration JSON files compatible with the Java reader
ARROW-11488 - [Rust] Don't leak memory in StructBuilder
ARROW-11490 - [C++] BM_ArrowBinaryDict/EncodeLowLevel is not deterministic
ARROW-11494 - [Rust] fix take bench
ARROW-11497 - [Python] Provide parquet enable compliant nested type flag for python binding
ARROW-11538 - [Python] Segfault reading Parquet dataset with Timestamp filter
ARROW-11547 - [Packaging][Conda][Drone] Fix undefined variable error
ARROW-11548 - [C++] Fix RandomArrayGenerator::List
ARROW-11551 - [C++][Gandiva] Fix castTimestamp(utf8) function
ARROW-11560 - [C++][FlightRPC] fix mutex error on SIGINT
ARROW-11567 - [C++][Compute] Improve variance kernel precision
ARROW-11577 - [Rust] Fix Array transform on strings
ARROW-11582 - [R] write_dataset 'format' argument default and validation could be better
ARROW-11586 - [Rust][Datafusion] Remove force unwrap
ARROW-11595 - [C++][NIGHTLY:test-conda-cpp-valgrind] Avoid branching on potentially indeterminate values in GenerateBitsUnrolled
ARROW-11596 - [Python][Dataset] make ScanTask.execute() eager
ARROW-11603 - [Rust] Fix Clippy Lints for Rust 1.50
ARROW-11607 - [C++][Parquet] Update values_capacity_ when resetting.
ARROW-11614 - Fix round() logic to return positive zero when argument is zero
ARROW-11617 - [C++][Gandiva] Fix nested if-else optimisation in gandiva
ARROW-11620 - [Rust][DataFusion] Consistently use Arc<dyn TableProvider> rather than Box and Arc
ARROW-11630 - [Rust] Introduce limit option for sort kernel
ARROW-11632 - [Rust] Make csv::Reader propagate schema metadata to generated RecordBatches
ARROW-11639 - [C++][Gandiva] Fix signbit compilation issue in Ubuntu nightly build
ARROW-11642 - [C++] Fix preprocessor directive for Windows in JVM detection
ARROW-11657 - [R] group_by with .drop specified errors
ARROW-11658 - [R] Handle mutate/rename inside group_by
ARROW-11663 - [Rust][DataFusion] Fixed error.
ARROW-11668 - [C++] Sporadic UBSAN error in FutureStessTest.TryAddCallback
ARROW-11672 - [R] Fix string function test failure on R 3.3
ARROW-11681 - [Rust] Don't unwrap in IPC writers
ARROW-11686 - [C++] Call ArrowLog::InstallFailureSignalHandler to show stack trace
ARROW-11687 - [Rust][DataFusion] RepartitionExec Hanging
ARROW-11694 - [C++] Fix Take() with no validity bitmap but unknown null count
ARROW-11695 - [C++][FlightRPC] fix option to disable TLS verification
ARROW-11717 - [Integration] Fix intermittent flight integration failures with rust
ARROW-11718 - [Rust] Don't write IPC footers on drop
ARROW-11741 - [C++] Fix decimal casts on big endian platforms
ARROW-11743 - [R] Use pkgdown's new found ability to autolink Jiras
ARROW-11746 - [Developer][Archery] Fix prefer real time check
ARROW-11756 - [R] passing a partition as a schema leads to segfaults
ARROW-11758 - [C++][Compute] Improve summation kernel precision
ARROW-11767 - [C++] Scalar::Hash may segfault
ARROW-11771 - [Developer][Archery] Move benchmark tests (so CI runs them)
ARROW-11781 - [Python] Reading small amount of files from a partitioned dataset is unexpectedly slow
ARROW-11784 - [Rust][DataFusion] CoalesceBatchesStream doesn't honor Stream interface
ARROW-11785 - [R] Fallback when filtering Table with unsupported expression fails
ARROW-11786 - [C++] Remove noisy CMake message
ARROW-11788 - [Java] Fix appending empty delta vectors
ARROW-11791 - [Rust][DataFusion] Fix RepartitionExec Blocking
ARROW-11802 - [Rust][DataFusion] Remove use of crossbeam channels to avoid potential deadlocks
ARROW-11819 - [Rust] Add link to the doc
ARROW-11821 - [Rust] Edit Rust README
ARROW-11830 - [C++] Don't re-detect gRPC every time
ARROW-11832 - [R] Handle conversion of extra nested struct column
ARROW-11836 - [C++] Avoid requiring arrow_bundled_dependencies when it doesn't exist for arrow_static.
ARROW-11845 - [Rust] Implement to_isize() for ArrowNativeTypes
ARROW-11850 - [GLib] Add GARROW_VERSION_0_16
ARROW-11855 - [C++][Python] Memory leak in to_pandas when converting chunked struct array
ARROW-11857 - [Python] Resource temporarily unavailable when using the new Dataset API with Pandas
ARROW-11860 - [Rust][DataFusion] Add DataFusion logos
ARROW-11866 - [C++] Arrow Flight SetShutdownOnSignals cause potential mutex deadlock in gRPC
ARROW-11872 - [C++] Fix Array validation when Array contains non-CPU buffers
ARROW-11880 - [R] Handle empty or NULL transmute() args properly
ARROW-11881 - [Rust][DataFusion] Fix clippy lint
ARROW-11896 - [Rust] Disable Debug symbols on CI test builds
ARROW-11904 - [C++] Try to fix crash on test tear down
ARROW-11905 - [C++] Fix SIMD detection on macOS
ARROW-11914 - [R][CI] r-sanitizer nightly is broken
ARROW-11918 - [R][Documentation] Docs cleanups
ARROW-11923 - [CI] Update branch name for dask dev integration tests
ARROW-11937 - [C++] Fix GZip codec hanging if flushed twice
ARROW-11941 - [Dev] Don't update Jira if run "DEBUG=1 merge_arrow_pr.py"
ARROW-11942 - [C++] If tasks are submitted quickly the thread pool may fail to spin up new threads
ARROW-11945 - [R] filter doesn't accept negative numbers as valid
ARROW-11956 - [C++] Fix system re2 dependency detection for static library
ARROW-11965 - [R][Docs] Simplify install.packages command in R dev docs
ARROW-11970 - [C++][CI] Fix Valgrind error in arrow-csv-test
ARROW-11971 - [Packaging] Vcpkg patch doesn't apply on windows due to line endings
ARROW-11975 - [CI][GLib] Remove needless libgccjit
ARROW-11976 - [C++] Fix sporadic TSAN error with GatingTask
ARROW-11983 - [Python] Avoid ImportError calling from_pandas in threaded code
ARROW-11997 - [Python] concat_tables crashes python interpreter
ARROW-12003 - [R] Fix NOTE re undefined global function group_by_drop_default
ARROW-12006 - [Java] Fix checkstyle config to work on Windows
ARROW-12012 - [Java][JDBC] Fix BinaryConsumer reallocation
ARROW-12013 - [C++][FlightRPC] Fix bundled gRPC version probing
ARROW-12015 - [Rust][DataFusion] Integrate doc-comment crate to ensure readme examples remain valid
ARROW-12028 - ARROW-11940: [Rust][DataFusion] Add TimestampMillisecond support to GROUP BY/hash aggregates
ARROW-12029 - [R] Remove args from FeatherReader$create v2
ARROW-12033 - [Minor][Docs] Fix link in developers/benchmarks.html
ARROW-12041 - [C++][Python] Fix type property of tensor and sparse tensor IPC messages
ARROW-12051 - [GLib] Keep input stream reference of GArrowCSVReader
ARROW-12057 - [Python] Remove direct usage of pandas' Block subclasses (partly)
ARROW-12065 - [C++][Python] Fix segfault reading JSON file
ARROW-12067 - [Python][Doc] Document pyarrow_(un)wrap_scalar
ARROW-12073 - [R] Fix R CMD check NOTE about ‘X_____X’
ARROW-12076 - [Rust] Fix build
ARROW-12077 - [C++] Fix out-of-bounds write in ListArray::FromArrays
ARROW-12086 - [C++] Fix environment variables for bzip2, utf8proc URLs
ARROW-12088 - [Python] Fix compiler warning about offsetof
ARROW-12089 - [Doc] Fix Sphinx warnings
ARROW-12100 - [C++][IPC] Allow null children field when num children is 0
ARROW-12103 - [C++] Correctly handle unaligned access in bit-unpacking code
ARROW-12112 - [CI] Reduce footprint of conda-integration image
ARROW-12112 - [Rust] Create and store less debug information in CI and integration tests
ARROW-12113 - [R] Fix rlang deprecation warning from check_select_helpers()
ARROW-12130 - [C++] Don't enable Neon if -DARROW_SIMD_LEVEL=NONE
ARROW-12138 - [Go][IPC] Update flatbuffers definitions
ARROW-12140 - [C++][CI] Fix Valgrind failures in Grouper tests
ARROW-12145 - [Developer][Archery] Flaky: test_static_runner_from_json
ARROW-12149 - [Dev] Archery benchmark test case is failing
ARROW-12154 - [C++][Gandiva] Fix gandiva crash in certain OS/CPU combinations
ARROW-12155 - [R] Require Table columns to be same length
ARROW-12161 - [C++][Dataset] Revert async CSV reader in datasets
ARROW-12161 - [C++] Async streaming CSV reader deadlocking when being run synchronously from datasets
ARROW-12169 - [C++] Fix decompressing file with empty stream at the end
ARROW-12171 - [Rust] clean up clippy lints
ARROW-12172 - [Python][Packaging] Pass python version as setuptools pretend version in the macOS wheel builds
ARROW-12178 - [CI] Update setuptools in the ubuntu images
ARROW-12186 - [Rust][DataFusion] Fix regexp_match test
ARROW-12209 - [JS] Copy all src files into the TypeScript package
ARROW-12220 - [C++][CI] Thread sanitizer failure
ARROW-12226 - [C++] Fix Address Sanitizer failures
ARROW-12227 - [R] Fix RE2 and median nightly build failures
ARROW-12235 - [Rust][DataFusion] LIMIT returns incorrect results when used with several small partitions
ARROW-12241 - [Python] Make CSV cancellation test more robust
ARROW-12250 - [Rust][Parquet] Fix failing arrow_writer test
ARROW-12254 - [Rust][DataFusion] Stop polling limit input once limit is reached
ARROW-12258 - [R] Never do as.data.frame() on collect(as_data_frame = FALSE)
ARROW-12262 - [Doc] Enable S3 and Flight in docs build
ARROW-12267 - [Rust] Implement support for timestamps in JSON writer
ARROW-12273 - [JS][Rust] Remove coveralls
ARROW-12279 - [Rust][DataFusion] Add test for null handling in hash join (ARROW-12266)
ARROW-12294 - [Rust] Fix boolean kleene kernels with no remainder
ARROW-12299 - [Python] Recognize new filesystems in pq.write_to_dataset
ARROW-12300 - [C++] Remove linking of cuda runtime library
ARROW-12313 - [Rust][Ballista] Update benchmark docs for Ballista
ARROW-12314 - [Python] Accept columns as set in parquet read_pandas
ARROW-12327 - [Dev] Use pull request's head remote when submitting crossbow jobs via the comment bot
ARROW-12330 - [Developer] Restore values at counters column of Archery benchmark
ARROW-12334 - [Rust][Ballista] Aggregate queries producing incorrect results
ARROW-12342 - [Packaging] Fix tabulation in crossbow templates for submitting nightly builds
ARROW-12357 - [Archery] Bump Jinja2 version requirement
ARROW-12379 - [C++] Fix ThreadSanitizer failure in SerialExecutor
ARROW-12382 - [C++] Bundle xsimd if runtime SIMD level is set
ARROW-12385 - [R][CI] fix cran picking in CI
ARROW-12390 - [Rust] Inline from_trusted_len_iter, try_from_trusted_len_iter, extend_from_slice
ARROW-12401 - [R] Fix guard around dataset___Scanner__TakeRows
ARROW-12405 - [Packaging] Fix apt artifact patterns and artifact uploading from travis
ARROW-12408 - [R] Delete Scan()
ARROW-12421 - [Rust][DataFusion] Fix topkexec failure
ARROW-12421 - [Rust][DataFusion] Disable repartition rule
ARROW-12429 - [C++] Fix incorrectly registered test
ARROW-12433 - [Rust] Update nightly rust version
ARROW-12437 - [Rust][Ballista] Create DataFusion context without repartition
ARROW-12440 - [Release][Packaging] Various packaging, release script and release verification script fixes
ARROW-12466 - [Python] Avoid AttributeError crash when comparing with None
ARROW-12475 - [C++] Fix 'warn_unused_result' warning
ARROW-12487 - [C++][Dataset] Fix ScanBatches() hanging
ARROW-12495 - [C++] Fix NumPyBuffer::mutable_data()
ARROW-12794 - C++/R: read_parquet halts process when accessed multiple times
PARQUET-1655 - [C++] Fix comparison of Decimal values in statistics
PARQUET-2008 - [C++] Fix information written in RowGroup::total_byte_size

New Features and Improvements

ARROW-951 - [JS] Upgrade to typedoc 0.20.19
ARROW-2229 - [C++][Python] Add WriteCsv functionality.
ARROW-3690 - [Rust] Add Rust to the format integration testing
ARROW-6103 - [Release][Java] Remove mvn release plugin
ARROW-6248 - [Python][C++] Raise better exception on HDFS file open error
ARROW-6455 - [C++] Implement ExtensionType for non-UTF8 Unicode data
ARROW-6604 - [C++] Add support for nested types to MakeArrayFromScalar
ARROW-7215 - [C++][Gandiva] Implement castVARCHAR(numeric_type) functions
ARROW-7364 - [Rust][DataFusion] Add cast options to cast kernel and TRY_CAST to DataFusion
ARROW-7633 - [C++][CI] Create fuzz targets for tensors and sparse tensors
ARROW-7808 - [Java][Dataset] Implement Dataset Java API by JNI to C++
ARROW-7906 - [C++][Python] Add ORC write support
ARROW-8049 - [C++] Bump thrift to 0.13 and require cmake 3.10 for it
ARROW-8282 - [C++/Python][Dataset] Support schema evolution for integer columns
ARROW-8284 - [C++][Dataset] Schema evolution for timestamp columns
ARROW-8630 - [C++][Dataset] Pass schema including all materialized fields to catch CSV edge cases
ARROW-8631 - [C++][Python][Dataset] Add ReadOptions to CsvFileFormat, expose options to python
ARROW-8658 - [C++][Dataset] Implement subtree pruning for FileSystemDataset
ARROW-8672 - [Java] Implement RecordBatch IPC buffer compression from ARROW-300
ARROW-8732 - [C++] Add basic cancellation API
ARROW-8771 - [C++] Add boost/process library to build support
ARROW-8796 - [Rust] Allow parquet to be written directly to memory
ARROW-8797 - [C++] Read RecordBatch in a different endian
ARROW-8900 - [C++][Python] Expose Proxy Options as parameters for S3FileSystem
ARROW-8919 - [C++][Compute][Dataset] Add Function::DispatchBest to accommodate implicit casts
ARROW-9128 - [C++] Implement string space trimming kernels: trim, ltrim, and rtrim
ARROW-9149 - [C++] Improve configurability of RandomArrayGenerator::ArrayOf
ARROW-9196 - [C++][Compute] All casts accept scalar and sliced inputs
ARROW-9318 - [C++] Parquet encryption key management
ARROW-9731 - [C++][Python][R][Dataset] Implement Scanner::Head
ARROW-9749 - [C++][GLib][Python][R][Ruby][Dataset] Introduce FragmentScanOptions, consolidate ScanContext/ScanOptions
ARROW-9777 - [Rust] Implement IPC changes to catch up to 1.0.0 format
ARROW-9856 - [R] Add bindings for string compute functions
ARROW-10014 - [C++] TaskGroup::Finish should execute tasks
ARROW-10089 - [R] inject base class for Array, ChunkedArray and Scalar
ARROW-10183 - [C++] Apply composable futures to CSV
ARROW-10195 - [C++] Add string struct extract kernel using re2
ARROW-10250 - [C++][FlightRPC] Consistently use FlightClientOptions::Defaults
ARROW-10255 - [JS] Reorganize exports for ESM tree-shaking
ARROW-10297 - [Rust] Parameter for parquet-read to output data in json format, add "cli" feature to parquet crate
ARROW-10299 - [Rust] Use IPC Metadata V5 as default
ARROW-10305 - [R] Filter with regular expressions
ARROW-10306 - [C++] Add string replacement kernel
ARROW-10349 - [Python] Build and publish aarch64 wheels
ARROW-10354 - [Rust][DataFusion] regexp_extract function to select regex groups from strings
ARROW-10360 - [CI] Bump Github Actions cache version
ARROW-10372 - [Dataset][C++][Python][R] Support reading compressed CSV
ARROW-10406 - [C++] Unify dictionaries when writing IPC file in a single shot
ARROW-10420 - [C++] Refactor io and filesystem APIs to take an IOContext
ARROW-10421 - [R] Use gc_memory_pool in more places
ARROW-10438 - [C++][Dataset] Partitioning::Format on nulls
ARROW-10520 - [C++][R] Implement add/remove/replace for RecordBatch
ARROW-10570 - [R] Use Converter API to convert SEXP to Array/ChunkedArray
ARROW-10580 - [C++] Disallow non-monotonic dense union offsets
ARROW-10606 - [C++] Implement Decimal256 casts
ARROW-10655 - [C++] Add cache and memoization facility
ARROW-10734 - [R] Build and test on Solaris
ARROW-10735 - [R] Remove arrow-without-arrow wrapping
ARROW-10766 - [Rust][Parquet] Compute nested list definitions
ARROW-10816 - [Rust][DataFusion] Initial support for Interval expressions
ARROW-10831 - [C++][Compute] Implement quantile kernel
ARROW-10846 - [C++] Add async filesystem operations
ARROW-10880 - [Java] Support compressing RecordBatch IPC buffers by LZ4
ARROW-10882 - [Python] Allow writing dataset from iterator of batches
ARROW-10895 - [C++][Gandiva] Implement bool to varchar cast function in Gandiva
ARROW-10903 - [Rust] Implement FromIter<Option<Vec<u8>>> constructor for FixedSizeBinaryArray
ARROW-11022 - [Rust] Upgrade to Tokio 1.0
ARROW-11070 - [C++][Compute] Implement power kernel
ARROW-11074 - [Rust][DataFusion] Implement predicate push-down for parquet tables
ARROW-11081 - [Java] Make IPC option immutable
ARROW-11108 - [Rust] Fixed performance issue in mutableBuffer.
ARROW-11141 - [Rust] Add basic Miri checks to CI pipeline
ARROW-11149 - [Rust] DF Support List/LargeList/FixedSizeList in create_batch_empty
ARROW-11150 - [Rust] Add Arrow Rust Community section to Rust README
ARROW-11154 - [CI][C++] Move homebrew crossbow tests off of Travis-CI
ARROW-11156 - [Rust][DataFusion] Create hashes vectorized in hash join
ARROW-11174 - [C++][Dataset] Make expressions available to projection
ARROW-11179 - [Format] Make FB comments friendly to rust
ARROW-11183 - [Rust] [Parquet] LogicalType::TIMESTAMP_NANOS missing
ARROW-11191 - [C++] Use FnOnce for TaskGroup's tasks instead of std::function
ARROW-11216 - [Rust] add doc example for StringDictionaryBuilder
ARROW-11220 - [Rust] Implement GROUP BY support for Boolean
ARROW-11222 - [Rust] Catch up with flatbuffers 0.8.1 which had some UB problems fixed
ARROW-11246 - [Rust] Add type to Unexpected accumulator state error
ARROW-11254 - [Rust][DataFusion] Add SIMD and snmalloc flags as options to benchmarks
ARROW-11260 - [C++][Dataset] Don't require dictionaries when specifying explicit partition schema
ARROW-11265 - [Rust] Made bool not ArrowNativeType
ARROW-11268 - [Rust][DataFusion] MemTable::load output partition support
ARROW-11270 - [Rust] Array slice accessors
ARROW-11279 - [Rust][Parquet] ArrowWriter Definition Levels Memory Usage
ARROW-11284 - [R] Support dplyr verb transmute()
ARROW-11289 - [Rust][DataFusion] Implement GROUP BY support for Dictionary Encoded columns
ARROW-11290 - [Rust][DataFusion] Address hash aggregate performance issue with high number of groups
ARROW-11291 - [Rust] Add extend to MutableBuffer (-20% for arithmetic, -97% for length)
ARROW-11300 - [Rust][DataFusion] Further performance improvements on hash aggregation with small groups
ARROW-11308 - [Rust][Parquet] Support decimal when writing parquet files
ARROW-11309 - [Release][C#] Use .NET 3.1 for verification
ARROW-11310 - [Rust] implement JSON writer
ARROW-11314 - [Release][APT][Yum] Add support for verifying arm64 packages
ARROW-11317 - [Rust] Include the prettyprint feature in CI Coverage
ARROW-11318 - [Rust] Support pretty printing timestamp, date, and time types
ARROW-11319 - [Rust][DataFusion] Improve test comparisons to record batch, remove test::format_batch
ARROW-11321 - [Rust][DataFusion] Fix DataFusion compilation error
ARROW-11325 - [Packaging][C#] Release Apache.Arrow.Flight and Apache.Arrow.Flight.AspNetCore
ARROW-11329 - [Rust] Don't rerun build.rs on every file change
ARROW-11330 - [Rust][DataFusion] add ExpressionVisitor to encode expression walking
ARROW-11332 - [Rust] Use MutableBuffer in take_string instead of Vec
ARROW-11333 - [Rust] Generalized creation of empty arrays.
ARROW-11336 - [C++][Doc] Improve Developing on Windows docs
ARROW-11338 - [R] Bindings for quantile and median
ARROW-11340 - [C++] Add vcpkg.json manifest to cpp project root
ARROW-11343 - [Rust][DataFusion] Simplified example with UDF.
ARROW-11346 - [C++][Compute] Implement quantile kernel benchmark
ARROW-11349 - [Rust] Add from_iter_values to create arrays from (non null) values
ARROW-11350 - [C++] Bump dependency versions
ARROW-11354 - [Rust] Speed-up cast of dates and times (2-4x)
ARROW-11355 - [Rust] Aligned Date DataType with specification.
ARROW-11358 - [Rust] Add benchmark for concatenating small arrays
ARROW-11360 - [Rust][DataFusion] Improve CSV "No files found" error message
ARROW-11361 - [Rust] Build MutableBuffer/Buffer from iterator of bools
ARROW-11362 - [Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance
ARROW-11365 - [Rust][Parquet] Logical type printer and parser
ARROW-11366 - [Datafusion] Implement constant folding for boolean literal expressions
ARROW-11367 - [C++] Implement t-digest approximate quantile utility
ARROW-11369 - [DataFusion] Split physical_plan/expressions.rs
ARROW-11372 - [Release] Support RC verification on macOS-ARM64
ARROW-11373 - [Python][Docs] Add example of specifying type for a column when reading csv file
ARROW-11374 - [Python] Make legacy pyarrow.filesystem / pyarrow.serialize warnings more visible (DeprecationWarning -> FutureWarning)
ARROW-11375 - [Rust] Fix deprecation warning in clippy
ARROW-11377 - [C++][CI] Add Thread Sanitizer nightly build
ARROW-11383 - [Rust] Faster bit AND and OR (2x)
ARROW-11386 - [Release] Fix post documents update script
ARROW-11389 - [Rust] make comments more consistent and fix typos
ARROW-11395 - [DataFusion] Support custom optimizers
ARROW-11401 - [Rust][DataFusion] Pass slices instead of Vec in DataFrame API
ARROW-11404 - [Rust][DataFusion] Upgrade to aHash 0.7 + minor cleanup
ARROW-11405 - [DataFusion] Support multiple custom logical nodes
ARROW-11406 - [CI][C++] Fix ccache caching on Travis-CI
ARROW-11408 - [Rust] Add window support to datafusion readme
ARROW-11411 - [Packaging][Linux] Disable arm64 nightly builds
ARROW-11414 - [Rust] Reduce copies in Schema::try_merge
ARROW-11417 - [Integration] Add integration tests for buffer compression
ARROW-11418 - [Doc] Add buffer compression to IPC support matrix
ARROW-11421 - [Rust][DataFusion] Support GROUP BY Date32
ARROW-11422 - [C#] add decimal support
ARROW-11423 - [R] value_counts and some StructArray methods
ARROW-11425 - [C++][Compute] Optimize quantile kernel for integers
ARROW-11426 - [Rust][DataFusion] EXTRACT support
ARROW-11428 - [Rust] Add power_scalar kernel
ARROW-11429 - Make string comparison kernels generic over Utf8 and LargeUtf8
ARROW-11430 - [Rust] zip kernel: combine arrays based on boolean mask
ARROW-11431 - [Rust][DataFusion] Support the HAVING clause.
ARROW-11435 - [Datafusion] allow creating ParquetPartition from external crate, make combine_filters public
ARROW-11436 - [Rust] Improved from_iter for primitive arrays (-20-30% for cast)
ARROW-11437 - [Rust] Removed duplicated code in benches
ARROW-11438 - [Rust][DataFusion] Support literal boolean values in DataFusion SQL
ARROW-11439 - [Rust] Add year support to temporal kernels
ARROW-11440 - [Rust][DataFusion] Add method to CsvExec to get CSV schema
ARROW-11442 - [Rust] Expose datetime conversion logic independently
ARROW-11443 - [Rust] Write datetime information for Date64 Type in csv writer
ARROW-11444 - [Rust][DataFusion] Accept slices as parameters
ARROW-11446 - [DataFusion] Added support for scalarValue in Builtin functions.
ARROW-11447 - [Rust] Add shift kernel for primitive types
ARROW-11449 - [CI][R][Windows] Use ccache
ARROW-11457 - [Rust] Make string comparison kernels generic over Utf8 and LargeUtf8
ARROW-11459 - [Rust] Added API to build ListArray of Primitives from an iterator
ARROW-11462 - [Developer] Remove needless quote from the default DOCKER_VOLUME_PREFIX
ARROW-11463 - [Python] Expose "allow_64bit" to IpcWriteOptions in pyarrow.
ARROW-11466 - [Go][Flight] adding Basic Auth handling for go flight client and server
ARROW-11467 - [R] Fix reference to json_table_reader() in R docs
ARROW-11468 - [R] Allow user to pass schema to read_json_arrow()
ARROW-11474 - [C++] Update bundled re2 version
ARROW-11476 - [Rust][DataFusion] Test running of TPCH benchmarks in CI
ARROW-11477 - [R][Doc] Reorganize and improve README and vignette content
ARROW-11478 - [R] Consider ways to make arrow.skip_nul option more user-friendly
ARROW-11479 - [Rust][Parquet] Add Method to get compressed size of columns from row group metadata
ARROW-11481 - [Rust] More cast implementations
ARROW-11484 - [Rust][DataFusion] Derive Clone for ExecutionContext
ARROW-11486 - [Website] Use Jekyll 4 and webpack to support Ruby 3.0 or later
ARROW-11489 - [Rust][DataFusion] Make DataFrame be Send + Sync
ARROW-11491 - [Rust] support JSON schema inference for nested list and struct
ARROW-11493 - [CI][Packaging][deb][RPM] Test built packages
ARROW-11500 - [R] Allow bundled build script to run on Solaris
ARROW-11501 - [C++] endianness check does not work on Solaris
ARROW-11504 - [Rust] Added checks to List DataType.
ARROW-11505 - [Rust] Add support for LargeUtf8 in csv-writer
ARROW-11507 - [R] Bindings for GetRuntimeInfo
ARROW-11510 - [Python] Add note that pip >= 19.0 is required to get binary packages
ARROW-11511 - [Rust] Replace Arc<ArrayData> by ArrayData in all arrays
ARROW-11512 - [Packaging][deb] Add missing gRPC dependency for Ubuntu 21.04
ARROW-11513 - [R] Bindings for sub/gsub
ARROW-11516 - [R] Allow all C++ compute functions to be called by name in dplyr
ARROW-11539 - [Developer][Archery] Change items_per_seconds units
ARROW-11541 - [C++][Compute] Implement tdigest kernel
ARROW-11542 - [Rust] fix validity bitmap buffer length count in json reader
ARROW-11544 - [Rust][DataFusion] Implement as_any for AggregateExpr
ARROW-11545 - [Rust][DataFusion] SendableRecordBatchStream should implement Sync
ARROW-11556 - [C++] Assorted benchmark-related improvements
ARROW-11557 - [Rust][Datafusion] Add deregister_table
ARROW-11559 - [C++] Add regression file
ARROW-11559 - [C++] Use smarter Flatbuffers verification parameters
ARROW-11561 - [Rust][DataFusion] Add Send + Sync to MemTable::load
ARROW-11563 - [Rust] Support Cast(Utf8, TimeStamp(Nanoseconds, None))
ARROW-11568 - [C++][Compute] Rewrite mode kernel
ARROW-11570 - [Rust] ScalarValue - support Date64
ARROW-11571 - [CI] Cancel stale Github Actions workflow runs
ARROW-11572 - [Rust] Add a kernel for division by single scalar
ARROW-11573 - [Developer][Archery] Google benchmark now reports run type
ARROW-11574 - [Rust][DataFusion] Upgrade sqlparser to support parsing all TPC-H queries
ARROW-11575 - [Developer][Archery] Expose execution time in benchmark results
ARROW-11576 - [Rust] Fix unused variable in Rust code example
ARROW-11580 - [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
ARROW-11581 - [Packaging][C++] Formalize distribution through vcpkg
ARROW-11589 - [R] Add methods for modifying Schemas
ARROW-11590 - [C++] Move CSV background generator to IO thread pool
ARROW-11591 - [C++][Compute] Grouped aggregation
ARROW-11592 - [Rust] Fix typo in comment
ARROW-11594 - [Rust] Support pretty printing of NullArray
ARROW-11597 - [Rust] Split file in smaller ones.
ARROW-11598 - [Rust] Split buffer.rs in smaller files
ARROW-11599 - [Rust] Add function to create array with all nulls
ARROW-11601 - [C++][Python][Dataset] expose Parquet pre-buffer option
ARROW-11606 - [Rust][DataFusion] Add input schema to HashAggregateExec
ARROW-11610 - [C++] Download boost from sourceforge instead of bintray
ARROW-11611 - [C++] Update third party dependency mirrors
ARROW-11612 - [C++] Rebuild trimmed boost bundle for 1.75.0
ARROW-11613 - [R] Move nightly C++ builds off of bintray
ARROW-11616 - [Rust][DataFusion] Add collect_partitioned on DataFrame
ARROW-11621 - [CI][Gandiva][Linux] Fix Crossbow setup failure
ARROW-11626 - [Rust][DataFusion] Move examples to own project
ARROW-11627 - [Rust] Make allocator be a generic over type T
ARROW-11637 - [CI][Conda] Update nightly clean target platforms and packages list
ARROW-11641 - [CI] Use docker buildkit's inline cache to reuse build cache across different hosts
ARROW-11649 - [R] Add support for null_fallback to R
ARROW-11651 - [Rust][DataFusion] Implement Postgres String Functions: Length Functions
ARROW-11653 - [Rust][DataFusion] Postgres String Functions: ascii, chr, initcap, repeat, reverse, to_hex
ARROW-11655 - [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad
ARROW-11656 - [Rust][DataFusion] Remaining Postgres String functions
ARROW-11659 - [R] Preserve group_by .drop argument
ARROW-11662 - [C++] Support sorting decimal and fixed size binary data
ARROW-11664 - [Rust] cast to LargeUtf8
ARROW-11665 - [C++][Python] Improve docstrings for decimal and union types
ARROW-11666 - [Integration] Add endianness "gold" integration file for decimal256
ARROW-11667 - [Rust] Add documentation for utf8 comparison kernels
ARROW-11669 - [Rust][DataFusion] Remove concurrency field from GlobalLimitExec and SortExec
ARROW-11671 - [Rust][DataFusion] Clean up Expr doc comments and examples
ARROW-11677 - [C++][Docs] Add basic C++ datasets documentation
ARROW-11680 - [C++] Add vendored version of folly's spsc queue
ARROW-11683 - [R] Support dplyr::mutate()
ARROW-11685 - [C++] Fix typo: FutureStessTest -> FutureStressTest
ARROW-11688 - [Rust] Casts between Utf8 and LargeUtf8
ARROW-11690 - [Rust][DataFusion] Avoid expr copies while using builder methods
ARROW-11692 - [Rust][DataFusion] Improve OptimizerRule comments
ARROW-11693 - [C++] Add string length kernel
ARROW-11700 - [R] Internationalize error handling in tidy eval
ARROW-11701 - [R] Implement dplyr::relocate()
ARROW-11703 - [R] Implement dplyr::arrange()
ARROW-11704 - [R] Wire up dplyr::mutate() for datasets
ARROW-11707 - [Rust] support CSV schema inference without file IO
ARROW-11708 - [Rust] fix Rust 2021 linting warnings
ARROW-11709 - [Rust][DataFusion] Move expressions and inputs into LogicalPlan rather than helpers in util
ARROW-11710 - [Rust][DataFusion] Implement ExpressionRewriter
ARROW-11719 - [Rust][Datafusion] support creating memory table with merged schema
ARROW-11721 - [Rust] json schema inference to return Schema instead of SchemaRef
ARROW-11722 - [Rust] Improve error message in FFI cast.
ARROW-11724 - [C++] Resolve namespace collisions with protobuf 3.15
ARROW-11725 - [Rust][DataFusion] Make use of the new divide_scalar kernel in arrow
ARROW-11727 - [C++][FlightRPC] Estimate latency quantiles with TDigest
ARROW-11730 - [C++] Add implicit convenience constructors for constructing Future from Status/Result
ARROW-11733 - [Rust][DataFusion] Implement hash partitioning
ARROW-11734 - [C++] vendored safe-math.h does not compile on Solaris
ARROW-11735 - [R] Allow Parquet and Arrow Dataset to be optional components
ARROW-11736 - [R] Allow string compute functions to be optional
ARROW-11737 - [C++] Patch vendored xxhash for Solaris
ARROW-11738 - [Rust][DataFusion] Fix Concat and Trim Functions
ARROW-11740 - [C++] posix_memalign not declared in scope on Solaris
ARROW-11742 - [Rust][DataFusion] Add Expr::is_null and Expr::is_not_nu…
ARROW-11744 - [C++] Add xsimd dependency
ARROW-11745 - [C++] Add helper to generate random record batches by schema
ARROW-11750 - [Python][Dataset] Add support for project expressions
ARROW-11752 - [R] Replace usage of testthat::expect_is()
ARROW-11753 - [Rust][DataFusion] Add tests for when Datafusion qualified field names resolved
ARROW-11754 - [R] Support dplyr::compute()
ARROW-11761 - [C++] Increase public API testing
ARROW-11766 - [R] Better handling for missing compression codecs on Linux
ARROW-11768 - [CI][C++] Make s390x job required
ARROW-11773 - [Rust] Support writing well formed JSON arrays as well as newline delimited json streams
ARROW-11774 - [R] macos one line install
ARROW-11775 - [Rust][DataFusion] Feature Flags for Dependencies
ARROW-11777 - [Rust] impl AsRef for StringBuilder/BinaryBuilder
ARROW-11778 - [Rust] Cast from LargeUtf8 to Numerical and temporal types
ARROW-11779 - [Rust] make alloc module public
ARROW-11790 - [Rust][DataFusion][Expr]
ARROW-11794 - [Go] Add concurrent-safe ipc.FileReader.RecordAt(i)
ARROW-11795 - [MATLAB] Migrate MATLAB Interface for Apache Arrow design doc to Markdown
ARROW-11797 - [C++][Dataset] Provide batch stream Scanner methods
ARROW-11798 - [Integration] Update testing submodule
ARROW-11799 - [Rust] fix len of string and binary arrays created from unbound iterator
ARROW-11801 - [C++] Remove bad header guard in filesystem/type_fwd.h
ARROW-11803 - [Rust][Parquet] Support v2 LogicalType
ARROW-11806 - [Rust][DataFusion] Optimize join / inner join creation of indices
ARROW-11820 - [Rust] Added macro to create native types
ARROW-11822 - [Rust][Datafusion] Support case sensitive comparisons for functions and aggregates
ARROW-11824 - [Rust][Parquet] Use logical types in Arrow schema conversion
ARROW-11825 - [Rust][DataFusion] Add mimalloc as option to benchmarks
ARROW-11833 - [C++] Bump vendored fast_float
ARROW-11837 - [C++][Dataset] expose originating Fragment on ScanTask
ARROW-11838 - [C++] Support IPC reads with shared dictionaries.
ARROW-11839 - [C++] Use xsimd for generation of accelerated bit-unpacking
ARROW-11842 - [Rust][Parquet] Use clone_from in get_batch_with_dict
ARROW-11852 - [Docs] Update CONTRIBUTING to explain Contributor role
ARROW-11856 - [C++] Remove unused reference to RecordBatchStreamWriter
ARROW-11858 - [GLib][Gandiva] Add Gandiva::Filter and related functions
ARROW-11859 - [GLib][Ruby] Add garrow_array_concatenate()
ARROW-11861 - [R][Packaging] Apply changes in r/tools/autobrew upstream
ARROW-11864 - [R] Document arrow.int64_downcast option
ARROW-11870 - [Dev] Automatically run merge script in virtual environment
ARROW-11876 - [Website] Update governance page
ARROW-11877 - [C++] Add microbenchmark for SimplifyWithGuarantee
ARROW-11879 - [Rust][DataFusion] Make ExecutionContext::sql return dataframe with optimized plan
ARROW-11883 - [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map
ARROW-11887 - [C++] Add asynchronous read to streaming CSV reader
ARROW-11894 - [Rust][DataFusion] Change flight server example to use DataFrame API
ARROW-11895 - [Rust][DataFusion] Add support for more column statistics
ARROW-11898 - [Rust] Pretty print columns
ARROW-11899 - [Java] Refactor the compression codec implementation into core/Arrow specific parts
ARROW-11900 - [Website] Add Yibo to committer list
ARROW-11906 - [R] Make FeatherReader print method more informative
ARROW-11907 - [C++] Use our own executor in S3FileSystem
ARROW-11910 - [Packaging][Ubuntu] Drop support for 16.04
ARROW-11911 - [Website] Add protobuf vs arrow to FAQ
ARROW-11912 - [R] Remove args from FeatherReader$create
ARROW-11913 - [Rust] Improve performance of StringBuilder by delaying bitmap creation
ARROW-11920 - [R] Remove r/libarrow when make cleaning
ARROW-11921 - [R] Set LC_COLLATE in r/data-raw/codegen.R
ARROW-11924 - [C++] Add streaming version of FileSystem::GetFileInfo
ARROW-11925 - [R] Add between method for arrow_dplyr_query
ARROW-11927 - [Rust][DataFusion] Support Limit push down optimization
ARROW-11931 - [Go] bump to go1.15
ARROW-11935 - [C++] Add push generator
ARROW-11944 - [Developer] Fix archery's comparison of cached benchmark runs
ARROW-11949 - [Ruby] Accept raw Ruby objects as sort key and options
ARROW-11951 - [Rust] Remove OffsetSize::prefix
ARROW-11952 - [Rust] Make ArrayData --> GenericListArray fallible instead of panic!
ARROW-11954 - [C++] arrow/util/io_util.cc does not compile on Solaris
ARROW-11955 - [Rust][DataFusion] Support Union
ARROW-11958 - [GLib] Add garrow_chunked_array_combine()
ARROW-11959 - [Rust][DataFusion] Fix log line
ARROW-11962 - [Rust][DataFusion] Improve DataFusion docs
ARROW-11969 - [Rust][DataFusion] Improve Examples in documentation
ARROW-11972 - [C++][R][Python][Dataset] Extract IPC/Parquet fragment scan options
ARROW-11973 - [Rust][DataFusion] Boolean kleene kernels
ARROW-11977 - [Rust] Add documentation examples for sort kernel
ARROW-11982 - [Rust] Donate Ballista Distributed Compute Platform
ARROW-11984 - [C++][Gandiva] Implement SHA1 and SHA256 functions
ARROW-11987 - [C++][Gandiva] Implement trigonometric functions
ARROW-11988 - [C++][Gandiva] Implements last_day function
ARROW-11992 - [Rust][Parquet] Add upgrade notes on 4.0 rename of LogicalType
ARROW-11993 - [C++] Don't download xsimd if ARROW_SIMD_LEVEL=NONE
ARROW-11996 - [R] Make r/configure run successfully on Solaris
ARROW-11999 - [Java] Support parallel vector element search with user-specified comparator
ARROW-12000 - [Documentation] Add note about deviation from style guide on struct/classes
ARROW-12005 - [R] Fix a bash typo in configure
ARROW-12017 - [R][Documentation] Make proper developing arrow docs
ARROW-12019 - [Rust][Parquet] Update README for 2.6.0 support
ARROW-12020 - [Rust][DataFusion] Adding SHOW TABLES and SHOW COLUMNS + partial information_schema support to DataFusion
ARROW-12031 - [C++][CSV] infer CSV timestamps columns with fractional seconds
ARROW-12032 - [Rust] Optimize comparison kernels
ARROW-12034 - [Developer Tools] Formalize Minor PRs
ARROW-12037 - [Rust][DataFusion] Support catalogs and schemas for table namespacing
ARROW-12038 - [Rust][DataFusion] Upgrade hashbrown to 0.11
ARROW-12039 - [Nightly][Gandiva] Fix gandiva-jar-ubuntu nightly build failure
ARROW-12040 - [C++] Fix potential deadlock in recursive S3 walks
ARROW-12043 - [Rust][Parquet] Write FSB arrays
ARROW-12045 - [Go][Parquet] Initial Chunk of Parquet port to Go
ARROW-12047 - [Rust][Parquet] Cleanup clippy
ARROW-12048 - [Rust][DataFusion] Support Common Table Expressions
ARROW-12052 - [Rust] Add Child Data to Arrow's C FFI implementation. …
ARROW-12056 - [C++] Create sequencing AsyncGenerator
ARROW-12058 - [Python] Enable arithmetic operations on Expressions
ARROW-12068 - [Python] Stop using distutils
ARROW-12069 - [C++][Gandiva] Implement IN expressions for Decimal type
ARROW-12070 - [GLib] Drop support for GNU Autotools
ARROW-12071 - [GLib] Keep input stream reference of GArrowJSONReader
ARROW-12075 - [Rust][DataFusion] Add CTE + UNION ALL to supported list of SQL features
ARROW-12081 - [R] Bindings for utf8_length
ARROW-12082 - [R][Dataset] Allow create dataset from vector of file paths
ARROW-12094 - [C++][R] Fix re2 building on clang/libc++
ARROW-12097 - [C++] Modify BackgroundGenerator so it creates fewer threads
ARROW-12098 - [R] Catch cpp build failures on linux
ARROW-12104 - [Go][Parquet] Second chunk of Ported Go Parquet code
ARROW-12106 - [Rust][DataFusion] Support SELECT * from information_schema.tables
ARROW-12107 - [Rust][DataFusion] Support SELECT * from information_schema.columns
ARROW-12108 - [Rust][DataFusion] Implement SHOW TABLES
ARROW-12109 - [Rust][DataFusion] Implement SHOW COLUMNS
ARROW-12110 - [Java] Implement ZSTD compression
ARROW-12111 - [Java] Generate flatbuffer files using flatc 1.12.0
ARROW-12116 - [Rust] Fix and ignore 1.51 clippy lints
ARROW-12119 - [Rust][DataFusion] Improve performance of to_array_of_size for primitives
ARROW-12120 - [Rust] Generate random arrays and batches
ARROW-12121 - [Rust][Parquet] Arrow writer benchmarks
ARROW-12123 - [Rust][DataFusion] Use smallvec for indices for better join performance
ARROW-12128 - [CI][Crossbow] Remove test-ubuntu-16.04-cpp job
ARROW-12131 - [CI][GLib] Ensure upgrading MSYS2
ARROW-12133 - [C++][Gandiva] Add option to disable targeting host cpu during llvm ir compilation
ARROW-12134 - [C++] Add match_substring_regex kernel
ARROW-12136 - [Rust][DataFusion] Reduce default batch_size to 8192
ARROW-12139 - [Python][Packaging] Use vcpkg to build macOS wheels
ARROW-12141 - [R] Bindings for grepl
ARROW-12143 - [CI] R builds should timeout and fail after some threshold and dump the output.
ARROW-12146 - [C++][Gandiva] Implement CONVERT_FROM(expression, replacement char) function
ARROW-12151 - [Docs] Add Jira component + summary conventions to the docs
ARROW-12153 - [Rust][Parquet] Return file stats after writing file
ARROW-12160 - [Rust] Add into_inner() to StreamWriter
ARROW-12164 - [Java] Make BaseAllocator.Config public
ARROW-12165 - [Rust] inline append functions of builders
ARROW-12168 - [Go][IPC] Implement Compression handling for Arrow IPC
ARROW-12170 - [Rust][DataFusion] Introduce repartition optimization
ARROW-12173 - [GLib] Remove #include <config.h>
ARROW-12176 - [C++] Fix some typos of cpp examples
ARROW-12187 - [C++][FlightRPC] Add compression benchmark for stream writing
ARROW-12188 - [Docs] Switch to pydata-sphinx-theme for the main sphinx docs
ARROW-12190 - [Rust][DataFusion] Implement parallel / partitioned hash join
ARROW-12192 - [Website] Use downloadable URL for archive download
ARROW-12193 - [Dev][Release] Use downloadable URL for archive download
ARROW-12194 - [Rust][Parquet] Bump zstd to v0.7
ARROW-12197 - [R] dplyr bindings for cast, dictionary_encode
ARROW-12200 - [R] Export and document list_compute_functions
ARROW-12204 - [Rust][CI] Reduce size of Rust build artifacts in integration test
ARROW-12206 - [Python][Docs] Fix Table docstrings
ARROW-12208 - [C++] Add the ability to run async tasks without using the CPU thread pool
ARROW-12210 - [Rust][DataFusion] Document SHOW TABLES / SHOW COLUMNS / Information Schema
ARROW-12214 - [Rust][DataFusion] Add tests for limit
ARROW-12215 - [C++] Allow null values in fixed-size binary columns read from CSV
ARROW-12217 - [C++] Cleanup cpp examples source files naming
ARROW-12222 - [Dev][Packaging] Include build url in the crossbow console report
ARROW-12224 - [Rust] Use stable rust for no default test, clean up CI tests
ARROW-12228 - [CI] Create base image for conda environments
ARROW-12236 - [R][CI] Add check that all docs pages are listed in _pkgdown.yml
ARROW-12237 - [Packaging][Debian] Add support for bullseye
ARROW-12238 - [JS] Remove trailing spaces and consistently add space after //
ARROW-12239 - [JS] Switch to yarn
ARROW-12242 - [Python][Doc] Tweak nightly build instructions
ARROW-12246 - [CI] Sync conda recipes with upstream feedstock
ARROW-12248 - [C++] Avoid looking up ARROW_DEFAULT_MEMORY_POOL environment variable too late
ARROW-12249 - [R][CI] Fix test-r-install-local nightlies
ARROW-12251 - [Rust] Add Ballista to CI
ARROW-12263 - [Dev][Packaging] Move Crossbow to Archery
ARROW-12269 - [JS] Move to eslint
ARROW-12274 - [JS] Document how to run tests without building bundles
ARROW-12277 - [Rust][DataFusion] Implement Sum/Count/Min/Max aggregates for Timestamp(,)
ARROW-12278 - [Rust][DataFusion] Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type
ARROW-12280 - [Developer] Remove @-mentions from commit messages in merge tool
ARROW-12281 - [JS] Remove shx, trash, and rimraf and update lerna for yarn
ARROW-12283 - [R] Bindings for basic type convert functions in dplyr verbs
ARROW-12286 - [C++] Create AsyncGenerator from Future<AsyncGenerator<T>>
ARROW-12287 - [C++] Create enumerating generator
ARROW-12288 - [C++] Create Scanner interface
ARROW-12289 - [C++] Create basic AsyncScanner implementation
ARROW-12303 - [JS] Use iterator instead of yield
ARROW-12304 - [R] Update news and polish docs for 4.0
ARROW-12305 - [JS] Update generate.py to python3 and new versions of pyarrow
ARROW-12309 - [JS] Make es2015 bundles the default
ARROW-12316 - [C++] Prefer mimalloc on Apple
ARROW-12317 - [Rust] JSON writer support for time, duration and date
ARROW-12320 - [CI] REPO arg missing from conda-cpp-valgrind
ARROW-12323 - [C++][Gandiva] Implement castTIME(timestamp) function
ARROW-12325 - [C++][CI] Nightly gandiva build failing due to failure of compiler to move return value
ARROW-12326 - [C++] Avoid needless c-ares detection
ARROW-12328 - [Rust][Ballista] Fix formatting
ARROW-12329 - [Rust][Ballista] Add Ballista README
ARROW-12332 - [Rust][Ballista] Add simple api server in scheduler
ARROW-12333 - [JS] Remove jest-environment-node-debug and do not emit from typescript by default
ARROW-12335 - [Rust][Ballista] Use latest DataFusion
ARROW-12337 - [Rust] add DoubleEndedIterator and ExactSizeIterator traits
ARROW-12351 - [CI][Ruby] Use ruby/setup-ruby instead of actions/setup-ruby
ARROW-12352 - [CI][R][Windows] Remove needless workaround for MSYS2
ARROW-12353 - [Packaging][deb] Rename -archive-keyring to -apt-source
ARROW-12354 - [Packaging][RPM] Use apache.jfrog.io/artifactory/ instead of apache.bintray.com/
ARROW-12356 - [Website] Update install page instructions to point to artifactory
ARROW-12361 - [Rust][DataFusion] Allow users to override physical optimization rules
ARROW-12367 - [C++] Stop producing when PushGenerator was destroyed
ARROW-12370 - [R] Bindings for power kernel
ARROW-12374 - [CI][C++][cron] Use Ubuntu 20.04 instead of 16.04
ARROW-12375 - [Release] Remove rebase post-release scripts
ARROW-12376 - [Dev] Log traceback for unexpected exceptions in archery trigger-bot
ARROW-12380 - [Rust][Ballista] Basic scheduler ui
ARROW-12381 - [Packaging][Python] macOS wheels are built with wrong package kind
ARROW-12383 - [JS] Upgrade dependencies
ARROW-12384 - [JS] Use let/const and clean up eslint rules
ARROW-12389 - [R][Docs] Add note about autocasting
ARROW-12395 - Create RunInSerialExecutor benchmark
ARROW-12396 - [Python][Docs] Clarify serialization/filesystem docstrings about deprecated status
ARROW-12397 - [Rust][DataFusion] Simplify readme example
ARROW-12398 - [Rust] remove redundant bound check in iterators
ARROW-12400 - [Rust] Re-enable tests in arrow::array::transform
ARROW-12402 - [Rust][DataFusion] Implement SQL metrics example
ARROW-12406 - [R] Fix checkbashism violation in configure
ARROW-12409 - [R] Remove LazyData from DESCRIPTION
ARROW-12419 - [Java] Remove flatc binary download for s390x
ARROW-12420 - [C++][Dataset] Reading null columns as dictionary no longer possible
ARROW-12423 - [Docs] Remove Codecov badge
ARROW-12425 - [Rust] Fix new_null_array dictionary creation
ARROW-12432 - [Rust][DataFusion] Add metrics to SortExec
ARROW-12436 - [Rust][Ballista] Add watch capabilities to config backend trait
ARROW-12467 - [C++][Gandiva] Add support for LLVM12
ARROW-12477 - [Release] Download aarch64 miniforge
ARROW-12485 - [C++] Use mimalloc as the default memory allocator on macOS
ARROW-12488 - [GLib] Use g_memdup2() with GLib 2.68 or later
ARROW-12494 - [C++] ORC adapter fails to compile on GCC 4.8
ARROW-12506 - [Python] Improve modularity of pyarrow codebase to speedup compile time
ARROW-12652 - disable conda arm64 in nightly
PARQUET-1846 - [C++] Remove deprecated IO classes
PARQUET-1899 - [C++] Deprecated ReadBatchSpaced
PARQUET-1990 - [C++] Refuse to write ConvertedType::NA
PARQUET-1993 - [C++] expose way to wait for I/O to complete

Package last updated on 27 Apr 2021
