@apache-arrow/esnext-umd - npm Package Versions


ptaylor published 6.0.1

kszucs published 6.0.0

Changelog

Apache Arrow 6.0.0 (2021-10-26)

Bug Fixes

  • ARROW-6946 - [Go] Run tests with assert build tag enabled to ensure safety
  • ARROW-8452 - [Go] support proper nested nullable flags
  • ARROW-8453 - [Go][Integration] Support and enable recursive nested type integration tests
  • ARROW-8999 - [Python][C++] Non-deterministic segfault in "AMD64 MacOS 10.15 Python 3.7" build
  • ARROW-9948 - [C++] Fix scale handling in Decimal{128, 256}::FromString
  • ARROW-10213 - [C++] Temporal cast from timestamp to date rounds instead of extracting date component
  • ARROW-10373 - [C++] Validate null_count in Array::ValidateFull()
  • ARROW-10773 - [R] parallel as.data.frame.Table hangs indefinitely on Windows
  • ARROW-11518 - [C++][Parquet] Fix buffer allocation when reading/skipping boolean columns
  • ARROW-11579 - [R] read_feather hanging on Windows
  • ARROW-11634 - [C++][Parquet] Parquet statistics (min/max) for dictionary columns are incorrect
  • ARROW-11729 - [R] Add examples to datasets documentation
  • ARROW-12011 - [C++] Fix crashes and incorrect results when printing extreme date values
  • ARROW-12072 - [Go] Fix panics in ipc writer for sliced records
  • ARROW-12087 - [C++] Allow sorting durations, timestamps with timezones
  • ARROW-12321 - [R][C++] Arrow opens too many files at once when writing a dataset
  • ARROW-12513 - [C++][Parquet] Parquet Writer always puts null_count=0 in Parquet statistics for dictionary-encoded array with nulls
  • ARROW-12540 - [C++] Implement casting support from date32/date64 to utf8/large_utf8
  • ARROW-12636 - [JS] ESM Tree-Shaking produces broken code
  • ARROW-12700 - [R] Read/Write_feather stuck forever after bad write, R, Win32
  • ARROW-12837 - [C++] Do not crash when printing invalid arrays
  • ARROW-13134 - [C++][CI] Unpin conda package for aws-sdk-cpp
  • ARROW-13151 - [C++][Parquet] Propagate schema changes from selection all the way up the stack
  • ARROW-13198 - [C++][Dataset] Async scanner occasionally segfaulting in CI
  • ARROW-13293 - [R] open_dataset followed by collect hangs (while compute works)
  • ARROW-13304 - [C++] Unable to install nightly on Ubuntu 21.04 due to day of week options
  • ARROW-13336 - [Doc] Make clean in docs should clean generated docs
  • ARROW-13422 - [R] Clarify README about S3 support on Windows
  • ARROW-13424 - [C++] Remove needless workaround for conda and benchmark
  • ARROW-13425 - [Archery] Avoid importing PyArrow indirectly
  • ARROW-13429 - [C++][Gandiva] Fix Gandiva codegen for if-else expression with binary type
  • ARROW-13430 - [Go] fix handling of zero value for FromBigInt
  • ARROW-13436 - [Python][Doc] Clarify what should be expected if read_table is passed an empty list of columns
  • ARROW-13437 - [C++] Relax FixedSizeList validation to allow excess child values
  • ARROW-13441 - [C++][CSV] Skip empty batches in column decoder
  • ARROW-13443 - [C++] Fix the incorrect mapping from flatbuf::MetadataVersion to arrow::ipc::MetadataVersion
  • ARROW-13445 - [Java][Packaging] Fix artifact patterns for the Java jars
  • ARROW-13446 - [Release] Fix verification on amazon linux
  • ARROW-13447 - [Release] Verification script for arm64 and universal2 macOS wheels
  • ARROW-13450 - [Python][Packaging] Set deployment target to 10.13 for universal2 wheels
  • ARROW-13469 - [C++] Suppress -Wmissing-field-initializers in DayMilliseconds arrow/type.h
  • ARROW-13474 - [Python] Fix crash in take/filter of empty ExtensionArray
  • ARROW-13477 - [Release] Pass ARTIFACTORY_API_KEY to the upload script
  • ARROW-13484 - [Release] Add support for uploading Amazon Linux 2 packages
  • ARROW-13490 - [R][CI] Need to gate duckdb examples on duckdb version
  • ARROW-13492 - [R][CI] Move r tools 35 build back to per-commit/pre-PR
  • ARROW-13493 - [C++] Anonymous structs in an anonymous union are a GNU extension
  • ARROW-13495 - [C++][Compute] Fixing unaligned memory access in GrouperFastImpl
  • ARROW-13496 - [CI][R] Repair r-sanitizer job
  • ARROW-13497 - [C++][R] FunctionOptions not used by aggregation nodes
  • ARROW-13499 - [R] Aggregation on expression doesn't NSE correctly
  • ARROW-13500 - [C++] Fix using '-Wno-unknown-warning-option' with GCC
  • ARROW-13504 - [Python] Move marks from fixtures to individual tests/params
  • ARROW-13507 - [R] LTO job on CRAN fails
  • ARROW-13509 - [C++] Take kernel with empty inputs
  • ARROW-13522 - [C++] Fix regression in UTF8 trim functions
  • ARROW-13523 - [C++] Normalize test executable name
  • ARROW-13524 - [C++] Fix description for ApplicationVersion::VersionEq
  • ARROW-13529 - [Go] Fixing too many releases in IPC writer
  • ARROW-13538 - [R][CI] Don't test DuckDB in the minimal build
  • ARROW-13543 - [R] Handle summarize() with 0 arguments or no aggregate functions
  • ARROW-13556 - [C++] Add protobuf to linking for flight
  • ARROW-13559 - [CI][C++] Move the test-conda-cpp-valgrind nightly build to azure
  • ARROW-13560 - [R] Allow Scanner$create() to accept filter / project even with arrow_dplyr_querys
  • ARROW-13580 - [C++] quoted_strings_can_be_null only applied to string columns
  • ARROW-13597 - [C++][Compute] Remove AddOnLoad helper
  • ARROW-13600 - [C++] Fix maybe uninitialized warnings
  • ARROW-13602 - [C++] Fix strict aliasing warning in bit util test
  • ARROW-13603 - [GLib] Fix typos in GARROW_VERSION_CHECK()
  • ARROW-13605 - [C++] Capture node with shared_ptr to avoid TSan warning
  • ARROW-13608 - [R] vendor cpp11 to fix segfault under LTO
  • ARROW-13611 - [C++] Scanning datasets does not enforce back pressure
  • ARROW-13624 - [R] readr short type mapping has T and t backwards
  • ARROW-13628 - [Format][C++][Java] Add MONTH_DAY_NANO interval type
  • ARROW-13630 - [CI][C++][s390x] Reduce parallelism to build Arrow library
  • ARROW-13632 - [C++] Fix filtering of sliced FixedSizeList array
  • ARROW-13638 - [C++] Hold owned copy of function options in GroupByNode
  • ARROW-13639 - [C++] Fix out-of-bounds access in Concatenate with null slots and empty dictionary
  • ARROW-13654 - [C++][Parquet] Avoid infinite loop when appending a FileMetaData to itself
  • ARROW-13655 - [C++][Parquet] Disable Thrift message size protections
  • ARROW-13662 - [CI] Fix failing strftime test with older pandas
  • ARROW-13662 - [CI] Failing test test_extract_datetime_components with pandas 0.24
  • ARROW-13669 - [C++] Fix variant emplace methods (add brackets)
  • ARROW-13671 - [Dev] Fix conda recipe on Arm 64k page system
  • ARROW-13676 - [C++][Parquet] Avoid potential invalid access.
  • ARROW-13681 - [C++] Fix list_parent_indices behaviour on chunked array
  • ARROW-13685 - [C++] Cannot write dataset to S3FileSystem if bucket already exists
  • ARROW-13689 - [C#][Integration] Initial commit of C# Integration tests
  • ARROW-13694 - [R] Arrow filter crashes (R aborted session)
  • ARROW-13743 - [CI] OSX job fails due to incompatible git and libcurl
  • ARROW-13744 - [CI] c++14 and 17 nightly job fails
  • ARROW-13747 - [Python][CI] Requiring s3fs >= 2021.8
  • ARROW-13755 - [Python] Allow writing datasets using a partitioning that only specifies field_names
  • ARROW-13761 - [R] arrow::filter() crashes (aborts R session)
  • ARROW-13784 - [Python] Table.from_arrays should raise an error when array is empty but names is not
  • ARROW-13786 - [R][CI] Don't fail the RCHK build if arrow doesn't build
  • ARROW-13788 - [C++] Temporal component extraction functions don't support date32/64
  • ARROW-13792 - [Java] The toString representation is incorrect for unsigned integer vectors
  • ARROW-13799 - [R] case_when error handling is capturing strings
  • ARROW-13800 - [R] Use divide instead of divide_checked
  • ARROW-13812 - [C++] Fix Valgrind error in Grouper.BooleanKey test
  • ARROW-13814 - [CI] Fix Spark master integration tests
  • ARROW-13819 - [C++] Initialize subseconds in value_parsing.h
  • ARROW-13846 - [C++] Fix crashes on invalid IPC file
  • ARROW-13850 - [C++] Fix crashes on invalid Parquet data
  • ARROW-13860 - [R] arrow 5.0.0 write_parquet throws error writing grouped data.frame
  • ARROW-13865 - [C++][R] Writing moderate-size parquet files of nested dataframes from R slows down/process hangs
  • ARROW-13872 - [Java] ExtensionTypeVector does not work with RangeEqualsVisitor
  • ARROW-13876 - [C++] Add trivial null kernels to arithmetic, sort functions
  • ARROW-13877 - [C++] Support FixedSizeList in generic list kernels
  • ARROW-13878 - [C++] Implement fixed-size-binary support for several kernels
  • ARROW-13880 - [C++] Compute function sort_indices does not support timestamps with time zones
  • ARROW-13881 - [C++][FlightRPC][Packaging] Ensure Flight is packaged with advanced TLS options on Windows
  • ARROW-13882 - [C++] Improve min_max/hash_min_max type support
  • ARROW-13884 - [JS] Move source files into a separate directory
  • ARROW-13912 - [R] TrimOptions implementation breaks test-r-minimal-build due to dependencies
  • ARROW-13913 - [C++] Don't segfault if IndexOptions omitted
  • ARROW-13915 - [R][CI] R UCRT C++ bundles are incomplete
  • ARROW-13916 - [C++] Implement strftime on date32/64 types
  • ARROW-13921 - [Python][Packaging] Pin minimum setuptools version for the macos wheels
  • ARROW-13940 - [R] Turn on multithreading with Arrow engine queries
  • ARROW-13961 - [C++] Fix use of non-const references, declaration without initialization
  • ARROW-13976 - [C++] Add path to libjvm.so in ARM CPU
  • ARROW-13978 - [C++] Bump gtest to 1.11 to unbreak builds with recent clang
  • ARROW-13981 - [Java] VectorSchemaRootAppender doesn't work for BitVector
  • ARROW-13982 - [C++] Don't stall in async scanner if a fragment generates no batches
  • ARROW-13983 - [C++] Avoid raising error if fadvise() isn't supported
  • ARROW-13996 - [Go][Parquet] Fix file offsets in go impl
  • ARROW-13997 - [C++] restore exec node based query performance
  • ARROW-14001 - [Go] Fixing AppendBoolean function in BitmapWriter
  • ARROW-14004 - [Python][Doc] Document nullable dtypes handling and usage of types_mapper in to_pandas conversion
  • ARROW-14014 - [Java] Fix Flight parseTrailers for :status keys
  • ARROW-14017 - [C++] NULLPTR is not included in type_fwd.h
  • ARROW-14020 - [R] Writing data frames with list columns is slow and scales poorly with nesting level
  • ARROW-14024 - [C++] Test that batch size is respected for IPC/CSV
  • ARROW-14026 - [C++] Enable batch parallelism in Parquet scanner
  • ARROW-14027 - [C++] Handle scalars in Grouper
  • ARROW-14040 - [C++] Fix result order dependence in scanner test
  • ARROW-14053 - [C++][CSV] Use atomic counter for async tests
  • ARROW-14057 - [C++] Bump aws-c-common version
  • ARROW-14063 - [R] open_dataset() does not work on CSVs without header rows
  • ARROW-14076 - Unable to use `red-arrow` gem on Heroku/Ubuntu 20.04 (focal)
  • ARROW-14090 - [C++][Parquet] rows_written_ should be int64_t instead of int
  • ARROW-14103 - [R][C++] Allow min/max in grouped aggregation
  • ARROW-14109 - [C++] Fix segfault when parsing JSON with duplicate keys.
  • ARROW-14124 - [R] Timezone support in R <= 3.4
  • ARROW-14129 - [C++][Python] Fix unique/value_counts on empty dictionary arrays
  • ARROW-14139 - [IR][C++] Table flatbuffer object fails to compile on older GCCs
  • ARROW-14141 - [IR][C++] Join missing from RelationImpl
  • ARROW-14156 - [C++] Properly synthesize validity buffer in StructArray::Flatten
  • ARROW-14162 - [R] Simple arrange %>% head does not respect ordering
  • ARROW-14173 - [IR] Allow typed null literals to be represented
  • ARROW-14179 - [C++][C] Do not export/import null bitmap for union and null types
  • ARROW-14184 - [C++] allow joins where the keys include new columns on the left
  • ARROW-14192 - [C++][Dataset] Backpressure broken on ordered scans
  • ARROW-14195 - [R] Fix ExecPlan binding annotations
  • ARROW-14197 - [C++][Compute] Fixing wrong buffer size in GrouperFastImpl
  • ARROW-14200 - [R] strftime on a date should not use or be confused by timezones
  • ARROW-14203 - [C++] Fix description of ExecBatch.length for Scalars in aggregate kernels
  • ARROW-14204 - [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike
  • ARROW-14206 - [Go][Parquet] Clean up s390x and arm build code
  • ARROW-14206 - [Go][CI] Fix build on s390x and ARM
  • ARROW-14208 - [C++] Fix compilation on Windows
  • ARROW-14210 - [C++] Add AR and RANLIB flags to bzip2
  • ARROW-14211 - [C++][Compute] Fixing thread sanitizer problems in hash join node
  • ARROW-14214 - [Python][CI] Fix tests using OrcFileFormat for Python 3.6 + orc not built
  • ARROW-14216 - [R] Disable auto-cleaning of duckdb tables
  • ARROW-14219 - [R][CI] DuckDB valgrind failure
  • ARROW-14220 - [C++] Missing ending quote in thirdpartyversions
  • ARROW-14221 - [R][CI] DuckDB tests fail on R < 4.0
  • ARROW-14223 - [C++] add missing third-party dependency
  • ARROW-14224 - [C++] Try to reduce build time/memory usage
  • ARROW-14226 - [R] Handle n_distinct() (and others) with args != 1
  • ARROW-14237 - [R][CI] Disable altrep in R <= 3.5
  • ARROW-14240 - [C++] Fix wrong nlohmann-json header path
  • ARROW-14246 - [C++] Fix wrong find_package() usage in build_google_cloud_cpp_storage()
  • ARROW-14247 - [C++] Fix Valgrind errors in parquet-arrow-test
  • ARROW-14249 - [R] Slow down in dataframe-to-table benchmark
  • ARROW-14252 - [R] Partial matching of arguments warning
  • ARROW-14255 - [Python] Fix FlightClient.do_action
  • ARROW-14257 - [Python][Docs] Fix usage of sync scanner in dataset writing docs
  • ARROW-14260 - [C++] GTest linker error with vcpkg and Visual Studio 2019
  • ARROW-14283 - [CI][C++] Use LLVM 12 on macOS GHA builds
  • ARROW-14285 - [C++] Fix crashes when pretty-printing data from valid IPC file
  • ARROW-14299 - [Dev][CI] Avoid downloading MinIO multiple times
  • ARROW-14300 - [C++][R][CI] Work around missing include in xsimd
  • ARROW-14301 - [C++] use consistent CMAKE_CXX_STANDARD definition
  • ARROW-14302 - [C++] Valgrind errors
  • ARROW-14305 - [C++][Compute] Fixing Valgrind errors in hash join node tests
  • ARROW-14307 - [R] crashes when reading empty feather with POSIXct column
  • ARROW-14313 - [Doc] Make Archery installation docs more accurate
  • ARROW-14321 - [R] segfault converting dictionary ChunkedArray with 0 chunks
  • ARROW-14340 - [C++] Bump xsimd to fix build error on Apple M1
  • ARROW-14370 - [C++] Fix memory leak in SeqMergedGeneratorTestFixture.ErrorItem
  • ARROW-14373 - [Packaging][Java] Missing LLVM dependency in the macOS java-jars build
  • ARROW-14377 - [Packaging][Python] Python 3.9 installation fails in macOS wheel build
  • ARROW-14381 - [CI][Python] Fix Spark integration failures
  • ARROW-14382 - [C++][Compute] Remove duplicated ThreadIndexer definition
  • ARROW-14392 - [C++] Bundled gRPC misses bundled Abseil include path
  • ARROW-14393 - [C++] GTest linking errors during the source release verification
  • ARROW-14397 - [C++] Fix valgrind error in test utility
  • ARROW-14406 - [CI] Skip failing test on dask-master nightly build
  • ARROW-14411 - [Release][Integration] Go integration tests fail for 6.0.0-RC1
  • ARROW-14417 - [R] Joins ignore projection on left dataset
  • ARROW-14423 - [Python] Fix version constraints in pyproject.toml
  • ARROW-14424 - [Packaging][Python] Disable windows wheel testing for python 3.6
  • ARROW-14434 - R crashes when making an empty selection for Datasets with DateTime
  • ARROW-14439 - [Python][C++] Segfault with read_json when a field is missing
  • PARQUET-2067 - [C++][Parquet] Fix Parquet null count stats for enclosing null lists
  • PARQUET-2089 - [C++] Align RowGroup file_offset with specification

New Features and Improvements

  • ARROW-1565 - [C++] Implement TopK/BottomK
  • ARROW-1568 - [C++] Implement Drop Null Kernel for Arrays
  • ARROW-4333 - [C++] Sketch out design for kernels and "query" execution in compute layer
  • ARROW-4700 - [C++] Add support for decimal128 and decimal256 to the JSON converter
  • ARROW-5002 - [C++] Implement Hash Aggregation query execution node
  • ARROW-5244 - [C++] Remove experimental marker from some APIs
  • ARROW-6072 - [C++] Implement casting List <-> LargeList
  • ARROW-6607 - [Python] Support for set/list columns when converting from Pandas
  • ARROW-6626 - [Python] Support converting nested sets when converting to arrow
  • ARROW-6870 - [C#] Add Support for Dictionary Arrays and Dictionary Encoding
  • ARROW-7102 - [Python] Make filesystems compatible with fsspec
  • ARROW-7179 - [C++][Python][R] Consolidate coalesce/fill_null
  • ARROW-7901 - [Go][Integration] enable integration tests for null case
  • ARROW-8022 - [C++] Add static and small vector implementations
  • ARROW-8147 - [C++] add GCS library to ThirdpartyToolchain
  • ARROW-8379 - [R] Investigate/fix thread safety issues (esp. Windows)
  • ARROW-8621 - [Release] Add post release step to add tags for Go versioning
  • ARROW-8780 - [Python][Doc] Document the fsspec wrapper for pyarrow.fs filesystems
  • ARROW-8928 - [C++] Add microbenchmarks to help measure ExecBatchIterator overhead
  • ARROW-9226 - [Python] Support core-site.xml default filesystem.
  • ARROW-9434 - [C++] Store type code in UnionScalar
  • ARROW-9719 - [Python] Improve HadoopFileSystem docstring
  • ARROW-10094 - [Python][Doc] Document missing pandas to arrow conversions
  • ARROW-10415 - [R] Support for dplyr::distinct()
  • ARROW-10898 - [C++] Improve table sort performance
  • ARROW-11238 - [Python] Make SubTreeFileSystem print method more informative
  • ARROW-11243 - [C++] Recognize time types in CSV files
  • ARROW-11460 - [R] Use system libraries if present on Linux
  • ARROW-11691 - [Developer][CI] Provide a consolidated .env file for benchmark-relevant environment variables
  • ARROW-11748 - [C++] Ensure Decimal fields are in native endian order
  • ARROW-11828 - [C++] Expose CSVWriter object in api
  • ARROW-11885 - [R] Turn off some capabilities when LIBARROW_MINIMAL=true
  • ARROW-11981 - [C++] Implement Union ExecNode
  • ARROW-12063 - [C++] Add null placement option to sort functions
  • ARROW-12181 - [C++][R] The "CSV dataset" in test-dataset.R is failing on RTools 3.5
  • ARROW-12216 - [R] Proactively disable multithreading on RTools3.5 (32bit?)
  • ARROW-12359 - [C++] Deprecate FileSystem::OpenAppendStream
  • ARROW-12388 - [C++][Gandiva] Implement cast numbers from varbinary functions in gandiva
  • ARROW-12410 - [C++][Gandiva] Implement regexp_replace function on Gandiva
  • ARROW-12479 - [C++][Gandiva] Implement castBigInt, castInt, castIntervalDay and castIntervalYear extra functions
  • ARROW-12563 - [C++][Gandiva] Add add_months and datediff functions for string
  • ARROW-12615 - [C++] Add options for handling NAs to stddev and variance
  • ARROW-12650 - [Doc][Python] Improve documentation regarding dealing with memory mapped files
  • ARROW-12657 - [C++] Adding String hex to numeric conversion
  • ARROW-12669 - [C++][Python] Implement a new scalar function: list_element
  • ARROW-12673 - [C++] Add callback to handle incorrect column counts
  • ARROW-12688 - [R] Use DuckDB to query an Arrow Dataset
  • ARROW-12714 - [C++] String title case kernel
  • ARROW-12725 - [C++][Compute] Column at a time hash and comparison in group by
  • ARROW-12728 - [C++] Implement count_distinct/distinct hash aggregate kernels
  • ARROW-12744 - [C++][Compute] Add rounding kernel
  • ARROW-12759 - [C++][Compute] Add ExecNode for group by
  • ARROW-12763 - [R] Optimize dplyr queries that use head/tail after arrange
  • ARROW-12846 - [Release] Reduce download/upload bandwidth for APT/Yum repositories
  • ARROW-12866 - [C++][Gandiva] Implement STRPOS function on Gandiva
  • ARROW-12871 - [R] upgrade to testthat 3e
  • ARROW-12876 - [R] Fix build flags on Raspberry Pi
  • ARROW-12944 - [C++] String capitalize kernel
  • ARROW-12946 - [C++] String swap case kernel
  • ARROW-12953 - [C++][Compute] Refactor CheckScalar* to take Datum arguments
  • ARROW-12959 - [C++][R] Option for is_null(NaN) to evaluate to true
  • ARROW-12965 - [Java] C Data Interface implementation
  • ARROW-12980 - [C++] Kernels to extract datetime components should be timezone aware
  • ARROW-12981 - [R] Install source package from CRAN alone
  • ARROW-13033 - [C++] Kernel to localize naive timestamps to a timezone (preserving clock-time)
  • ARROW-13056 - [MATLAB] Add a matlab label for dev Pull Requests
  • ARROW-13067 - [C++][Compute] Implement integer to decimal cast
  • ARROW-13089 - [Python] Allow creating RecordBatch from Python dict
  • ARROW-13112 - [R] altrep vectors for strings and other types
  • ARROW-13132 - [C++] Add Scalar validation
  • ARROW-13138 - [C++][R] Implement extract temporal components (year, month, day, etc) from date32/64 types
  • ARROW-13141 - [Python] Update HadoopFileSystem docs to clarify setting CLASSPATH env variable is required
  • ARROW-13163 - [C++][Gandiva] Implement REPEAT function on Gandiva
  • ARROW-13164 - [R] altrep vectors from Array with nulls
  • ARROW-13172 - [Java] Make TYPE_WIDTH publicly accessible
  • ARROW-13174 - [C++][Compute] Add strftime kernel
  • ARROW-13202 - [MATLAB] Enable GitHub Actions CI for MATLAB Interface on Linux
  • ARROW-13218 - [Format] Clarify interpretation of timestamp values
  • ARROW-13220 - [C++] Implement 'choose' function
  • ARROW-13222 - [C++] Improve type support for case_when
  • ARROW-13227 - [Documentation][Compute] Document ExecNode
  • ARROW-13257 - [Java][Dataset] Allow passing empty columns for projection
  • ARROW-13268 - [C++][Compute] Add ExecNode for semi and anti-semi join
  • ARROW-13279 - [R] Use C++ DayOfWeekOptions in wday implementation instead of manually calculating via Expression
  • ARROW-13287 - [C++][Dataset] FileSystemDataset::Write should use an async scan
  • ARROW-13295 - [C++] add hash_mean, hash_variance, hash_stddev kernels
  • ARROW-13298 - [C++] Implement any/all hash aggregate kernels
  • ARROW-13307 - [C++] Remove reflection-based enums
  • ARROW-13311 - [C++][Documentation] Document hash aggregate kernels
  • ARROW-13317 - [Python] Improve documentation on what 'use_threads' does in 'read_feather'
  • ARROW-13326 - [R][Archery] Add linting to dev CI
  • ARROW-13327 - [C++][Python] Improve consistency of explicit C++ types in PyArrow files
  • ARROW-13330 - [Go][Parquet] Add the rest of the Encoding package
  • ARROW-13344 - [R] Initial bindings for ExecPlan/ExecNode
  • ARROW-13345 - [C++] Add basic implementation for log to base b
  • ARROW-13358 - [C++] Improve type support in if_else
  • ARROW-13379 - [Dev][Docs] Improvements to archery docs
  • ARROW-13390 - [C++] Implement coalesce for remaining types
  • ARROW-13397 - [R] Update arrow.Rmd vignette
  • ARROW-13399 - [R] Update dataset.Rmd vignette
  • ARROW-13402 - [R] Update flight.Rmd vignette
  • ARROW-13403 - [R] Update developing.Rmd vignette
  • ARROW-13404 - [Doc][Python] Improve PyArrow documentation for new users
  • ARROW-13405 - [Doc] Guide users to the documentation for their own platform
  • ARROW-13416 - [C++] Implement mod compute function
  • ARROW-13420 - [JS] Update dependencies
  • ARROW-13421 - [C++][Python] Add CSV convert option to change decimal point
  • ARROW-13433 - [R] Remove CLI hack from Valgrind test
  • ARROW-13434 - [R] group_by() with an unnamed expression
  • ARROW-13435 - [R] Add function arrow_table() as alias for Table$create()
  • ARROW-13444 - [C++] Remove usage of deprecated std::result_of
  • ARROW-13448 - [R] Bindings for strftime
  • ARROW-13453 - [R] DuckDB has not yet released 0.2.8
  • ARROW-13455 - [C++][Docs] Typo in RecordBatch::SetColumn
  • ARROW-13458 - [C++][Docs] Typo in RecordBatch::schema
  • ARROW-13459 - [C++][Docs] Missing param docs for RecordBatch::SetColumn
  • ARROW-13461 - [Python][Packaging] Build M1 wheels for python 3.8
  • ARROW-13463 - [Release][Python] Verify python 3.8 macOS arm64 wheel
  • ARROW-13465 - [R] to_arrow() from duckdb
  • ARROW-13466 - [R] make installation fail if Arrow C++ dependencies cannot be installed
  • ARROW-13468 - [Release] Fix binary download/upload failures
  • ARROW-13472 - [R] Remove .engine = "duckdb" argument
  • ARROW-13475 - [Release] Don't consider rust tarballs when cleaning up old releases
  • ARROW-13476 - [Doc][Python] Switch ipc/io doc to use context managers
  • ARROW-13478 - [Release] Unnecessary rc-number argument for the version bumping post-release script
  • ARROW-13480 - [C++] Fix possible deadlock when dataset produces an error
  • ARROW-13482 - [C++][Compute] Refactoring away from hard coded ExecNode factories to a registry
  • ARROW-13485 - [Release] Replace ${PREVIOUS_RELEASE}.9000 in r/NEWS.md by post-12-bump-versions.sh
  • ARROW-13488 - [Website] Update Linux packages install information for 5.0.0
  • ARROW-13489 - [R] Bump CI jobs after 5.0.0
  • ARROW-13501 - [R] Bindings for count aggregation
  • ARROW-13502 - [R] Bindings for min/max aggregation
  • ARROW-13503 - [GLib][Ruby][Flight] Add support for DoGet
  • ARROW-13506 - [C++][Java] Upgrade ORC to 1.6.9
  • ARROW-13508 - [C++] Support custom retry strategies in S3Options
  • ARROW-13510 - [CI][R][C++] Add -Wall to fedora-clang-devel as-cran checks
  • ARROW-13511 - [CI][R] Fail in the docker build step if R deps don't install
  • ARROW-13516 - [C++] Detect --version-script flag availability
  • ARROW-13519 - [R] Make doc examples less noisy
  • ARROW-13520 - [C++] Implement hash_aggregate tdigest kernel
  • ARROW-13521 - [C++][Docs] Add note about tdigest in compute functions docs
  • ARROW-13525 - [Python] Mention alternative deprecation message for ParquetDataset.partitions
  • ARROW-13528 - [R] Bindings for mean, var, sd aggregation
  • ARROW-13532 - [C++][Compute] Add set membership type filtering to hash table interface
  • ARROW-13534 - [C++] Improve csv chunker
  • ARROW-13540 - [C++] Add order by sink node
  • ARROW-13541 - [C++][Python] Implement ExtensionScalar
  • ARROW-13542 - [C++][Compute][Dataset] Add dataset::WriteNode for writing rows from an ExecPlan to disk
  • ARROW-13544 - [Java] Remove APIs that have long been deprecated (Changes to ArrowBuf)
  • ARROW-13544 - [Java] Remove APIs that have long been deprecated (Changes to JDBC)
  • ARROW-13544 - [Java] Remove APIs that have long been deprecated (Changes to Vectors)
  • ARROW-13548 - [C++] Implement temporal difference kernels
  • ARROW-13549 - [C++] Add casts from timestamp to date/time
  • ARROW-13550 - [R] Support .groups argument to dplyr::summarize()
  • ARROW-13552 - [C++] Remove deprecated APIs
  • ARROW-13557 - [Packaging][Python] Skip test_cancellation test case on M1
  • ARROW-13561 - [C++] Implement week kernel that accepts WeekOptions
  • ARROW-13562 - [R] Styler followups
  • ARROW-13565 - [Packaging][Ubuntu] Drop support for 20.10
  • ARROW-13572 - [C++][Datasets] Add ORC support to Datasets API
  • ARROW-13573 - [C++] Support dictionaries natively in case_when
  • ARROW-13574 - [C++] Add 'count all' option to count kernels
  • ARROW-13575 - [C++] Add hash_product kernel
  • ARROW-13576 - [C++] Replace ExecNode::InputReceived with ::MakeTask
  • ARROW-13577 - [Python][FlightRPC] pyarrow client do_put close method after write_table did not throw flight error
  • ARROW-13585 - [GLib] Add support for C ABI interface
  • ARROW-13587 - [R] Handle --use-LTO override
  • ARROW-13595 - [C++] Add debug mode check for compute kernel output type
  • ARROW-13604 - [Java] Remove deprecation annotations for APIs representing unsupported operations
  • ARROW-13606 - [R] Actually disable LTO
  • ARROW-13613 - [C++] Add decimal support to (hash) sum/mean/product
  • ARROW-13614 - [C++] Add decimal support to min_max/hash_min_max
  • ARROW-13618 - [R] Use Arrow engine for summarize() by default
  • ARROW-13620 - [R] Binding for n_distinct()
  • ARROW-13626 - [R] Bindings for log base b
  • ARROW-13627 - [C++] Fully support ScalarAggregateOptions in (hash) any/all/sum/product/mean
  • ARROW-13629 - [Ruby] Add support for building/converting map
  • ARROW-13633 - [Packaging][Debian] Add support for bookworm
  • ARROW-13634 - [R] Update distro() in nixlibs.R to map from "bookworm" to 12
  • ARROW-13635 - [Packaging][Python] Define --with-lg-page for jemalloc in the arm manylinux builds
  • ARROW-13637 - [Python] Fix docstrings
  • ARROW-13642 - [C++][Compute] Hash join node supporting all semi, anti, inner, outer join types
  • ARROW-13645 - [Java] Allow NullVectors to have distinct field names
  • ARROW-13646 - [Go][Parquet] adding the parquet metadata package
  • ARROW-13648 - [Dev] Use #!/usr/bin/env instead of #!/bin where possible
  • ARROW-13650 - [C++] Create dataset writer to encapsulate dataset writer logic
  • ARROW-13651 - [Ruby][Symbol] to Arrow array
  • ARROW-13652 - [Python] Expose copy_files in pyarrow.fs
  • ARROW-13660 - [C++] Remove seq_num from ExecNode::InputReceived
  • ARROW-13670 - [C++] add virtual destructors
  • ARROW-13674 - [CI] PR checks should check for JIRA components
  • ARROW-13675 - [Doc][Python] Add a recipe on how to save partitioned datasets to the Cookbook
  • ARROW-13679 - [GLib][Ruby] Add support for group aggregation
  • ARROW-13680 - [C++] Create an asynchronous nursery to simplify capture logic
  • ARROW-13682 - [C++] Add TDigest API to merge one TDigest
  • ARROW-13684 - [C++][Compute] Strftime kernel follow-up
  • ARROW-13686 - [Python] Update deprecated pytest yield_fixture functions
  • ARROW-13687 - [Ruby] Add support for loading table by Arrow Dataset
  • ARROW-13691 - [C++] Support skip_nulls/min_count in VarianceOptions
  • ARROW-13693 - [Website] arrow-site should pin down a specific Ruby version and leverage toolings like rbenv
  • ARROW-13696 - [Python] Support for MapType with Fields
  • ARROW-13699 - [Python][Docs] Improve filesystem documentation
  • ARROW-13700 - [Docs][C++] Clarify DayOfWeekOptions args
  • ARROW-13702 - [Python] Add dataset mark to test_parquet_dataset_deprecated_properties
  • ARROW-13704 - [C#] Add support for reading streaming format delta dictionaries
  • ARROW-13705 - [Website] Pin node version
  • ARROW-13721 - [Doc][Cookbook] Specifying Schemas - Python
  • ARROW-13733 - [Java] Allow JDBC adapters to reuse vector schema roots
  • ARROW-13734 - [Format] Clarify allowed values for time types
  • ARROW-13736 - [C++] Reconcile PrettyPrint and StringFormatter
  • ARROW-13737 - [C++] Support for grouped aggregation over scalar columns
  • ARROW-13739 - [R] Support dplyr::count() and tally()
  • ARROW-13740 - [R] summarize() should not eagerly evaluate
  • ARROW-13757 - [R] Fix download of C++ source for CRAN patch releases
  • ARROW-13759 - [C++] Update linting and formatting scripts to specify python3 in shebang line
  • ARROW-13760 - [C++] Bump required Protobuf when using Flight
  • ARROW-13764 - [C++] Support CountOptions in grouped count distinct
  • ARROW-13768 - [R] Allow JSON to be an optional component
  • ARROW-13772 - [R] Binding for median aggregation
  • ARROW-13776 - [C++] Offline thirdparty versions.txt is missing extensions for some files
  • ARROW-13777 - [R] mutate after group_by should be ok as long as there are only scalar functions
  • ARROW-13778 - [R] Handle complex summarize expressions
  • ARROW-13782 - [C++] Add skip_nulls/min_count to tdigest/mode/quantile
  • ARROW-13783 - [Python] Preview data when printing tables
  • ARROW-13785 - [C++] Add methods to print exec nodes/plans
  • ARROW-13787 - [C++] Verify third-party downloads
  • ARROW-13789 - [Go] Implement Scalar Values for Go
  • ARROW-13793 - [C++] Migrate ORCFileReader to Result<T>
  • ARROW-13794 - [C++] Deprecate PARQUET_VERSION_2_0
  • ARROW-13797 - [C++][Python] Column projection pushdown for ORC dataset reading + use liborc for column selection
  • ARROW-13803 - [C++] Don't read past end of buffer in BitUtil::SetBitmap
  • ARROW-13804 - [Go] Add Interval type Month, Day, Nano
  • ARROW-13806 - [C++][Python] Add support for new MonthDayNano Interval Type
  • ARROW-13809 - [C++][ABI] Add support for MonthDayNanoInterval to C ABI
  • ARROW-13810 - [C++][Compute] Predicate IsAsciiCharacter allows invalid types and values
  • ARROW-13815 - [R] Adapt to new callstack changes in rlang
  • ARROW-13816 - [Go][C] Implement Consumer APIs for C Data Interface in Go
  • ARROW-13820 - [R] Rename na.min_count to min_count and na.rm to skip_nulls
  • ARROW-13821 - [R] Handle na.rm in sd, var bindings
  • ARROW-13823 - [Java] Exclude .factorypath
  • ARROW-13824 - [C++][Compute] Make constexpr BooleanToNumber kernel
  • ARROW-13831 - [GLib][Ruby] Add support for writing by Arrow Dataset
  • ARROW-13835 - [Doc][Python] Add documentation for unify_schemas
  • ARROW-13842 - [C++] Bump vendored date library
  • ARROW-13843 - [C++][CI] Exercise ToString / PrettyPrint in fuzzing setup
  • ARROW-13845 - [C++] Reconcile RandomArrayGenerator::ArrayOf implementations
  • ARROW-13847 - [Java] Avoid unnecessary collection copies
  • ARROW-13849 - [C++] Wrap min_max with min/max functions
  • ARROW-13852 - [R] Handle Dataset schema metadata in ExecPlan
  • ARROW-13853 - [R] String to_title, to_lower, to_upper kernels
  • ARROW-13855 - [C++][Python] Implement C data interface support for extension types
  • ARROW-13857 - [R][CI] Remove checkbashisms download
  • ARROW-13859 - [Java] Add code coverage support
  • ARROW-13866 - [R] Implement Options for all compute kernels available via list_compute_functions
  • ARROW-13869 - [R] Implement options for non-bound MatchSubstringOptions kernels
  • ARROW-13871 - [C++] JSON reader can fail if a list array key is present in one chunk but not in a later chunk
  • ARROW-13874 - [R] Implement TrimOptions
  • ARROW-13883 - [Python] Allow more than numpy.array as masks when creating arrays
  • ARROW-13890 - [R] Split up test-dataset.R and test-dplyr.R
  • ARROW-13893 - [R] Make head/tail lazy on datasets and queries
  • ARROW-13897 - [Python] Correct TimestampScalar.as_py() and DurationScalar.as_py() docstrings
  • ARROW-13898 - [C++][Compute] Add support for string binary transforms
  • ARROW-13899 - [Ruby] Implement slicer by compute kernels
  • ARROW-13901 - [R] Implement IndexOptions
  • ARROW-13904 - [R] Implement ModeOptions
  • ARROW-13905 - [R] Implement ReplaceSliceOptions
  • ARROW-13906 - [R] Implement PartitionNthOptions
  • ARROW-13908 - [R] Implement ExtractRegexOptions
  • ARROW-13909 - [GLib] Add tests for GArrowVarianceOptions
  • ARROW-13909 - [GLib] Add GArrowVarianceOptions
  • ARROW-13910 - [Ruby] accepts Range and selectors
  • ARROW-13919 - [GLib] Add GArrowFunctionDoc
  • ARROW-13924 - [R] Bindings for stringr::str_starts, stringr::str_ends, base::startsWith and base::endsWith
  • ARROW-13925 - [R] Remove system installation devdocs jobs
  • ARROW-13927 - [R] Add Karl to the contributors list for the package
  • ARROW-13928 - [R] Rename the version(s) tasks so that it's clearer which is which
  • ARROW-13937 - [C++][Compute] Add explicit output values to sign function and fix unary type checks
  • ARROW-13942 - [Dev] Update cmake_format usage in autotune comment bot
  • ARROW-13944 - [C++] Bump xsimd to latest version
  • ARROW-13958 - [Python] Migrate Python ORC bindings to use new Result-based APIs
  • ARROW-13959 - [R] Update tests for extracting components from date32 objects
  • ARROW-13962 - [R] Catch up on the NEWS
  • ARROW-13963 - [Go] Minor: Add bitmap reader/writer impl from go Parquet module to Arrow Bitutil package
  • ARROW-13964 - MINOR: [Go][Parquet] remove base bitmap reader/writer from parquet module, use arrow bitutil ones
  • ARROW-13965 - [C++] dynamic_casts in parquet TypedColumnWriterImpl impacting performance
  • ARROW-13966 - [C++] Support decimals in comparisons
  • ARROW-13967 - [Go] Implement Concatenate function for array.Interface
  • ARROW-13973 - [C++] Add a SelectKSinkNode
  • ARROW-13974 - [C++] Resolve follow-up reviews for TopK/BottomK
  • ARROW-13975 - [C++] Implement decimal round
  • ARROW-13977 - [Format] clarify leap seconds for interval type
  • ARROW-13979 - [Go] Enable -race for go tests
  • ARROW-13990 - [R] Bindings for round kernels
  • ARROW-13994 - [Doc][C++] Build document misses git submodule update
  • ARROW-13995 - [R] Bindings for join node
  • ARROW-13999 - [C++] Fix bundled LZ4 build on MinGW
  • ARROW-14002 - [Python] Support tuples in unify_schemas
  • ARROW-14003 - [C++][Python] Not providing a sort_key in the "select_k_unstable" kernel crashes
  • ARROW-14005 - [R] Fix tests for PartitionNthOptions so that they can run on various platforms
  • ARROW-14006 - [C++][Python] Support cast of naive timestamps to strings
  • ARROW-14007 - [C++] Fix compiler warnings in decimal promotion helper
  • ARROW-14008 - [R][Compute] Running an ExecPlan should yield Reader instead of Table
  • ARROW-14009 - [C++] Seed parallelism in SourceNode
  • ARROW-14012 - [Python] Update kernel categories in compute doc to match C++
  • ARROW-14013 - [C++][Docs] Add instructions for Fedora
  • ARROW-14016 - [C++] Wrong type_name used for directory partitioning
  • ARROW-14019 - [R] expect_dplyr_equal() test helper function ignores grouping
  • ARROW-14023 - [Ruby] Arrow::Table#slice accepts Hash
  • ARROW-14025 - [R][C++] PreBuffer is not enabled when scanning parquet via exec nodes
  • ARROW-14030 - [GLib] Use arrow::Result based ORC API
  • ARROW-14031 - [Ruby] Use min and max separately
  • ARROW-14033 - [Ruby] Append OpenSSL's .pc path automatically on macOS with Homebrew
  • ARROW-14033 - [Ruby][Doc] Add macOS development guide for Red Arrow
  • ARROW-14035 - [C++][Python][R] Implement count distinct kernel
  • ARROW-14036 - [R] Binding for n_distinct() with no grouping
  • ARROW-14043 - [Python] Allow unsigned integer index type in dictionary() type factory function
  • ARROW-14044 - [R] Handle group_by .drop parameter in summarize
  • ARROW-14049 - [C++][Java] Upgrade ORC to 1.7.0
  • ARROW-14050 - [C++] Make TDigest/Quantile kernels return nulls instead
  • ARROW-14052 - [C++] Add approximate_median aggregation
  • ARROW-14054 - [C++][Docs] Simplify C++ row conversion example
  • ARROW-14055 - [Docs] Add canonical url to the sphinx docs
  • ARROW-14056 - [Doc][C++] Document ArrayData
  • ARROW-14061 - [Go][C++] Add Cgo Arrow Memory Pool Allocator
  • ARROW-14062 - [Format] Initial arrow-internal specification of compute IR
  • ARROW-14064 - [CI] Use Debian 11
  • ARROW-14069 - [R] By default, filter out hash functions in list_compute_functions()
  • ARROW-14070 - [C++][CI] Remove support for VS2015
  • ARROW-14072 - [GLib][Parquet] Add gparquet_arrow_file_reader_get_n_rows()
  • ARROW-14073 - [C++] Deduplicate sort keys
  • ARROW-14084 - [GLib][Ruby][Dataset] Add support for scanning from directory
  • ARROW-14088 - [GLib][Ruby][Dataset] Add support for filter
  • ARROW-14106 - [Go][C] Implement Exporting to the C Data Interface
  • ARROW-14107 - [R][CI] Parallelize Windows CI jobs
  • ARROW-14111 - [C++] Add extraction function support for time32/time64
  • ARROW-14116 - [C++][Docs] Consistent variable names in WriteCSV example
  • ARROW-14127 - [C++][Docs] Example of using compute function and output
  • ARROW-14128 - [Go] Implement MakeArrayFromScalar for nested types
  • ARROW-14132 - [C++] Improve CSV chunker tests
  • ARROW-14135 - [Python] Missing Python tests for compute kernels
  • ARROW-14140 - [R] skip arrow_binary/arrow_large_binary class from R metadata
  • ARROW-14143 - [IR][C++] Add explicit cast node to IR
  • ARROW-14146 - [Dev] Update merge script to specify python3 in shebang line
  • ARROW-14150 - [C++] Don't check delimiter in CSV chunker if no quoting
  • ARROW-14155 - [Go] add fingerprint and hash functions for types and scalars
  • ARROW-14157 - [C++] Refactor Abseil to its own macro
  • ARROW-14165 - [C++] Improve table sort performance
  • ARROW-14178 - [C++] Boost download location has moved
  • ARROW-14180 - [Packaging] Add support for AlmaLinux 8
  • ARROW-14191 - [C++][Dataset] Dataset writes should respect backpressure
  • ARROW-14194 - [Docs] Improve vertical spacing in the sphinx C++ API docs
  • ARROW-14198 - [Java] Upgrade netty, grpc, and boringssl dependencies
  • ARROW-14207 - [C++] Add missing dependencies for bundled Boost targets
  • ARROW-14212 - [GLib][Ruby] Add GArrowTableConcatenateOptions
  • ARROW-14217 - [Python][CI] Add support for python 3.10
  • ARROW-14222 - [C++] implement GCSFileSystem skeleton
  • ARROW-14228 - [R] Allow for creation of nullable fields
  • ARROW-14230 - [C++] Deprecate ArrayBuilder::Advance
  • ARROW-14232 - [C++] update crc32c to version 1.1.2
  • ARROW-14235 - [C++][Compute] Use a node counter as the label if no label is supplied
  • ARROW-14236 - [C++] Add GCS testbench for testing
  • ARROW-14239 - [R] Don't use rlang::as_label
  • ARROW-14241 - [C++][Java][CI] Fix java-jars build
  • ARROW-14243 - [C++] Split vector_sort.cc
  • ARROW-14244 - [C++] Reduce scalar_temporal.cc compilation time
  • ARROW-14258 - [R] Warn if an SF column is made into a table
  • ARROW-14259 - [R] converting from R vector to Array when the R vector is altrep
  • ARROW-14261 - [C++] Includes should be in alphabetical order
  • ARROW-14269 - [C++] Consolidate utf8 benchmark
  • ARROW-14274 - [C++] Refine base64 api
  • ARROW-14284 - [C++][Python] Improve error message when trying to use SyncScanner when requiring async
  • ARROW-14291 - [CI][C++] Add cpp/examples/ files to lint targets
  • ARROW-14295 - [Doc] Indicate location of archery
  • ARROW-14296 - [Go] Update generated flatbuf
  • ARROW-14304 - [R] Update news for 6.0.0
  • ARROW-14309 - [Python] Extend CompressedInputStream to work with paths, strings and files
  • ARROW-14317 - [Doc] Update C data interface implementation status
  • ARROW-14326 - [Docs] Add C/GLib and Ruby to C Data/Stream interface supported libraries
  • ARROW-14327 - [Release] Remove conda-* from packaging group
  • ARROW-14335 - [GLib][Ruby] Add support for expression
  • ARROW-14337 - [C++] Arrow doesn't build on M1 when SIMD acceleration is enabled
  • ARROW-14341 - [C++] Improve decimal benchmark
  • ARROW-14343 - [Packaging][Python] Enable NEON SIMD optimization for M1 wheels
  • ARROW-14345 - [C++] Implement streaming reads
  • ARROW-14348 - [R] add group_vars.RecordBatchReader method
  • ARROW-14349 - [IR] Remove RelBase
  • ARROW-14358 - [Doc] Update CMake options in documentation
  • ARROW-14361 - [C++] Add default simd level
  • ARROW-14364 - [CI][C++] Support LLVM 13
  • ARROW-14368 - [CI] Use ubuntu-latest for Azure Pipelines
  • ARROW-14369 - [C++][Python] Use std::move() explicitly for g++ 4.8.5
  • ARROW-14386 - [Packaging][Java] Ensure using installed devtoolset version
  • ARROW-14387 - [Release][Ruby] Check Homebrew/MSYS2 package version before releasing
  • ARROW-14396 - [R][Doc] Remove relic note in write_dataset that columns cannot be renamed
  • ARROW-14400 - [Go] Equals and ApproxEquals for Tables and Chunked Arrays
  • ARROW-14401 - [C++] Fix bundled crc32c's include path
  • ARROW-14402 - [Release][Yum] Specify gpg path explicitly
  • ARROW-14404 - [Release][APT] Skip arm64 Debian GNU/Linux bookworm verification
  • ARROW-14408 - [Packaging][Crossbow] Option for skipping artifact pattern validation
  • ARROW-14410 - [Python][Packaging] Use numpy 1.21.3 to build python 3.10 wheels for macOS and windows
  • ARROW-14452 - [Release][JS] Update Javascript testing
  • ARROW-14511 - [Website][Rust] Rust 6.0.0 release blog post
  • PARQUET-490 - [C++][Parquet] Basic support for reading DELTA_BINARY_PACKED data
kszucs
published 5.0.0 •

Changelog

Source

Apache Arrow 5.0.0 (2021-07-28)

Bug Fixes

  • ARROW-6189 - [Rust] [Parquet] Plain encoded boolean column chunks limited to 2048 values
  • ARROW-6312 - [C++] Add support for "pkg-config --static arrow"
  • ARROW-7948 - [Go] Decimal128 Integration fix
  • ARROW-9594 - [Python] Preserve null indexes in DictionaryArray.to_numpy as it's done in DictionaryArray.to_pandas
  • ARROW-10910 - [Python] Provide better error message when trying to read from None source
  • ARROW-10958 - [GLib] "Nested data conversions not implemented" through glib, but not through pyarrow
  • ARROW-11077 - [Rust] ParquetFileArrowReader panics when trying to read nested list
  • ARROW-11146 - [CI] Remove test-conda-python-3.8-jpype build
  • ARROW-11161 - [C++][Python] Add stream metadata
  • ARROW-11633 - [CI][Doc] Maven default skin not found
  • ARROW-11780 - [Python] Avoid crashing when a ChunkedArray is provided to StructArray.from_arrays()
  • ARROW-11908 - [Rust] Intermittent Flight integration test failures
  • ARROW-12007 - [C++] Loading parquet file returns "Invalid UTF8 payload" error
  • ARROW-12055 - [R] is.na() evaluates to FALSE on Arrow NaN values
  • ARROW-12096 - [C++] Allows users to define arrow timestamp unit for Parquet INT96 timestamp
  • ARROW-12122 - [Python] Cannot install via pip M1 mac
  • ARROW-12142 - [Python][Doc] Mention the CXX ABI flag in the docs
  • ARROW-12150 - [Python] Correctly infer type of mixed-precision Decimals
  • ARROW-12232 - [Rust][Datafusion] Error with CAST: Unsupported SQL type Time
  • ARROW-12240 - [Python] Fix invalid-offsetof warning
  • ARROW-12377 - [Doc][Java] Java doc build broken
  • ARROW-12407 - [Python][Dataset] Remove ScanTask bindings
  • ARROW-12431 - [Python] Mask is inverted when creating FixedSizeBinaryArray
  • ARROW-12472 - [Python] Properly convert paths to strings (using fspath)
  • ARROW-12482 - [Doc][C++][Python] Mention CSVStreamingReader pitfalls with type inference
  • ARROW-12491 - [Packaging][RPM] Add support for Amazon Linux 2
  • ARROW-12503 - [C++] Ensure using "lib/" for jemalloc's library directory
  • ARROW-12508 - [R] expect_as_vector implementation causes test failure on R <= 3.3 & variables defined outside of test_that break build when no arrow install
  • ARROW-12543 - [CI][Python] Fix test-conda-python-3.9 build (gdb version conflict)
  • ARROW-12568 - [C++][Compute] Fix nullptr dereference when array contains no nulls
  • ARROW-12569 - [R][CI] Run revdep in CI
  • ARROW-12570 - [JS] Fix issues that blocked the v4.0.0 release
  • ARROW-12579 - [Python] Pyarrow 4.0.0 dependency numpy 1.19.4 throws errors on Apple silicon/M1 compilation
  • ARROW-12589 - [C++] Compiling on Windows doesn't work when -DARROW_WITH_BACKTRACE=OFF
  • ARROW-12601 - [R][Packaging] Fix pkg-config check in r/configure
  • ARROW-12604 - [R][Packaging] Dataset, Parquet off in autobrew and CRAN Mac builds
  • ARROW-12605 - [Documentation] Update line numbers in cpp/dataset.rst
  • ARROW-12606 - [C++][Compute] Fix Quantile and Mode on arrays with offset
  • ARROW-12610 - [C++] Skip TestS3FSGeneric TestDeleteDir and TestDeleteDirContents on Windows as they are flaky
  • ARROW-12611 - [CI][Python] Add different numpy versions to pandas nightly builds
  • ARROW-12613 - [Python] Support comparison to None in Scalar values
  • ARROW-12614 - [C++][Compute] Remove support for Tables in ExecuteScalarExpression
  • ARROW-12617 - [Python] Align orc.write_table keyword order with parquet.write_table
  • ARROW-12620 - [C++][Dataset] Fix projection during writing
  • ARROW-12622 - [Python] Fix segfault in read_csv when not on main thread
  • ARROW-12630 - [Dev][Integration] conda-integration docker build fails
  • ARROW-12639 - [CI][Archery] Archery build fails to create branch
  • ARROW-12640 - [C++] Fix errors from VS 2019 in cpp/src/parquet/types.h
  • ARROW-12642 - [R] LIBARROW_MINIMAL, LIBARROW_DOWNLOAD, NOT_CRAN env vars should not be case-sensitive
  • ARROW-12644 - [C++][Python][R][Dataset] URL-decode path segments in partitioning
  • ARROW-12646 - [C++][CI][Packaging][Python] Bump vcpkg version to its latest release
  • ARROW-12663 - [C++] Fix a cuda 11.2 compiler segfault
  • ARROW-12668 - [C++][Dataset] Fix segfault in CountRows
  • ARROW-12670 - [C++] Fix extract_regex output after non-matching values
  • ARROW-12672 - [C++] Fix fill_null kernel to set null_count + cast kernel to handle no-bitmap with unknown null_count case
  • ARROW-12679 - [Java] JDBC->Arrow for NOT NULL columns.
  • ARROW-12684 - [Go][Flight] fix nil pointer dereference, add test.
  • ARROW-12708 - [C++] Valgrind errors when calling negate_checked
  • ARROW-12729 - [R] Fix length method for Table, RecordBatch
  • ARROW-12746 - [Go][Flight] append instead of overwriting outgoing metadata
  • ARROW-12756 - [C++] MSVC build fails with latest gtest from vcpkg
  • ARROW-12757 - [Archery] Fix spurious warning when running "archery docker run"
  • ARROW-12762 - [Python] Preserve field name when pickling list types
  • ARROW-12769 - [Python] Fix slicing array with "negative" length (start > stop)
  • ARROW-12771 - [C++][Compute] Fix MaybeReserve parameter in the Consume function of GroupedCountImpl
  • ARROW-12772 - [CI] Merge script test fails due to missing dependency
  • ARROW-12773 - [Docs] Clarify Java support for ORC and Parquet via JNI bindings
  • ARROW-12774 - [C++][Compute] replace_substring_regex() creates invalid arrays => crash
  • ARROW-12776 - [Archery][Integration] Fix decimal case generation in write_js_test_json
  • ARROW-12779 - [Python][FlightRPC] Guard against DoGet handler that never sends data
  • ARROW-12780 - [CI][C++] Install necessary packages for MinGW builds
  • ARROW-12790 - [C++] Improve HadoopFileSystem conformance
  • ARROW-12793 - [Python] Fix support for pyarrow debug builds
  • ARROW-12797 - [JS] Update readme with new links and remove outdated examples
  • ARROW-12798 - [JS] Use == null Comparison
  • ARROW-12799 - [JS] Use Nullish Coalescing Operator (??) For Defaults
  • ARROW-12804 - [C++] Return expected result for IsNull and IsValid for NullArray
  • ARROW-12807 - [C++] Fix build errors in IPC reader
  • ARROW-12838 - [Java][Gandiva] Fix JNI CI test
  • ARROW-12842 - [FlightRPC][Java] Fix sending trailers using CallStatus
  • ARROW-12850 - [R] is.nan() evaluates to null on Arrow null values
  • ARROW-12854 - [Dev][Release] Windows wheel verification script fails to download artifacts
  • ARROW-12857 - [C++] Fix build of hash_aggregate_test
  • ARROW-12864 - [C++] Remove needless out argument from arrow::internal::InvertBitmap
  • ARROW-12865 - [C++][FlightRPC] Link gRPC with RE2
  • ARROW-12882 - [C++][Gandiva] Fix behavior of the convert replace function on gandiva
  • ARROW-12887 - [CI] AppVeyor SSL certificate issue
  • ARROW-12906 - [C++][Python] Fix fill_null segfault
  • ARROW-12907 - [Java] Fix memory leak on deserialization errors
  • ARROW-12911 - [Python] Export scalar aggregate options to pc.sum
  • ARROW-12917 - [C++] Fix handling of decimal types with negative scale in C data import
  • ARROW-12918 - [C++] Fill out iterator_traits<ArrayIterator>
  • ARROW-12919 - [Dev][Archery] Crossbow comment bot failing to react to comments
  • ARROW-12935 - [C++][CI] Fix compiler error on some clang versions
  • ARROW-12941 - [C++] Add rows skipped to rows seen
  • ARROW-12942 - [C++][Compute] Fix incorrect result of Arrow compute hash_min_max with a chunked array
  • ARROW-12956 - [C++] Fix crash on Parquet file (OSS-Fuzz)
  • ARROW-12969 - [C++] Fix match_substring with empty haystack
  • ARROW-12974 - [R] test-r-without-arrow build fails because of example requiring Arrow
  • ARROW-12983 - [C++][Python][R] Properly overflow to chunked array in Python-to-Arrow conversion
  • ARROW-12987 - [C++][CI] Switch to bundled utf8proc with version 2.2 in Ubuntu 18.04 images
  • ARROW-12988 - [CI][Python] Revert skip of failing test in kartothek nightly integration build
  • ARROW-12988 - [CI] Skip the failing test in kartothek nightly integration build
  • ARROW-12989 - [CI] Avoid aggressive cancellation of the "Dev PR" workflow
  • ARROW-12991 - [CI] Migrate Travis-CI ARM job to "arm64-graviton2" arch
  • ARROW-12993 - [Python] Avoid half-initialized FeatherReader object
  • ARROW-12995 - [C++] Add validation to CSV options
  • ARROW-12998 - [C++] Add dataset->toolchain dependency
  • ARROW-13001 - [Go][Parquet] fix build failure on s390x
  • ARROW-13003 - [C++] Fix key map unaligned access
  • ARROW-13008 - [C++] Avoid deprecated API in minimal example
  • ARROW-13010 - [C++][Compute] Support outputting to slices from kleene kernels
  • ARROW-13018 - [C++][Docs] Use consistent terminology for nulls (min_count) in scalar aggregate kernels
  • ARROW-13026 - [CI] Use LLVM 10 for s390x
  • ARROW-13037 - [R] Incorrect param when creating Expression crashes R
  • ARROW-13039 - [R] Fix error message handling
  • ARROW-13041 - [C++] Ensure unary kernels zero-initialize data behind null entries
  • ARROW-13046 - [Release] JS package failing test prior to publish
  • ARROW-13048 - [C++] Fix copying objects with special characters on S3FS
  • ARROW-13053 - [Python] Fix build issue with Homebrewed arrow library
  • ARROW-13069 - [Website] Add Daniël to committer list
  • ARROW-13073 - [Developer] archery benchmark list: unexpected keyword 'benchmark_filter'
  • ARROW-13080 - [Release] Generate the API docs in ubuntu 20.10
  • ARROW-13083 - [Python] Wrong SCM version detection both in setup.py and crossbow
  • ARROW-13085 - [Python] Document compatible toolchains for python bindings
  • ARROW-13090 - [Python] Fix create_dir() implementation in FSSpecHandler
  • ARROW-13104 - [C++] Fix unsafe cast in ByteStreamSplit implementation
  • ARROW-13108 - [Python] Pyarrow 4.0.0 crashes upon import on macOS 10.13.6
  • ARROW-13116 - [R] Test for RecordBatchReader to C-interface fails on arrow-r-minimal due to missing dependencies
  • ARROW-13125 - [R] Throw error when 2+ args passed to desc() in arrange()
  • ARROW-13128 - [C#] TimestampArray conversion logic for nano and micro is wrong
  • ARROW-13135 - [C++] Fix Status propagation from Parquet exception
  • ARROW-13139 - [C++] ReadaheadGenerator cannot be safely copied/moved
  • ARROW-13145 - [C++][CI] Flight test crashes on MinGW
  • ARROW-13148 - [Dev][Archery] Fix crossbow job submission
  • ARROW-13153 - [C++] parquet_dataset loses ordering of files in _metadata
  • ARROW-13154 - [C++] Remove the undocumented type_code <= 125 restriction in union types
  • ARROW-13169 - [C++][Compute] Fix array offset support in GrouperFastImpl
  • ARROW-13173 - [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally
  • ARROW-13187 - [Python] Avoid creating reference cycle when reading CSV file
  • ARROW-13189 - [R] Disable row-level metadata application on datasets
  • ARROW-13203 - [R] Fix optional component checks causing failures
  • ARROW-13207 - [Python][Doc] Dataset documentation still suggests deprecated scan method as the preferred iterative approach
  • ARROW-13216 - [R] Type checks test fails with rtools35
  • ARROW-13217 - [C++][Gandiva] Correct error on convert replace function for initial invalid bytes
  • ARROW-13223 - [C++] Fix Thread Sanitizer test failures
  • ARROW-13225 - [Go][FlightRPC][Integration] Implement Flight Custom Middleware and Integration Tests for Go
  • ARROW-13229 - [Python] ascii_trim, ascii_ltrim and ascii_rtrim lack options
  • ARROW-13239 - [Python][Doc] Expose signatures in pyx modules
  • ARROW-13243 - [R] altrep function call in R 3.5
  • ARROW-13246 - [C++] Using CSV skip_rows_after_names can cause data to be discarded prematurely
  • ARROW-13249 - [Java][CI] Consistent timeout in the Java JNI build
  • ARROW-13253 - [FlightRPC][C++] Fix segfault with large messages
  • ARROW-13254 - [Python] Processes killed and semaphore objects leaked when reading pandas data
  • ARROW-13265 - [R] cli valgrind errors in nightlies
  • ARROW-13266 - [JS] Improve benchmark names & include suite name in json
  • ARROW-13281 - [C++][Gandiva] Correct error on timestampDiffMonth function
  • ARROW-13284 - [C++] Fix wrong pkg_check_modules() option name
  • ARROW-13288 - [Python] Missing default values of kernel options in PyArrow
  • ARROW-13290 - [C++] Add missing include
  • ARROW-13305 - [C++] Unable to install nightly on Ubuntu 21.04 due to CSV options
  • ARROW-13315 - [R] Wrap r_task_group includes with ARROW_R_WITH_ARROW checking
  • ARROW-13321 - [C++][Python] MakeArrayFromScalar doesn't work for FixedSizeBinaryType
  • ARROW-13324 - [R] Typo in bindings for utf8_reverse and ascii_reverse
  • ARROW-13332 - [C++] TSAN failure in TestAsyncUtil.ReadaheadFailed
  • ARROW-13341 - [C++][Compute] Fix race condition in ScalarAggregateNode
  • ARROW-13350 - [Python][CI] Fix test_extract_datetime_components for pandas 0.24
  • ARROW-13352 - [C++] Make sure scalar case_when fully initializes output
  • ARROW-13353 - [Docs] Pin breathe to avoid failure parsing template parameters
  • ARROW-13360 - [C++] Missing dependencies in cpp thirdparty offline dependencies versions.txt
  • ARROW-13363 - [R] is.nan() errors on non-floating point data
  • ARROW-13368 - [C++][Doc] Rename project to make_struct in docs
  • ARROW-13381 - [C++] ArrayFromJSON doesn't work for float value dictionary type
  • ARROW-13382 - [C++] Avoid multiple definitions of same symbol
  • ARROW-13384 - [C++] Specify minimum required zstd version in cmake
  • ARROW-13391 - [CSV] Correct row and column numbers in error messages with CSV streaming reader
  • ARROW-13417 - [C++] The merged generator can sometimes pull from source sync-reentrant
  • ARROW-13419 - [JS] Fix perf tests
  • ARROW-13428 - [C++][Flight] Add missing -lssl with bundled gRPC and system shared OpenSSL
  • ARROW-13431 - [Release] Bump go version to 1.15; don't verify rust source anymore
  • ARROW-13432 - [Release] Fix ssh connection to the binary uploader container

New Features and Improvements

  • ARROW-2665 - [C++][Python] Add index() kernel
  • ARROW-3014 - [C++] Minimal writer adapter for ORC file format
  • ARROW-3316 - [R] Multi-threaded conversion from R data.frame to Arrow table / record batch
  • ARROW-5385 - [Go] Implement EXTENSION datatype
  • ARROW-5640 - [Go] Implement Arrow Map Array
  • ARROW-6513 - [CI] Rename conda requirements files to have txt extension instead of yml
  • ARROW-7001 - [C++] Develop threading APIs to accommodate nested parallelism
  • ARROW-7114 - [JS][CI] Enable NodeJS tests for Windows
  • ARROW-7252 - [Rust] [Parquet] Reading UTF-8/JSON/ENUM field results in a lot of vec allocation
  • ARROW-7396 - [Format] Register media types (MIME types) for Apache Arrow formats to IANA
  • ARROW-8421 - [Rust] [Parquet] Implement parquet writer
  • ARROW-8459 - [Dev][Archery] Use a more recent cmake-format
  • ARROW-8527 - [C++][CSV] Add support for ReadOptions::skip_rows >= block_size
  • ARROW-8655 - [C++][Python] Preserve partitioning information for a discovered Dataset
  • ARROW-8676 - [Rust] Create implementation of IPC RecordBatch body buffer compression from ARROW-300
  • ARROW-9054 - [C++] Add ScalarAggregateOptions
  • ARROW-9056 - [C++] Support aggregations over scalars
  • ARROW-9140 - [R] Zero-copy Arrow to R where possible
  • ARROW-9295 - [Archery] Support rust clippy in the lint command
  • ARROW-9299 - [C++][Python] Expose ORC metadata
  • ARROW-9313 - [Rust] Use feature enum
  • ARROW-9421 - [C++][Parquet] Redundancies SchemaManifest::GetFieldIndices
  • ARROW-9430 - [C++] Implement replace_with_mask kernel
  • ARROW-9697 - [C++][Python][R][Dataset] Add CountRows for Scanner
  • ARROW-10031 - [CI][Java] Support Java benchmark in Archery
  • ARROW-10115 - [C++] Add CSV option to treat quoted strings as always non-null
  • ARROW-10316 - [Python] Improve introspection of compute function options
  • ARROW-10391 - [Rust] [Parquet] Nested Arrow reader
  • ARROW-10440 - [C++][Dataset] Visit FileWriters before Finish
  • ARROW-10550 - [Rust] [Parquet] Write nested types (struct, list)
  • ARROW-10557 - [C++] Add scalar string slicing/substring extract kernel
  • ARROW-10640 - [C++] An "if_else" ("where") kernel to combine two arrays based on a mask
  • ARROW-10658 - [Python][Packaging] Wheel builds for Apple Silicon
  • ARROW-10675 - [C++][Python] Support AWS S3 Web identity credentials
  • ARROW-10797 - [C++] Vendor and use PCG random generator library
  • ARROW-10926 - [Rust] Add parquet reader / writer for decimal types
  • ARROW-10959 - [C++] Add scalar string join kernel
  • ARROW-11061 - [Rust] Validate array properties against schema
  • ARROW-11173 - [Java] Add map type in complex reader / writer
  • ARROW-11199 - [C++][Python] Fix the unit tests for the ORC reader
  • ARROW-11206 - [C++][Compute][Python] Rename 'project' to 'make_struct'
  • ARROW-11342 - [Python][Gandiva] Expose ToString and result type information
  • ARROW-11499 - [Release] Use Artifactory instead of Bintray
  • ARROW-11514 - [R][C++] Bindings for paste(), paste0(), str_c()
  • ARROW-11515 - [R] Bindings for strsplit
  • ARROW-11565 - [C++][Gandiva] Modify upper()/lower() to work with UTF8 and add INIT_CAP function
  • ARROW-11581 - [Packaging][C++] Formalize distribution through vcpkg
  • ARROW-11608 - [CI] Fix turbodbc nightly
  • ARROW-11660 - [C++] Move RecordBatch::SelectColumns method from R to C++ library
  • ARROW-11673 - [C++] Casting dictionary type to use different index type
  • ARROW-11675 - [CI][C++] Resolve ctest failures on VS 2019 builds
  • ARROW-11705 - [R] Support scalar value recycling in RecordBatch/Table$create()
  • ARROW-11759 - [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type
  • ARROW-11769 - [R] Pull groups from grouped_df into RecordBatch or Table
  • ARROW-11772 - [C++] Provide reentrant IPC file reader
  • ARROW-11782 - [GLib][Ruby][Dataset] Remove bindings for internal classes
  • ARROW-11787 - [R] Implement write csv
  • ARROW-11843 - [C++] Provide async Parquet reader
  • ARROW-11849 - [R] Use roxygen @examplesIf
  • ARROW-11889 - [C++] Add parallelism to streaming CSV reader
  • ARROW-11909 - [C++] Remove MakeIteratorGenerator
  • ARROW-11926 - [R] Add ucrt64 binaries and fix CI
  • ARROW-11926 - [R] preparations for ucrt toolchains
  • ARROW-11928 - [C++] Execution engine API
  • ARROW-11929 - [C++][Dataset][Compute] Promote expression to the compute namespace
  • ARROW-11930 - [C++][Dataset][Compute] Use an ExecPlan for dataset scans
  • ARROW-11932 - [C++] Provide ArrayBuilder::AppendScalar
  • ARROW-11950 - [C++][Compute] Add unary negative kernel
  • ARROW-11960 - [C++][Gandiva] Support escape in LIKE
  • ARROW-11980 - [Python] Remove experimental status from Table.replace_schema_metadata
  • ARROW-11986 - [C++][Gandiva] Implement IN expressions for doubles and floats
  • ARROW-11990 - [C++][Compute] Handle errors consistently
  • ARROW-12004 - [C++] Resultdetail::Empty is annoying
  • ARROW-12010 - [C++][Compute] Improve performance of the hash table used in GroupIdentifier
  • ARROW-12016 - [C++] Implement array_sort_indices and sort_indices for BOOL type
  • ARROW-12050 - [C++][Python][FlightRPC] Make Flight operations interruptible in Python
  • ARROW-12074 - [C++][Compute] Add scalar arithmetic kernels for decimal
  • ARROW-12083 - [C++][Dataset] Use given column types when determining CSV fragment schema
  • ARROW-12092 - [R] Make expect_dplyr_equal() a bit stricter
  • ARROW-12166 - [C++][Gandiva] Implements CONVERT_TO(value, type) function
  • ARROW-12184 - [R] Bindings for na.fail, na.omit, na.exclude, na.pass
  • ARROW-12185 - [R] Bindings for any, all
  • ARROW-12198 - [R] bindings for strptime
  • ARROW-12199 - [R] bindings for stddev, variance
  • ARROW-12205 - [C++][Gandiva][number][number] seconds) function
  • ARROW-12231 - [C++][Python][Dataset] Isolate one-shot data to scanner
  • ARROW-12253 - [Rust] [Ballista] Implement scalable joins
  • ARROW-12255 - [Rust] [Ballista] Integrate scheduler with DataFusion
  • ARROW-12256 - [Rust] [Ballista] Add DataFrame support
  • ARROW-12257 - [Rust] [Ballista] Publish user guide to Arrow site
  • ARROW-12261 - [Rust] [Ballista] Ballista should not have its own DataFrame API
  • ARROW-12291 - [R] Determine the type of an unevaluated expression
  • ARROW-12310 - [Java] ValueVector#getObject should support covariance for complex types
  • ARROW-12355 - [C++] Implement efficient async CSV scanning
  • ARROW-12362 - [Rust] [DataFusion] topk_query test failure
  • ARROW-12364 - [Python][Dataset] Add metadata_collector option to ds.write_dataset()
  • ARROW-12378 - [C++][Gandiva] Implement castVARBINARY functions
  • ARROW-12386 - [C++] Support file parallelism in AsyncScanner
  • ARROW-12391 - [Rust][DataFusion] Implement date_trunc() function
  • ARROW-12392 - [C++] Restore asynchronous streaming CSV reader
  • ARROW-12393 - [JS] Use closure compiler for all UMD targets
  • ARROW-12403 - [Rust] [Ballista] Integration tests should check that query results are correct
  • ARROW-12415 - [CI][Python] Failed building wheel for pygit2 on ARM64
  • ARROW-12424 - [Go][Parquet] Adding Schema Package for Go Parquet
  • ARROW-12428 - [Python] Expose pre_buffer in pyarrow.parquet
  • ARROW-12434 - [Rust] [Ballista] Show executed plans with metrics
  • ARROW-12442 - [CI] Set job timeouts on GitHub Actions
  • ARROW-12443 - [C++][Gandiva] Implement castVARCHAR function for varbinary input
  • ARROW-12444 - [Rust] Remove rust
  • ARROW-12445 - [Rust] Design and implement packaging process to bundle Rust in signed tar
  • ARROW-12468 - [Python][R] Expose ScannerBuilder::UseAsync to Python & R
  • ARROW-12478 - [C++] Support LLVM 12
  • ARROW-12484 - [CI] Change jinja macros to not require CROSSBOW_TOKEN to upload artifacts in Github Actions
  • ARROW-12489 - [Developer] autotune is broken
  • ARROW-12490 - [Dev] Use only miniforge in verify-release-candidate.sh
  • ARROW-12492 - [Python] Helper method to decode DictionaryArray back to Array
  • ARROW-12496 - [C++][Dataset] Ensure AsyncScanner is covered by all scanner tests
  • ARROW-12499 - [C++][Compute] Add ScalarAggregateOptions to Any and All kernels
  • ARROW-12500 - [C++][Datasets] Ensure better test coverage of Dataset file formats
  • ARROW-12501 - [CI][Ruby] Remove needless workaround for MinGW build
  • ARROW-12507 - [CI] Remove duplicated cron/nightly builds
  • ARROW-12512 - [C++][Python][Dataset] Create CSV writer class and add Datasets support
  • ARROW-12514 - [Release] Don't run Gandiva related Ruby test with ARROW_GANDIVA=OFF
  • ARROW-12517 - [Go][Flight] Expose app metadata in flight client and server
  • ARROW-12518 - [Python] Expose Parquet statistics has_null_count / has_distinct_count
  • ARROW-12520 - [R] Minor docs updates
  • ARROW-12522 - [C++] Add ReadRangeCache::WaitFor
  • ARROW-12525 - [JS] Vector toJSON() returns an array
  • ARROW-12527 - [Dev] Don't try getting JIRA information for MINOR PR
  • ARROW-12528 - [JS] Support typed arrays in Table.new
  • ARROW-12530 - [C++] Remove Buffer::mutable_data_
  • ARROW-12533 - [C++] Add random real distribution function
  • ARROW-12534 - [C++][Gandiva] Implement LEFT and RIGHT functions on Gandiva for string input values
  • ARROW-12537 - [JS] Docs build should not include test sources
  • ARROW-12541 - [Docs] Improve styling/readability of tables in the new doc theme
  • ARROW-12551 - [Java][Release] Java post-release tests fail due to missing testing data
  • ARROW-12554 - [C++] Allow duplicates in SetLookupOptions::value_set
  • ARROW-12555 - [Java][Release] Java post-release script misses dataset JNI bindings
  • ARROW-12556 - [C++][Gandiva] Implement BYTESUBSTRING function on Gandiva
  • ARROW-12560 - [C++] Add scheduling option for Future callbacks
  • ARROW-12567 - [C++][Gandiva] Implement ILIKE SQL function
  • ARROW-12567 - [C++][Gandiva] Implement LPAD and RPAD functions for string input values
  • ARROW-12571 - [R][CI] Run nightly R with valgrind
  • ARROW-12575 - [R] Use unary negative kernel
  • ARROW-12577 - [Website] Use Artifactory instead of Bintray in all places
  • ARROW-12578 - [JS] Remove Buffer in favor of TextEncoder API to support bundlers such as Rollup
  • ARROW-12581 - [C++][FlightRPC] Allow benchmarking DoPut with a data file
  • ARROW-12584 - [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators
  • ARROW-12591 - [Java][Gandiva] Create single Gandiva jar for MacOS and Linux
  • ARROW-12593 - [Packaging][Ubuntu] Add support for Ubuntu 21.04
  • ARROW-12597 - [C++] Enable per-row-group parallelism in async Parquet reader
  • ARROW-12598 - [C++][Dataset] Speed up CountRows for CSV
  • ARROW-12599 - [Doc][Python] Documentation missing for pyarrow.Table
  • ARROW-12600 - [CI] Push docker images from crossbow tasks
  • ARROW-12602 - [R] Add BuildInfo from C++ to arrow_info
  • ARROW-12608 - [C++][Python][R] Add split_pattern_regex kernel
  • ARROW-12612 - [C++] Add Expression to type_fwd.h
  • ARROW-12619 - [Python] pyarrow sdist should not require git
  • ARROW-12621 - [C++][Gandiva] Add alias to sha1 and sha256 functions
  • ARROW-12631 - [Python] Accept Scanner in pyarrow.dataset.write_dataset
  • ARROW-12643 - [Governance] Added experimental repos guidelines.
  • ARROW-12645 - [Python] Fix numpydoc validation
  • ARROW-12648 - [C++][FlightRPC] Enable TLS for Flight benchmark
  • ARROW-12649 - [Python/Packaging] Move conda-aarch64 to Azure with cross-compilation
  • ARROW-12653 - [Archery] allow me to add a comment to crossbow requests
  • ARROW-12658 - [C++] Bump aws-c-common to v0.5.10
  • ARROW-12660 - [R] Post-4.0 adjustments for CRAN
  • ARROW-12661 - [C++] Add ReaderOptions::skip_rows_after_names
  • ARROW-12662 - [Website] Force to use squash merge
  • ARROW-12667 - [Python] Add a more complete test for strided numpy array conversion
  • ARROW-12675 - [C++] CSV parsing report row on which error occurred
  • ARROW-12677 - [Python] Add a mask argument to pyarrow.StructArray.from_arrays
  • ARROW-12685 - [C++][Compute] Add unary absolute value kernel
  • ARROW-12686 - [C++][Python][FlightRPC] Convert Flight reader into a regular reader
  • ARROW-12687 - [C++][Python][Dataset] Convert Scanner into a RecordBatchReader
  • ARROW-12689 - [R] Implement ArrowArrayStream C interface
  • ARROW-12692 - [R] Improve tests and comments for strsplit() bindings
  • ARROW-12694 - [C++] Fix segfault under RTools35 toolchain
  • ARROW-12696 - [R] Improve testing of error messages converted to warnings
  • ARROW-12699 - [CI][Packaging][Java] Generate a jar compatible with Linux and MacOS for all Arrow components
  • ARROW-12702 - [JS] Update webpack and terser
  • ARROW-12703 - [JS] Separate Table from DataFrame
  • ARROW-12704 - [JS] Support and use optional chaining
  • ARROW-12709 - [C++] Add binary_join_element_wise
  • ARROW-12713 - [C++] String reverse kernel
  • ARROW-12715 - [C++][Python] Add SQL LIKE match kernel
  • ARROW-12716 - [C++] Add string padding kernel
  • ARROW-12717 - [C++][Python] Add find_substring kernel
  • ARROW-12719 - [C++] Allow passing S3 canned ACL as output stream metadata
  • ARROW-12721 - [CI] Fix path for uploading aarch64 conda artifacts from the nightly builds
  • ARROW-12722 - [R] Raise error when attempting to print table with duplicated naming
  • ARROW-12730 - [MATLAB] Update featherreadmex and featherwritemex to build against latest Arrow C++ APIs
  • ARROW-12731 - [R] Use InMemoryDataset for Table/RecordBatch in dplyr code
  • ARROW-12736 - [C++] Eliminate forced copy of potentially large vector<shared_ptr<>>
  • ARROW-12738 - [C++/Python/R] Update conda variant files
  • ARROW-12741 - [CI] Configure Crossbow GitHub Token for Nightly Builds
  • ARROW-12745 - [C++][Compute] Add floor, ceiling, and truncate kernels
  • ARROW-12749 - [C++] Construct RecordBatch/Table/Schema with rvalue arguments
  • ARROW-12750 - [CI][R] Actually pass parameterized docker options to the templates
  • ARROW-12751 - [C++] Implement minimum/maximum kernels
  • ARROW-12758 - [R] Add examples to more function documentation
  • ARROW-12760 - [C++][Python][R] Allow setting I/O thread pool size
  • ARROW-12761 - [R] Better error handling for write_to_raw
  • ARROW-12764 - [CI] Support wildcard expansion when uploading crossbow artifacts
  • ARROW-12777 - [R] Convert all inputs to Arrow objects in match_arrow and is_in
  • ARROW-12781 - [R] Implement is.type() functions for dplyr
  • ARROW-12785 - [CI] the r-devdocs build errors when brew installing gcc
  • ARROW-12791 - [R] Better error handling for DatasetFactory$Finish() when no format specified
  • ARROW-12796 - [JS] Support JSON output from benchmarks
  • ARROW-12800 - [JS] Remove text encoder and decoder polyfills
  • ARROW-12801 - [CI][Packaging][Java] Include all modules in script that generate Arrow jars
  • ARROW-12806 - [Python] test_write_to_dataset_filesystem missing a dataset mark
  • ARROW-12808 - [JS] Document browser support
  • ARROW-12810 - [Python] Stop AWS SDK from looking for metadata service
  • ARROW-12812 - [Packaging][Java] Improve JNI jars build
  • ARROW-12824 - [R][CI] Upgrade builds for R 4.1 release
  • ARROW-12827 - [C++] Improve error message for dataset discovery failure
  • ARROW-12829 - [GLib][Ruby] Add support for Apache Arrow Flight
  • ARROW-12831 - [CI][macOS] Remove needless Homebrew workaround
  • ARROW-12832 - [JS] Write benchmarks in TypeScript
  • ARROW-12833 - [JS] Construct perf data in JS
  • ARROW-12835 - [C++][Python][R] Implement case-insensitive match using RE2
  • ARROW-12836 - [C++] Add support for newer IBM i
  • ARROW-12841 - [R] Add examples to more function documentation - part 2
  • ARROW-12843 - [C++][R] Implement is_inf kernel
  • ARROW-12848 - [Release] Fix URLs in vote mail template
  • ARROW-12851 - [Go][Parquet] Add Golang Parquet encoding package
  • ARROW-12856 - [C++][Gandiva] Implement castBIT and castBOOLEAN functions
  • ARROW-12859 - [C++] Add ScalarFromJSON for testing
  • ARROW-12861 - [C++][Compute] Add sign function kernels
  • ARROW-12867 - [R] Bindings for abs()
  • ARROW-12868 - [R] Bindings for find_substring and find_substring_regex
  • ARROW-12869 - [R] Bindings for utf8_reverse and ascii_reverse
  • ARROW-12870 - [R] Bindings for stringr::str_like
  • ARROW-12875 - [JS] Upgrade Jest and other minor updates
  • ARROW-12883 - [R][CI] version compatibility fails on R 4.1
  • ARROW-12891 - [C++] Move subtree pruning to compute
  • ARROW-12894 - [R] Bump R version
  • ARROW-12895 - [CI] Use "concurrency" setting on Github Actions to cancel stale jobs
  • ARROW-12898 - [Release][C#] Fix package upload
  • ARROW-12900 - [Python][Doc] Add missing numpy import
  • ARROW-12901 - [R] Follow on to more examples
  • ARROW-12909 - [R][Release] Build of ubuntu-docs is failing
  • ARROW-12912 - [Website] Use .asf.yaml for publishing
  • ARROW-12915 - [Release] Build of ubuntu-docs is failing on thrift
  • ARROW-12936 - [C++][Gandiva] Implement ASCII Hive function on Gandiva
  • ARROW-12937 - [C++][Python] Allow setting default metadata for new S3 files
  • ARROW-12939 - [R] Simplify RTask stop handling
  • ARROW-12940 - [R] Expose C interface as R6 methods
  • ARROW-12948 - [C++][Python] Add slice_replace kernel
  • ARROW-12949 - [C++] Add starts_with and ends_with
  • ARROW-12950 - [C++] Add count_substring kernel
  • ARROW-12951 - [C++] Reduce generated code size for string kernels
  • ARROW-12952 - [C++] Add count_substring_regex
  • ARROW-12955 - [C++] Add additional type support for if_else kernel
  • ARROW-12957 - [R] rchk issues on cran
  • ARROW-12961 - [Python] Fix MSVC warning building PyArrow
  • ARROW-12962 - [GLib][Ruby] Add Arrow::Scalar
  • ARROW-12964 - [R] Add bindings for ifelse() and if_else()
  • ARROW-12966 - [Python] Expose element_wise_min/max and options in Python
  • ARROW-12967 - [R] Add bindings for pmin() and pmax()
  • ARROW-12968 - [R][CI] Add an rchk job to our nightlies
  • ARROW-12972 - [CI] Fix centos-8 cmake error
  • ARROW-12975 - [C++][Python] if_else kernel doesn't support upcasting
  • ARROW-12982 - [C++] Re-enable unused-variable warning
  • ARROW-12984 - [C++][Compute] Passing options parameter of Count/Index aggregation by reference
  • ARROW-12985 - [Python][Packaging] Unable to install pygit2 in the arm64 wheel builds
  • ARROW-12986 - [C++][Gandiva] Implement new cache eviction policy algorithm in Gandiva
  • ARROW-12992 - [R] bindings for substr(), substring(), str_sub()
  • ARROW-12994 - [R] Fix tests that assume UTC local tz
  • ARROW-12996 - Add bytes_read() to StreamingReader
  • ARROW-13002 - [C++] Add a check for the utf8proc's version in CMake
  • ARROW-13005 - [C++] Add support for take implementation on dense union type
  • ARROW-13006 - [C++][Gandiva] Implement BASE64 and UNBASE64 Hive functions on Gandiva
  • ARROW-13009 - [Doc][Dev] Document builds mailing-list
  • ARROW-13022 - [R] bindings for lubridate's year, isoyear, quarter, month, day, wday, yday, isoweek, hour, minute, and second functions
  • ARROW-13025 - [C++][Python] Add FunctionOptions::Equals/ToString/Serialize
  • ARROW-13027 - [C++] Fix ASAN stack traces in CI
  • ARROW-13030 - [CI][Go] Setup Arm64 golang CI
  • ARROW-13031 - [JS] Support arm in closure compiler on macOS
  • ARROW-13032 - [Java] Update guava version
  • ARROW-13034 - [Python][Docs] Update the cloud examples on the Parquet doc page
  • ARROW-13036 - [Doc] Mention recommended file extension(s) for Arrow IPC
  • ARROW-13042 - [C++] Check that kernel output is fully initialized
  • ARROW-13043 - [GLib][Ruby] Add GArrowEqualOptions
  • ARROW-13044 - [Java] Change UnionVector and DenseUnionVector to extend AbstractContainerVector
  • ARROW-13045 - [Packaging][RPM][deb] Don't install system utf8proc if it's old
  • ARROW-13047 - [Website] Add kiszk to committer list
  • ARROW-13049 - [C++][Gandiva] Implement BIN Hive function on Gandiva
  • ARROW-13050 - [C++][Gandiva] Implement SPACE Hive function on Gandiva
  • ARROW-13054 - [C++] Add option to specify the first day of the week for the "day_of_week" temporal kernel
  • ARROW-13064 - [C++] Implement select ('case when') function for fixed-width types
  • ARROW-13065 - [Packaging][RPM] Add missing required LZ4 version information
  • ARROW-13068 - [GLib][Dataset] Change prefix to gdataset_ from gad_
  • ARROW-13070 - [R] bindings for sd and var
  • ARROW-13072 - [C++] Add bit-wise arithmetic kernels
  • ARROW-13074 - [Python] Deprecate ParquetDataset custom properties (eg pieces, partitions)
  • ARROW-13075 - [Python] Expose C data interface API for pyarrow.Field
  • ARROW-13076 - [Java] Allow ExtensionTypeVector with Struct or Union vector storage
  • ARROW-13082 - [CI] Forward R argument to ubuntu-docs build
  • ARROW-13086 - [Python] De-duplicate time unit conversion code
  • ARROW-13086 - [Python] Expose Parquet ArrowReaderProperties::coerce_int96_timestamp_unit_
  • ARROW-13091 - [Python] Add compression_level argument to IpcWriteOptions constructor
  • ARROW-13092 - [C++] Return an error in CreateDir if target is a file
  • ARROW-13095 - [C++] Implement trig compute functions
  • ARROW-13096 - [C++] Implement logarithm compute functions
  • ARROW-13097 - [C++] Provide simple reflection utility
  • ARROW-13098 - [Dev][Archery] Reorganize docker submodule to its own subpackage
  • ARROW-13100 - [MATLAB] Integrate GoogleTest with MATLAB Interface C++ Code
  • ARROW-13101 - [Python][Doc] pyarrow.FixedSizeListArray does not appear in the documentation
  • ARROW-13110 - [C++] Deadlock can happen when using BackgroundGenerator without transferring callbacks
  • ARROW-13113 - [R] use RTasks to manage parallel in converting arrow to R
  • ARROW-13117 - [R] Retain schema in new Expressions
  • ARROW-13119 - [R] Set empty schema in scalar Expressions
  • ARROW-13124 - [Ruby] Add support for memory view
  • ARROW-13127 - [R] Valgrind nightly errors
  • ARROW-13136 - [C++] Add coalesce function
  • ARROW-13137 - [C++][Documentation] Make in-table references consistent
  • ARROW-13140 - [C++/Python] Upgrade libthrift pin in the nightlies
  • ARROW-13142 - [Python] Use vector append when converting from list of non-strided numpy arrays
  • ARROW-13147 - [Java] Respect the rounding policy when allocating vector buffers
  • ARROW-13157 - [C++][Python] Add find_substring_regex kernel and implement ignore_case for find_substring
  • ARROW-13158 - [Python] Fix StructScalar contains and repr with duplicate field names
  • ARROW-13162 - [C++][Gandiva] Add new alias for extract date functions in registry
  • ARROW-13171 - [R] Add binding for str_pad()
  • ARROW-13190 - [C++][Gandiva] Change behavior of INITCAP function
  • ARROW-13194 - [Java][Document] Create prose document about Java algorithms
  • ARROW-13195 - [R] Problem with rlang reverse dependency checks
  • ARROW-13199 - [R] add ubuntu 21.04 to nightly builds
  • ARROW-13200 - [R] Add binding for case_when()
  • ARROW-13201 - [R] Add binding for coalesce()
  • ARROW-13210 - [Python][CI] Fix vcpkg caching mechanism for the macOS wheels
  • ARROW-13211 - [C++][CI] Remove outdated Github Actions ARM builds
  • ARROW-13212 - [Release] Support deploying to test PyPI in the python post release script
  • ARROW-13215 - [R][CI] Add ENV TZ to docker files
  • ARROW-13218 - [Doc] Document/clarify conventions for timestamp storage
  • ARROW-13219 - [C++][GLib] Demote/deprecate CompareOptions
  • ARROW-13224 - [Python][Doc] Documentation missing for pyarrow.dataset.write_dataset
  • ARROW-13226 - [Python] Add a general purpose cython trampolining utility
  • ARROW-13228 - [C++] S3 CreateBucket fails because AWS treats us-east-1 differently than other regions
  • ARROW-13230 - [Docs][Python] Add CSV writer docs
  • ARROW-13234 - [C++] Put extra padding spaces on the right
  • ARROW-13235 - [C++][Python] Simplify mapping of function options
  • ARROW-13236 - [Python] Include options class name in repr
  • ARROW-13238 - [C++][Compute][Dataset] Use an ExecPlan for dataset scans
  • ARROW-13242 - [C++] Improve random generation of decimal arrays
  • ARROW-13244 - [C++] Add facility to get current thread id as uint64
  • ARROW-13258 - [Python] Improve the repr of ParquetFileFragment
  • ARROW-13262 - [R] transmute() fails after pulling data into R
  • ARROW-13273 - [C++] Don't use .pc only in CMake paths for Requires.private
  • ARROW-13274 - [JS] Remove Webpack
  • ARROW-13275 - [JS] Fix perf tests
  • ARROW-13276 - [GLib][Ruby][Flight] Add support for ListFlights
  • ARROW-13277 - [JS] Add declaration maps for TypeScript and refactor testing infrastructure
  • ARROW-13280 - [R] Bindings for log and trig functions
  • ARROW-13282 - [C++] Remove obsolete generated files
  • ARROW-13283 - [Archery][Dev] Support passing CPU/memory limits to Docker
  • ARROW-13286 - [CI] Require docker-compose 1.27.0 or later
  • ARROW-13289 - [C++] Accept integer args in trig/log functions via promotion to double
  • ARROW-13291 - [GLib][CI] Require gobject-introspection 3.4.5 or later
  • ARROW-13296 - [C++] Provide a reflection compatible enum replacement
  • ARROW-13299 - [JS] Upgrade ix and rxjs
  • ARROW-13303 - [JS] Revise bundles
  • ARROW-13306 - [Java][JDBC] use ResultSetMetaData.getColumnLabel instead of ResultSetMetaData.getColumnName
  • ARROW-13313 - [C++][Compute] Add scalar aggregate node
  • ARROW-13320 - [Website] Add MIME types to FAQ
  • ARROW-13323 - [Archery] Validate docker compose configuration
  • ARROW-13343 - [R] Update NEWS.md for 5.0
  • ARROW-13346 - [C++] Remove compile time parsing from EnumType
  • ARROW-13355 - [R] ensure that sf is installed in our revdep job
  • ARROW-13357 - [R] bindings for sign()
  • ARROW-13365 - [R] bindings for floor/ceiling/truncate
  • ARROW-13385 - [C++] Demonstrate registering compute functions out-of-tree
  • ARROW-13386 - [R][C++] CSV streaming changes break Rtools 35 32-bit build
  • ARROW-13418 - [R] typo in python.r
  • ARROW-13461 - [Python][Packaging] Build M1 wheels for python 3.8
  • PARQUET-1798 - [C++] Review logic around automatic assignment of field_id's
  • PARQUET-1998 - [C++] Implement LZ4_RAW compression
  • PARQUET-2056 - [C++] Add ability for retrieving dictionary and indices separately for ColumnReader
kou
published 4.0.1 •

Changelog

Source

Apache Arrow 4.0.1 (2021-05-26)

Bug Fixes

  • ARROW-12568 - [C++][Compute] Fix nullptr dereference when array contains no nulls
  • ARROW-12601 - [R][Packaging] Fix pkg-config check in r/configure
  • ARROW-12603 - [C++][Dataset] Backport fix for specifying CSV column types (#10344)
  • ARROW-12604 - [R][Packaging] Dataset, Parquet off in autobrew and CRAN Mac builds
  • ARROW-12617 - [Python] Align orc.write_table keyword order with parquet.write_table
  • ARROW-12622 - [Python] Fix segfault in read_csv when not on main thread
  • ARROW-12642 - [R] LIBARROW_MINIMAL, LIBARROW_DOWNLOAD, NOT_CRAN env vars should not be case-sensitive
  • ARROW-12663 - [C++] Fix a cuda 11.2 compiler segfault
  • ARROW-12670 - [C++] Fix extract_regex output after non-matching values
  • ARROW-12746 - [Go][Flight] append instead of overwriting outgoing metadata
  • ARROW-12769 - [Python] Fix slicing array with "negative" length (start > stop)
  • ARROW-12774 - [C++][Compute] replace_substring_regex() creates invalid arrays => crash
  • ARROW-12776 - [Archery][Integration] Fix decimal case generation in write_js_test_json
  • ARROW-12855 - error: no member named 'TableReader' in namespace during compilation

New Features and Improvements

  • ARROW-11926 - [R] preparations for ucrt toolchains
  • ARROW-12520 - [R] Minor docs updates
  • ARROW-12571 - [R][CI] Run nightly R with valgrind
  • ARROW-12578 - [JS] Remove Buffer in favor of TextEncoder API to support bundlers such as Rollup
  • ARROW-12619 - [Python] pyarrow sdist should not require git
  • ARROW-12806 - [Python] test_write_to_dataset_filesystem missing a dataset mark
ptaylor
published 4.0.0 •

Changelog

Source

Apache Arrow 4.0.0 (2021-04-26)

Bug Fixes

  • ARROW-4784 - [C++][CI] Re-enable flaky mingw tests.
  • ARROW-6818 - [DOC] Remove reference to Apache Drill design docs
  • ARROW-7288 - [C++][Parquet] Don't use regular expression to parse application version
  • ARROW-7830 - [C++][Parquet] Use Arrow version number for parquet
  • ARROW-9451 - [Python] Refuse implicit cast of str to unsigned integer
  • ARROW-9634 - [C++][Python] Restore non-UTC time zones when reading Parquet file that was previously Arrow
  • ARROW-9878 - [Python] Document caveats of to_pandas(self_destruct=True)
  • ARROW-10038 - [C++] Spawn thread pool threads lazily
  • ARROW-10056 - [C++] Increase flatbuffers max_tables parameter in order to read wide tables
  • ARROW-10364 - [Dev][Archery] Add support for semver 2.13.0
  • ARROW-10370 - [Python] Clean-up filesystem handling in write_dataset
  • ARROW-10403 - [C++] Implement unique kernel for non-uniform chunked dictionary arrays
  • ARROW-10405 - [C++] IsIn kernel should be able to lookup dictionary in string
  • ARROW-10457 - [CI] Fix Spark integration tests with branch-3.0
  • ARROW-10489 - [C++] Add Intel C++ compiler options for different warning levels
  • ARROW-10514 - [C++][Parquet] Make the column name the same for both output formats of parquet reader
  • ARROW-10953 - [R] Validate when creating Table with schema
  • ARROW-11066 - [FlightRPC][Java] Make zero-copy writes a configurable option
  • ARROW-11066 - [FlightRPC][Java] Revert "fix zero-copy optimization"
  • ARROW-11066 - [Java][FlightRPC] fix zero-copy optimization
  • ARROW-11066 - Revert "ARROW-11066: [Java][FlightRPC] fix zero-copy opt…
  • ARROW-11066 - [Java][FlightRPC] fix zero-copy optimization
  • ARROW-11134 - [CI][C++] Always run tests on Travis-CI
  • ARROW-11147 - [CI][Python] Remove pandas=0.25.3 pin for dask-latest
  • ARROW-11180 - [Developer] cmake-format pre-commit hook doesn't run
  • ARROW-11192 - [Documentation] Describe opening Visual Studio so it inherits a working env
  • ARROW-11223 - [Java] Fix: BaseVariableWidthVector/BaseLargeVariableWidthVector setNull() and getBufferSizeFor() trigger offset buffer overflow
  • ARROW-11235 - [Python] Fix test failure inside non-default S3 region
  • ARROW-11239 - [Rust] Fixed equality with offsets and nulls
  • ARROW-11269 - [Rust][Parquet] Preserve timezone in int96 reader
  • ARROW-11277 - [C++] Workaround macOS 10.11: don't default construct consts
  • ARROW-11299 - [Python] Fix invalid-offsetof warnings
  • ARROW-11303 - [Release][C++] Enable mimalloc in the windows verification script
  • ARROW-11305 - Skip first argument (which is the program name) in parquet-rowcount binary
  • ARROW-11311 - [Rust] Fixed unset_bit
  • ARROW-11313 - [Rust] Fixed size_hint
  • ARROW-11315 - [Packaging][APT][arm64] Add missing gir1.2 files
  • ARROW-11320 - [C++] Try to strengthen temporary dir creation
  • ARROW-11322 - [Rust] Re-opening memory module as public
  • ARROW-11323 - [Rust][DataFusion] Allow sort queries to return no results
  • ARROW-11328 - [R] Collecting zero columns from a dataset returns entire dataset
  • ARROW-11334 - [Python][CI] Fix failing pandas nightly tests
  • ARROW-11337 - [C++] Compilation error with ThreadSanitizer
  • ARROW-11357 - [Rust] Fix out-of-bounds reads in take and other undefined behavior
  • ARROW-11376 - [C++] ThreadedTaskGroup failure with Thread Sanitizer enabled
  • ARROW-11379 - [C++][Dataset] Better formatting for timestamp scalars
  • ARROW-11387 - [Rust] fix build for conditional compilation of features 'simd + avx512'
  • ARROW-11391 - [C++] Allow writing more than 2 GB to HDFS
  • ARROW-11394 - [Rust] Tests for Slice & Concat
  • ARROW-11400 - [Python] Ensure pickling Dataset with dictionary partitions works
  • ARROW-11403 - [Developer] archery benchmark list: unexpected keyword 'benchmark_filter'
  • ARROW-11412 - [Python][Dataset] Disallow logical operators for Expression
  • ARROW-11412 - [Python] Improve Expression docs
  • ARROW-11427 - [C++] On Windows, only use AVX512 when enabled by the OS
  • ARROW-11448 - [C++] Fix tdigest build failure on Windows with Visual Studio
  • ARROW-11451 - [C++] Fix gcc-4.8 build errors
  • ARROW-11452 - [Rust] Fix issue with Parquet Arrow reader not following type path
  • ARROW-11461 - [Go][Flight] Some cleanup for flight, Fix Schema bytes
  • ARROW-11464 - [Python] Fix parquet.read_pandas to support all keywords of read_table
  • ARROW-11470 - [C++] Detect overflow on computation of tensor strides
  • ARROW-11472 - [Python][CI] Remove temporary pin of numpy in kartothek integration build
  • ARROW-11472 - [Python][CI] Temporary pin numpy on kartothek integration builds
  • ARROW-11480 - [Python] Test filtering on INT96 timestamps
  • ARROW-11483 - [C++] Write integration JSON files compatible with the Java reader
  • ARROW-11488 - [Rust] Don't leak memory in StructBuilder
  • ARROW-11490 - [C++] BM_ArrowBinaryDict/EncodeLowLevel is not deterministic
  • ARROW-11494 - [Rust] fix take bench
  • ARROW-11497 - [Python] Provide parquet enable compliant nested type flag for python binding
  • ARROW-11538 - [Python] Segfault reading Parquet dataset with Timestamp filter
  • ARROW-11547 - [Packaging][Conda][Drone] Fix undefined variable error
  • ARROW-11548 - [C++] Fix RandomArrayGenerator::List
  • ARROW-11551 - [C++][Gandiva] Fix castTimestamp(utf8) function
  • ARROW-11560 - [C++][FlightRPC] fix mutex error on SIGINT
  • ARROW-11567 - [C++][Compute] Improve variance kernel precision
  • ARROW-11577 - [Rust] Fix Array transform on strings
  • ARROW-11582 - [R] write_dataset 'format' argument default and validation could be better
  • ARROW-11586 - [Rust][Datafusion] Remove force unwrap
  • ARROW-11595 - [C++][NIGHTLY:test-conda-cpp-valgrind] Avoid branching on potentially indeterminate values in GenerateBitsUnrolled
  • ARROW-11596 - [Python][Dataset] make ScanTask.execute() eager
  • ARROW-11603 - [Rust] Fix Clippy Lints for Rust 1.50
  • ARROW-11607 - [C++][Parquet] Update values_capacity_ when resetting.
  • ARROW-11614 - Fix round() logic to return positive zero when argument is zero
  • ARROW-11617 - [C++][Gandiva] Fix nested if-else optimisation in gandiva
  • ARROW-11620 - [Rust][DataFusion] Consistently use Arc<dyn TableProvider> rather than Box and Arc
  • ARROW-11630 - [Rust] Introduce limit option for sort kernel
  • ARROW-11632 - [Rust] Make csv::Reader propagate schema metadata to generated RecordBatches
  • ARROW-11639 - [C++][Gandiva] Fix signbit compilation issue in Ubuntu nightly build
  • ARROW-11642 - [C++] Fix preprocessor directive for Windows in JVM detection
  • ARROW-11657 - [R] group_by with .drop specified errors
  • ARROW-11658 - [R] Handle mutate/rename inside group_by
  • ARROW-11663 - [Rust][DataFusion] Fixed error.
  • ARROW-11668 - [C++] Sporadic UBSAN error in FutureStessTest.TryAddCallback
  • ARROW-11672 - [R] Fix string function test failure on R 3.3
  • ARROW-11681 - [Rust] Don't unwrap in IPC writers
  • ARROW-11686 - [C++] Call ArrowLog::InstallFailureSignalHandler to show stack trace
  • ARROW-11687 - [Rust][DataFusion] RepartitionExec Hanging
  • ARROW-11694 - [C++] Fix Take() with no validity bitmap but unknown null count
  • ARROW-11695 - [C++][FlightRPC] fix option to disable TLS verification
  • ARROW-11717 - [Integration] Fix intermittent flight integration failures with rust
  • ARROW-11718 - [Rust] Don't write IPC footers on drop
  • ARROW-11741 - [C++] Fix decimal casts on big endian platforms
  • ARROW-11743 - [R] Use pkgdown's new found ability to autolink Jiras
  • ARROW-11746 - [Developer][Archery] Fix prefer real time check
  • ARROW-11756 - [R] passing a partition as a schema leads to segfaults
  • ARROW-11758 - [C++][Compute] Improve summation kernel precision
  • ARROW-11767 - [C++] Scalar::Hash may segfault
  • ARROW-11771 - [Developer][Archery] Move benchmark tests (so CI runs them)
  • ARROW-11781 - [Python] Reading small amount of files from a partitioned dataset is unexpectedly slow
  • ARROW-11784 - [Rust][DataFusion] CoalesceBatchesStream doesn't honor Stream interface
  • ARROW-11785 - [R] Fallback when filtering Table with unsupported expression fails
  • ARROW-11786 - [C++] Remove noisy CMake message
  • ARROW-11788 - [Java] Fix appending empty delta vectors
  • ARROW-11791 - [Rust][DataFusion] Fix RepartitionExec Blocking
  • ARROW-11802 - [Rust][DataFusion] Remove use of crossbeam channels to avoid potential deadlocks
  • ARROW-11819 - [Rust] Add link to the doc
  • ARROW-11821 - [Rust] Edit Rust README
  • ARROW-11830 - [C++] Don't re-detect gRPC every time
  • ARROW-11832 - [R] Handle conversion of extra nested struct column
  • ARROW-11836 - [C++] Avoid requiring arrow_bundled_dependencies when it doesn't exist for arrow_static.
  • ARROW-11845 - [Rust] Implement to_isize() for ArrowNativeTypes
  • ARROW-11850 - [GLib] Add GARROW_VERSION_0_16
  • ARROW-11855 - [C++][Python] Memory leak in to_pandas when converting chunked struct array
  • ARROW-11857 - [Python] Resource temporarily unavailable when using the new Dataset API with Pandas
  • ARROW-11860 - [Rust][DataFusion] Add DataFusion logos
  • ARROW-11866 - [C++] Arrow Flight SetShutdownOnSignals cause potential mutex deadlock in gRPC
  • ARROW-11872 - [C++] Fix Array validation when Array contains non-CPU buffers
  • ARROW-11880 - [R] Handle empty or NULL transmute() args properly
  • ARROW-11881 - [Rust][DataFusion] Fix clippy lint
  • ARROW-11896 - [Rust] Disable Debug symbols on CI test builds
  • ARROW-11904 - [C++] Try to fix crash on test tear down
  • ARROW-11905 - [C++] Fix SIMD detection on macOS
  • ARROW-11914 - [R][CI] r-sanitizer nightly is broken
  • ARROW-11918 - [R][Documentation] Docs cleanups
  • ARROW-11923 - [CI] Update branch name for dask dev integration tests
  • ARROW-11937 - [C++] Fix GZip codec hanging if flushed twice
  • ARROW-11941 - [Dev] Don't update Jira if run "DEBUG=1 merge_arrow_pr.py"
  • ARROW-11942 - [C++] If tasks are submitted quickly the thread pool may fail to spin up new threads
  • ARROW-11945 - [R] filter doesn't accept negative numbers as valid
  • ARROW-11956 - [C++] Fix system re2 dependency detection for static library
  • ARROW-11965 - [R][Docs] Simplify install.packages command in R dev docs
  • ARROW-11970 - [C++][CI] Fix Valgrind error in arrow-csv-test
  • ARROW-11971 - [Packaging] Vcpkg patch doesn't apply on windows due to line endings
  • ARROW-11975 - [CI][GLib] Remove needless libgccjit
  • ARROW-11976 - [C++] Fix sporadic TSAN error with GatingTask
  • ARROW-11983 - [Python] Avoid ImportError calling from_pandas in threaded code
  • ARROW-11997 - [Python] concat_tables crashes python interpreter
  • ARROW-12003 - [R] Fix NOTE re undefined global function group_by_drop_default
  • ARROW-12006 - [Java] Fix checkstyle config to work on Windows
  • ARROW-12012 - [Java][JDBC] Fix BinaryConsumer reallocation
  • ARROW-12013 - [C++][FlightRPC] Fix bundled gRPC version probing
  • ARROW-12015 - [Rust][DataFusion] Integrate doc-comment crate to ensure readme examples remain valid
  • ARROW-12028 - ARROW-11940: [Rust][DataFusion] Add TimestampMillisecond support to GROUP BY/hash aggregates
  • ARROW-12029 - [R] Remove args from FeatherReader$create v2
  • ARROW-12033 - [Minor][Docs] Fix link in developers/benchmarks.html
  • ARROW-12041 - [C++][Python] Fix type property of tensor and sparse tensor IPC messages
  • ARROW-12051 - [GLib] Keep input stream reference of GArrowCSVReader
  • ARROW-12057 - [Python] Remove direct usage of pandas' Block subclasses (partly)
  • ARROW-12065 - [C++][Python] Fix segfault reading JSON file
  • ARROW-12067 - [Python][Doc] Document pyarrow_(un)wrap_scalar
  • ARROW-12073 - [R] Fix R CMD check NOTE about ‘X_____X’
  • ARROW-12076 - [Rust] Fix build
  • ARROW-12077 - [C++] Fix out-of-bounds write in ListArray::FromArrays
  • ARROW-12086 - [C++] Fix environment variables for bzip2, utf8proc URLs
  • ARROW-12088 - [Python] Fix compiler warning about offsetof
  • ARROW-12089 - [Doc] Fix Sphinx warnings
  • ARROW-12100 - [C++][IPC] Allow null children field when num children is 0
  • ARROW-12103 - [C++] Correctly handle unaligned access in bit-unpacking code
  • ARROW-12112 - [CI] Reduce footprint of conda-integration image
  • ARROW-12112 - [Rust] Create and store less debug information in CI and integration tests
  • ARROW-12113 - [R] Fix rlang deprecation warning from check_select_helpers()
  • ARROW-12130 - [C++] Don't enable Neon if -DARROW_SIMD_LEVEL=NONE
  • ARROW-12138 - [Go][IPC] Update flatbuffers definitions
  • ARROW-12140 - [C++][CI] Fix Valgrind failures in Grouper tests
  • ARROW-12145 - [Developer][Archery] Flaky: test_static_runner_from_json
  • ARROW-12149 - [Dev] Archery benchmark test case is failing
  • ARROW-12154 - [C++][Gandiva] Fix gandiva crash in certain OS/CPU combinations
  • ARROW-12155 - [R] Require Table columns to be same length
  • ARROW-12161 - [C++][Dataset] Revert async CSV reader in datasets
  • ARROW-12161 - [C++] Async streaming CSV reader deadlocking when being run synchronously from datasets
  • ARROW-12169 - [C++] Fix decompressing file with empty stream at the end
  • ARROW-12171 - [Rust] clean up clippy lints
  • ARROW-12172 - [Python][Packaging] Pass python version as setuptools pretend version in the macOS wheel builds
  • ARROW-12178 - [CI] Update setuptools in the ubuntu images
  • ARROW-12186 - [Rust][DataFusion] Fix regexp_match test
  • ARROW-12209 - [JS] Copy all src files into the TypeScript package
  • ARROW-12220 - [C++][CI] Thread sanitizer failure
  • ARROW-12226 - [C++] Fix Address Sanitizer failures
  • ARROW-12227 - [R] Fix RE2 and median nightly build failures
  • ARROW-12235 - [Rust][DataFusion] LIMIT returns incorrect results when used with several small partitions
  • ARROW-12241 - [Python] Make CSV cancellation test more robust
  • ARROW-12250 - [Rust][Parquet] Fix failing arrow_writer test
  • ARROW-12254 - [Rust][DataFusion] Stop polling limit input once limit is reached
  • ARROW-12258 - [R] Never do as.data.frame() on collect(as_data_frame = FALSE)
  • ARROW-12262 - [Doc] Enable S3 and Flight in docs build
  • ARROW-12267 - [Rust] Implement support for timestamps in JSON writer
  • ARROW-12273 - [JS][Rust] Remove coveralls
  • ARROW-12279 - [Rust][DataFusion] Add test for null handling in hash join (ARROW-12266)
  • ARROW-12294 - [Rust] Fix boolean kleene kernels with no remainder
  • ARROW-12299 - [Python] Recognize new filesystems in pq.write_to_dataset
  • ARROW-12300 - [C++] Remove linking of cuda runtime library
  • ARROW-12313 - [Rust][Ballista] Update benchmark docs for Ballista
  • ARROW-12314 - [Python] Accept columns as set in parquet read_pandas
  • ARROW-12327 - [Dev] Use pull request's head remote when submitting crossbow jobs via the comment bot
  • ARROW-12330 - [Developer] Restore values at counters column of Archery benchmark
  • ARROW-12334 - [Rust][Ballista] Aggregate queries producing incorrect results
  • ARROW-12342 - [Packaging] Fix tabulation in crossbow templates for submitting nightly builds
  • ARROW-12357 - [Archery] Bump Jinja2 version requirement
  • ARROW-12379 - [C++] Fix ThreadSanitizer failure in SerialExecutor
  • ARROW-12382 - [C++] Bundle xsimd if runtime SIMD level is set
  • ARROW-12385 - [R][CI] fix cran picking in CI
  • ARROW-12390 - [Rust] Inline from_trusted_len_iter, try_from_trusted_len_iter, extend_from_slice
  • ARROW-12401 - [R] Fix guard around dataset___Scanner__TakeRows
  • ARROW-12405 - [Packaging] Fix apt artifact patterns and artifact uploading from travis
  • ARROW-12408 - [R] Delete Scan()
  • ARROW-12421 - [Rust][DataFusion] Fix topkexec failure
  • ARROW-12421 - [Rust][DataFusion] Disable repartition rule
  • ARROW-12429 - [C++] Fix incorrectly registered test
  • ARROW-12433 - [Rust] Update nightly rust version
  • ARROW-12437 - [Rust][Ballista] Create DataFusion context without repartition
  • ARROW-12440 - [Release][Packaging] Various packaging, release script and release verification script fixes
  • ARROW-12466 - [Python] Avoid AttributeError crash when comparing with None
  • ARROW-12475 - [C++] Fix 'warn_unused_result' warning
  • ARROW-12487 - [C++][Dataset] Fix ScanBatches() hanging
  • ARROW-12495 - [C++] Fix NumPyBuffer::mutable_data()
  • ARROW-12794 - C++/R: read_parquet halts process when accessed multiple times
  • PARQUET-1655 - [C++] Fix comparison of Decimal values in statistics
  • PARQUET-2008 - [C++] Fix information written in RowGroup::total_byte_size

New Features and Improvements

  • ARROW-951 - [JS] Upgrade to typedoc 0.20.19
  • ARROW-2229 - [C++][Python] Add WriteCsv functionality.
  • ARROW-3690 - [Rust] Add Rust to the format integration testing
  • ARROW-6103 - [Release][Java] Remove mvn release plugin
  • ARROW-6248 - [Python][C++] Raise better exception on HDFS file open error
  • ARROW-6455 - [C++] Implement ExtensionType for non-UTF8 Unicode data
  • ARROW-6604 - [C++] Add support for nested types to MakeArrayFromScalar
  • ARROW-7215 - [C++][Gandiva] Implement castVARCHAR(numeric_type) functions
  • ARROW-7364 - [Rust][DataFusion] Add cast options to cast kernel and TRY_CAST to DataFusion
  • ARROW-7633 - [C++][CI] Create fuzz targets for tensors and sparse tensors
  • ARROW-7808 - [Java][Dataset] Implement Dataset Java API by JNI to C++
  • ARROW-7906 - [C++][Python] Add ORC write support
  • ARROW-8049 - [C++] Bump thrift to 0.13 and require cmake 3.10 for it
  • ARROW-8282 - [C++/Python][Dataset] Support schema evolution for integer columns
  • ARROW-8284 - [C++][Dataset] Schema evolution for timestamp columns
  • ARROW-8630 - [C++][Dataset] Pass schema including all materialized fields to catch CSV edge cases
  • ARROW-8631 - [C++][Python][Dataset] Add ReadOptions to CsvFileFormat, expose options to python
  • ARROW-8658 - [C++][Dataset] Implement subtree pruning for FileSystemDataset
  • ARROW-8672 - [Java] Implement RecordBatch IPC buffer compression from ARROW-300
  • ARROW-8732 - [C++] Add basic cancellation API
  • ARROW-8771 - [C++] Add boost/process library to build support
  • ARROW-8796 - [Rust] Allow parquet to be written directly to memory
  • ARROW-8797 - [C++] Read RecordBatch in a different endian
  • ARROW-8900 - [C++][Python] Expose Proxy Options as parameters for S3FileSystem
  • ARROW-8919 - [C++][Compute][Dataset] Add Function::DispatchBest to accommodate implicit casts
  • ARROW-9128 - [C++] Implement string space trimming kernels: trim, ltrim, and rtrim
  • ARROW-9149 - [C++] Improve configurability of RandomArrayGenerator::ArrayOf
  • ARROW-9196 - [C++][Compute] All casts accept scalar and sliced inputs
  • ARROW-9318 - [C++] Parquet encryption key management
  • ARROW-9731 - [C++][Python][R][Dataset] Implement Scanner::Head
  • ARROW-9749 - [C++][GLib][Python][R][Ruby][Dataset] Introduce FragmentScanOptions, consolidate ScanContext/ScanOptions
  • ARROW-9777 - [Rust] Implement IPC changes to catch up to 1.0.0 format
  • ARROW-9856 - [R] Add bindings for string compute functions
  • ARROW-10014 - [C++] TaskGroup::Finish should execute tasks
  • ARROW-10089 - [R] inject base class for Array, ChunkedArray and Scalar
  • ARROW-10183 - [C++] Apply composable futures to CSV
  • ARROW-10195 - [C++] Add string struct extract kernel using re2
  • ARROW-10250 - [C++][FlightRPC] Consistently use FlightClientOptions::Defaults
  • ARROW-10255 - [JS] Reorganize exports for ESM tree-shaking
  • ARROW-10297 - [Rust] Parameter for parquet-read to output data in json format, add "cli" feature to parquet crate
  • ARROW-10299 - [Rust] Use IPC Metadata V5 as default
  • ARROW-10305 - [R] Filter with regular expressions
  • ARROW-10306 - [C++] Add string replacement kernel
  • ARROW-10349 - [Python] Build and publish aarch64 wheels
  • ARROW-10354 - [Rust][DataFusion] regexp_extract function to select regex groups from strings
  • ARROW-10360 - [CI] Bump Github Actions cache version
  • ARROW-10372 - [Dataset][C++][Python][R] Support reading compressed CSV
  • ARROW-10406 - [C++] Unify dictionaries when writing IPC file in a single shot
  • ARROW-10420 - [C++] Refactor io and filesystem APIs to take an IOContext
  • ARROW-10421 - [R] Use gc_memory_pool in more places
  • ARROW-10438 - [C++][Dataset] Partitioning::Format on nulls
  • ARROW-10520 - [C++][R] Implement add/remove/replace for RecordBatch
  • ARROW-10570 - [R] Use Converter API to convert SEXP to Array/ChunkedArray
  • ARROW-10580 - [C++] Disallow non-monotonic dense union offsets
  • ARROW-10606 - [C++] Implement Decimal256 casts
  • ARROW-10655 - [C++] Add cache and memoization facility
  • ARROW-10734 - [R] Build and test on Solaris
  • ARROW-10735 - [R] Remove arrow-without-arrow wrapping
  • ARROW-10766 - [Rust][Parquet] Compute nested list definitions
  • ARROW-10816 - [Rust][DataFusion] Initial support for Interval expressions
  • ARROW-10831 - [C++][Compute] Implement quantile kernel
  • ARROW-10846 - [C++] Add async filesystem operations
  • ARROW-10880 - [Java] Support compressing RecordBatch IPC buffers by LZ4
  • ARROW-10882 - [Python] Allow writing dataset from iterator of batches
  • ARROW-10895 - [C++][Gandiva] Implement bool to varchar cast function in Gandiva
  • ARROW-10903 - [Rust] Implement FromIter<Option<Vec<u8>>> constructor for FixedSizeBinaryArray
  • ARROW-11022 - [Rust] Upgrade to Tokio 1.0
  • ARROW-11070 - [C++][Compute] Implement power kernel
  • ARROW-11074 - [Rust][DataFusion] Implement predicate push-down for parquet tables
  • ARROW-11081 - [Java] Make IPC option immutable
  • ARROW-11108 - [Rust] Fixed performance issue in mutableBuffer.
  • ARROW-11141 - [Rust] Add basic Miri checks to CI pipeline
  • ARROW-11149 - [Rust] DF Support List/LargeList/FixedSizeList in create_batch_empty
  • ARROW-11150 - [Rust] Add Arrow Rust Community section to Rust README
  • ARROW-11154 - [CI][C++] Move homebrew crossbow tests off of Travis-CI
  • ARROW-11156 - [Rust][DataFusion] Create hashes vectorized in hash join
  • ARROW-11174 - [C++][Dataset] Make expressions available to projection
  • ARROW-11179 - [Format] Make FB comments friendly to rust
  • ARROW-11183 - [Rust][Parquet] LogicalType::TIMESTAMP_NANOS missing
  • ARROW-11191 - [C++] Use FnOnce for TaskGroup's tasks instead of std::function
  • ARROW-11216 - [Rust] add doc example for StringDictionaryBuilder
  • ARROW-11220 - [Rust] Implement GROUP BY support for Boolean
  • ARROW-11222 - [Rust] Catch up with flatbuffers 0.8.1 which had some UB problems fixed
  • ARROW-11246 - [Rust] Add type to Unexpected accumulator state error
  • ARROW-11254 - [Rust][DataFusion] Add SIMD and snmalloc flags as options to benchmarks
  • ARROW-11260 - [C++][Dataset] Don't require dictionaries when specifying explicit partition schema
  • ARROW-11265 - [Rust] Made bool not ArrowNativeType
  • ARROW-11268 - [Rust][DataFusion] MemTable::load output partition support
  • ARROW-11270 - [Rust] Array slice accessors
  • ARROW-11279 - [Rust][Parquet] ArrowWriter Definition Levels Memory Usage
  • ARROW-11284 - [R] Support dplyr verb transmute()
  • ARROW-11289 - [Rust][DataFusion] Implement GROUP BY support for Dictionary Encoded columns
  • ARROW-11290 - [Rust][DataFusion] Address hash aggregate performance issue with high number of groups
  • ARROW-11291 - [Rust] Add extend to MutableBuffer (-20% for arithmetic, -97% for length)
  • ARROW-11300 - [Rust][DataFusion] Further performance improvements on hash aggregation with small groups
  • ARROW-11308 - [Rust][Parquet] Support decimal when writing parquet files
  • ARROW-11309 - [Release][C#] Use .NET 3.1 for verification
  • ARROW-11310 - [Rust] implement JSON writer
  • ARROW-11314 - [Release][APT][Yum] Add support for verifying arm64 packages
  • ARROW-11317 - [Rust] Include the prettyprint feature in CI Coverage
  • ARROW-11318 - [Rust] Support pretty printing timestamp, date, and time types
  • ARROW-11319 - [Rust][DataFusion] Improve test comparisons to record batch, remove test::format_batch
  • ARROW-11321 - [Rust][DataFusion] Fix DataFusion compilation error
  • ARROW-11325 - [Packaging][C#] Release Apache.Arrow.Flight and Apache.Arrow.Flight.AspNetCore
  • ARROW-11329 - [Rust] Don't rerun build.rs on every file change
  • ARROW-11330 - [Rust][DataFusion] add ExpressionVisitor to encode expression walking
  • ARROW-11332 - [Rust] Use MutableBuffer in take_string instead of Vec
  • ARROW-11333 - [Rust] Generalized creation of empty arrays.
  • ARROW-11336 - [C++][Doc] Improve Developing on Windows docs
  • ARROW-11338 - [R] Bindings for quantile and median
  • ARROW-11340 - [C++] Add vcpkg.json manifest to cpp project root
  • ARROW-11343 - [Rust][DataFusion] Simplified example with UDF.
  • ARROW-11346 - [C++][Compute] Implement quantile kernel benchmark
  • ARROW-11349 - [Rust] Add from_iter_values to create arrays from (non null) values
  • ARROW-11350 - [C++] Bump dependency versions
  • ARROW-11354 - [Rust] Speed-up cast of dates and times (2-4x)
  • ARROW-11355 - [Rust] Aligned Date DataType with specification.
  • ARROW-11358 - [Rust] Add benchmark for concatenating small arrays
  • ARROW-11360 - [Rust][DataFusion] Improve CSV "No files found" error message
  • ARROW-11361 - [Rust] Build MutableBuffer/Buffer from iterator of bools
  • ARROW-11362 - [Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance
  • ARROW-11365 - [Rust][Parquet] Logical type printer and parser
  • ARROW-11366 - [Datafusion] Implement constant folding for boolean literal expressions
  • ARROW-11367 - [C++] Implement t-digest approximate quantile utility
  • ARROW-11369 - [DataFusion] Split physical_plan/expressions.rs
  • ARROW-11372 - [Release] Support RC verification on macOS-ARM64
  • ARROW-11373 - [Python][Docs] Add example of specifying type for a column when reading csv file
  • ARROW-11374 - [Python] Make legacy pyarrow.filesystem / pyarrow.serialize warnings more visible (DeprecationWarning -> FutureWarning)
  • ARROW-11375 - [Rust] Fix deprecation warning in clippy
  • ARROW-11377 - [C++][CI] Add Thread Sanitizer nightly build
  • ARROW-11383 - [Rust] Faster bit AND and OR (2x)
  • ARROW-11386 - [Release] Fix post documents update script
  • ARROW-11389 - [Rust] make comments more consistent and fix typos
  • ARROW-11395 - [DataFusion] Support custom optimizers
  • ARROW-11401 - [Rust][DataFusion] Pass slices instead of Vec in DataFrame API
  • ARROW-11404 - [Rust][DataFusion] Upgrade to aHash 0.7 + minor cleanup
  • ARROW-11405 - [DataFusion] Support multiple custom logical nodes
  • ARROW-11406 - [CI][C++] Fix ccache caching on Travis-CI
  • ARROW-11408 - [Rust] Add window support to datafusion readme
  • ARROW-11411 - [Packaging][Linux] Disable arm64 nightly builds
  • ARROW-11414 - [Rust] Reduce copies in Schema::try_merge
  • ARROW-11417 - [Integration] Add integration tests for buffer compression
  • ARROW-11418 - [Doc] Add buffer compression to IPC support matrix
  • ARROW-11421 - [Rust][DataFusion] Support GROUP BY Date32
  • ARROW-11422 - [C#] add decimal support
  • ARROW-11423 - [R] value_counts and some StructArray methods
  • ARROW-11425 - [C++][Compute] Optimize quantile kernel for integers
  • ARROW-11426 - [Rust][DataFusion] EXTRACT support
  • ARROW-11428 - [Rust] Add power_scalar kernel
  • ARROW-11429 - Make string comparison kernels generic over Utf8 and LargeUtf8
  • ARROW-11430 - [Rust] zip kernel: combine arrays based on boolean mask
  • ARROW-11431 - [Rust][DataFusion] Support the HAVING clause.
  • ARROW-11435 - [Datafusion] allow creating ParquetPartition from external crate, make combine_filters public
  • ARROW-11436 - [Rust] Improved from_iter for primitive arrays (-20-30% for cast)
  • ARROW-11437 - [Rust] Removed duplicated code in benches
  • ARROW-11438 - [Rust][DataFusion] Support literal boolean values in DataFusion SQL
  • ARROW-11439 - [Rust] Add year support to temporal kernels
  • ARROW-11440 - [Rust][DataFusion] Add method to CsvExec to get CSV schema
  • ARROW-11442 - [Rust] Expose datetime conversion logic independently
  • ARROW-11443 - [Rust] Write datetime information for Date64 Type in csv writer
  • ARROW-11444 - [Rust][DataFusion] Accept slices as parameters
  • ARROW-11446 - [DataFusion] Added support for scalarValue in Builtin functions.
  • ARROW-11447 - [Rust] Add shift kernel for primitive types
  • ARROW-11449 - [CI][R][Windows] Use ccache
  • ARROW-11457 - [Rust] Make string comparison kernels generic over Utf8 and LargeUtf8
  • ARROW-11459 - [Rust] Added API to build ListArray of Primitives from an iterator
  • ARROW-11462 - [Developer] Remove needless quote from the default DOCKER_VOLUME_PREFIX
  • ARROW-11463 - [Python] Expose "allow_64bit" to IpcWriteOptions in pyarrow.
  • ARROW-11466 - [Go][Flight] adding Basic Auth handling for go flight client and server
  • ARROW-11467 - [R] Fix reference to json_table_reader() in R docs
  • ARROW-11468 - [R] Allow user to pass schema to read_json_arrow()
  • ARROW-11474 - [C++] Update bundled re2 version
  • ARROW-11476 - [Rust][DataFusion] Test running of TPCH benchmarks in CI
  • ARROW-11477 - [R][Doc] Reorganize and improve README and vignette content
  • ARROW-11478 - [R] Consider ways to make arrow.skip_nul option more user-friendly
  • ARROW-11479 - [Rust][Parquet] Add Method to get compressed size of columns from row group metadata
  • ARROW-11481 - [Rust] More cast implementations
  • ARROW-11484 - [Rust][DataFusion] Derive Clone for ExecutionContext
  • ARROW-11486 - [Website] Use Jekyll 4 and webpack to support Ruby 3.0 or later
  • ARROW-11489 - [Rust][DataFusion] Make DataFrame be Send + Sync
  • ARROW-11491 - [Rust] support JSON schema inference for nested list and struct
  • ARROW-11493 - [CI][Packaging][deb][RPM] Test built packages
  • ARROW-11500 - [R] Allow bundled build script to run on Solaris
  • ARROW-11501 - [C++] endianness check does not work on Solaris
  • ARROW-11504 - [Rust] Added checks to List DataType.
  • ARROW-11505 - [Rust] Add support for LargeUtf8 in csv-writer
  • ARROW-11507 - [R] Bindings for GetRuntimeInfo
  • ARROW-11510 - [Python] Add note that pip >= 19.0 is required to get binary packages
  • ARROW-11511 - [Rust] Replace Arc<ArrayData> by ArrayData in all arrays
  • ARROW-11512 - [Packaging][deb] Add missing gRPC dependency for Ubuntu 21.04
  • ARROW-11513 - [R] Bindings for sub/gsub
  • ARROW-11516 - [R] Allow all C++ compute functions to be called by name in dplyr
  • ARROW-11539 - [Developer][Archery] Change items_per_seconds units
  • ARROW-11541 - [C++][Compute] Implement tdigest kernel
  • ARROW-11542 - [Rust] fix validity bitmap buffer length count in json reader
  • ARROW-11544 - [Rust][DataFusion] Implement as_any for AggregateExpr
  • ARROW-11545 - [Rust][DataFusion] SendableRecordBatchStream should implement Sync
  • ARROW-11556 - [C++] Assorted benchmark-related improvements
  • ARROW-11557 - [Rust][Datafusion] Add deregister_table
  • ARROW-11559 - [C++] Add regression file
  • ARROW-11559 - [C++] Use smarter Flatbuffers verification parameters
  • ARROW-11561 - [Rust][DataFusion] Add Send + Sync to MemTable::load
  • ARROW-11563 - [Rust] Support Cast(Utf8, TimeStamp(Nanoseconds, None))
  • ARROW-11568 - [C++][Compute] Rewrite mode kernel
  • ARROW-11570 - [Rust] ScalarValue - support Date64
  • ARROW-11571 - [CI] Cancel stale Github Actions workflow runs
  • ARROW-11572 - [Rust] Add a kernel for division by single scalar
  • ARROW-11573 - [Developer][Archery] Google benchmark now reports run type
  • ARROW-11574 - [Rust][DataFusion] Upgrade sqlparser to support parsing all TPC-H queries
  • ARROW-11575 - [Developer][Archery] Expose execution time in benchmark results
  • ARROW-11576 - [Rust] Fix unused variable in Rust code example
  • ARROW-11580 - [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
  • ARROW-11581 - [Packaging][C++] Formalize distribution through vcpkg
  • ARROW-11589 - [R] Add methods for modifying Schemas
  • ARROW-11590 - [C++] Move CSV background generator to IO thread pool
  • ARROW-11591 - [C++][Compute] Grouped aggregation
  • ARROW-11592 - [Rust] Fix typo in comment
  • ARROW-11594 - [Rust] Support pretty printing of NullArray
  • ARROW-11597 - [Rust] Split file in smaller ones.
  • ARROW-11598 - [Rust] Split buffer.rs in smaller files
  • ARROW-11599 - [Rust] Add function to create array with all nulls
  • ARROW-11601 - [C++][Python][Dataset] expose Parquet pre-buffer option
  • ARROW-11606 - [Rust][DataFusion] Add input schema to HashAggregateExec
  • ARROW-11610 - [C++] Download boost from sourceforge instead of bintray
  • ARROW-11611 - [C++] Update third party dependency mirrors
  • ARROW-11612 - [C++] Rebuild trimmed boost bundle for 1.75.0
  • ARROW-11613 - [R] Move nightly C++ builds off of bintray
  • ARROW-11616 - [Rust][DataFusion] Add collect_partitioned on DataFrame
  • ARROW-11621 - [CI][Gandiva][Linux] Fix Crossbow setup failure
  • ARROW-11626 - [Rust][DataFusion] Move DataFusion examples to own project
  • ARROW-11627 - [Rust] Make allocator be a generic over type T
  • ARROW-11637 - [CI][Conda] Update nightly clean target platforms and packages list
  • ARROW-11641 - [CI] Use docker buildkit's inline cache to reuse build cache across different hosts
  • ARROW-11649 - [R] Add support for null_fallback to R
  • ARROW-11651 - [Rust][DataFusion] Implement Postgres String Functions: Length Functions
  • ARROW-11653 - [Rust][DataFusion] Postgres String Functions: ascii, chr, initcap, repeat, reverse, to_hex
  • ARROW-11655 - [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad
  • ARROW-11656 - [Rust][DataFusion] Remaining Postgres String functions
  • ARROW-11659 - [R] Preserve group_by .drop argument
  • ARROW-11662 - [C++] Support sorting decimal and fixed size binary data
  • ARROW-11664 - [Rust] cast to LargeUtf8
  • ARROW-11665 - [C++][Python] Improve docstrings for decimal and union types
  • ARROW-11666 - [Integration] Add endianness "gold" integration file for decimal256
  • ARROW-11667 - [Rust] Add documentation for utf8 comparison kernels
  • ARROW-11669 - [Rust][DataFusion] Remove concurrency field from GlobalLimitExec and SortExec
  • ARROW-11671 - [Rust][DataFusion] Clean up Expr doc comments and examples
  • ARROW-11677 - [C++][Docs] Add basic C++ datasets documentation
  • ARROW-11680 - [C++] Add vendored version of folly's spsc queue
  • ARROW-11683 - [R] Support dplyr::mutate()
  • ARROW-11685 - [C++] Fix typo: FutureStessTest -> FutureStressTest
  • ARROW-11688 - [Rust] Casts between Utf8 and LargeUtf8
  • ARROW-11690 - [Rust][DataFusion] Avoid expr copies while using builder methods
  • ARROW-11692 - [Rust][DataFusion] Improve OptimizerRule comments
  • ARROW-11693 - [C++] Add string length kernel
  • ARROW-11700 - [R] Internationalize error handling in tidy eval
  • ARROW-11701 - [R] Implement dplyr::relocate()
  • ARROW-11703 - [R] Implement dplyr::arrange()
  • ARROW-11704 - [R] Wire up dplyr::mutate() for datasets
  • ARROW-11707 - [Rust] support CSV schema inference without file IO
  • ARROW-11708 - [Rust] fix Rust 2021 linting warnings
  • ARROW-11709 - [Rust][DataFusion] Move expressions and inputs into LogicalPlan rather than helpers in util
  • ARROW-11710 - [Rust][DataFusion] Implement ExpressionRewriter
  • ARROW-11719 - [Rust][Datafusion] support creating memory table with merged schema
  • ARROW-11721 - [Rust] json schema inference to return Schema instead of SchemaRef
  • ARROW-11722 - [Rust] Improve error message in FFI cast.
  • ARROW-11724 - [C++] Resolve namespace collisions with protobuf 3.15
  • ARROW-11725 - [Rust][DataFusion] Make use of the new divide_scalar kernel in arrow
  • ARROW-11727 - [C++][FlightRPC] Estimate latency quantiles with TDigest
  • ARROW-11730 - [C++] Add implicit convenience constructors for constructing Future from Status/Result
  • ARROW-11733 - [Rust][DataFusion] Implement hash partitioning
  • ARROW-11734 - [C++] vendored safe-math.h does not compile on Solaris
  • ARROW-11735 - [R] Allow Parquet and Arrow Dataset to be optional components
  • ARROW-11736 - [R] Allow string compute functions to be optional
  • ARROW-11737 - [C++] Patch vendored xxhash for Solaris
  • ARROW-11738 - [Rust][DataFusion] Fix Concat and Trim Functions
  • ARROW-11740 - [C++] posix_memalign not declared in scope on Solaris
  • ARROW-11742 - [Rust][DataFusion] Add Expr::is_null and Expr::is_not_nu…
  • ARROW-11744 - [C++] Add xsimd dependency
  • ARROW-11745 - [C++] Add helper to generate random record batches by schema
  • ARROW-11750 - [Python][Dataset] Add support for project expressions
  • ARROW-11752 - [R] Replace usage of testthat::expect_is()
  • ARROW-11753 - [Rust][DataFusion] Add tests for when Datafusion qualified field names resolved
  • ARROW-11754 - [R] Support dplyr::compute()
  • ARROW-11761 - [C++] Increase public API testing
  • ARROW-11766 - [R] Better handling for missing compression codecs on Linux
  • ARROW-11768 - [CI][C++] Make s390x job required
  • ARROW-11773 - [Rust] Support writing well formed JSON arrays as well as newline delimited json streams
  • ARROW-11774 - [R] macOS one-line install
  • ARROW-11775 - [Rust][DataFusion] Feature Flags for Dependencies
  • ARROW-11777 - [Rust] impl AsRef for StringBuilder/BinaryBuilder
  • ARROW-11778 - [Rust] Cast from LargeUtf8 to Numerical and temporal types
  • ARROW-11779 - [Rust] make alloc module public
  • ARROW-11790 - [Rust][DataFusion][Expr]
  • ARROW-11794 - [Go] Add concurrent-safe ipc.FileReader.RecordAt(i)
  • ARROW-11795 - [MATLAB] Migrate MATLAB Interface for Apache Arrow design doc to Markdown
  • ARROW-11797 - [C++][Dataset] Provide batch stream Scanner methods
  • ARROW-11798 - [Integration] Update testing submodule
  • ARROW-11799 - [Rust] fix len of string and binary arrays created from unbound iterator
  • ARROW-11801 - [C++] Remove bad header guard in filesystem/type_fwd.h
  • ARROW-11803 - [Rust][Parquet] Support v2 LogicalType
  • ARROW-11806 - [Rust][DataFusion] Optimize join / inner join creation of indices
  • ARROW-11820 - [Rust] Added macro to create native types
  • ARROW-11822 - [Rust][Datafusion] Support case sensitive comparisons for functions and aggregates
  • ARROW-11824 - [Rust][Parquet] Use logical types in Arrow schema conversion
  • ARROW-11825 - [Rust][DataFusion] Add mimalloc as option to benchmarks
  • ARROW-11833 - [C++] Bump vendored fast_float
  • ARROW-11837 - [C++][Dataset] expose originating Fragment on ScanTask
  • ARROW-11838 - [C++] Support IPC reads with shared dictionaries.
  • ARROW-11839 - [C++] Use xsimd for generation of accelerated bit-unpacking
  • ARROW-11842 - [Rust][Parquet] Use clone_from in get_batch_with_dict
  • ARROW-11852 - [Docs] Update CONTRIBUTING to explain Contributor role
  • ARROW-11856 - [C++] Remove unused reference to RecordBatchStreamWriter
  • ARROW-11858 - [GLib][Gandiva] Add Gandiva::Filter and related functions
  • ARROW-11859 - [GLib][Ruby] Add garrow_array_concatenate()
  • ARROW-11861 - [R][Packaging] Apply changes in r/tools/autobrew upstream
  • ARROW-11864 - [R] Document arrow.int64_downcast option
  • ARROW-11870 - [Dev] Automatically run merge script in virtual environment
  • ARROW-11876 - [Website] Update governance page
  • ARROW-11877 - [C++] Add microbenchmark for SimplifyWithGuarantee
  • ARROW-11879 - [Rust][DataFusion] Make ExecutionContext::sql return dataframe with optimized plan
  • ARROW-11883 - [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map
  • ARROW-11887 - [C++] Add asynchronous read to streaming CSV reader
  • ARROW-11894 - [Rust][DataFusion] Change flight server example to use DataFrame API
  • ARROW-11895 - [Rust][DataFusion] Add support for more column statistics
  • ARROW-11898 - [Rust] Pretty print columns
  • ARROW-11899 - [Java] Refactor the compression codec implementation into core/Arrow specific parts
  • ARROW-11900 - [Website] Add Yibo to committer list
  • ARROW-11906 - [R] Make FeatherReader print method more informative
  • ARROW-11907 - [C++] Use our own executor in S3FileSystem
  • ARROW-11910 - [Packaging][Ubuntu] Drop support for 16.04
  • ARROW-11911 - [Website] Add protobuf vs arrow to FAQ
  • ARROW-11912 - [R] Remove args from FeatherReader$create
  • ARROW-11913 - [Rust] Improve performance of StringBuilder by delaying bitmap creation
  • ARROW-11920 - [R] Remove r/libarrow when make cleaning
  • ARROW-11921 - [R] Set LC_COLLATE in r/data-raw/codegen.R
  • ARROW-11924 - [C++] Add streaming version of FileSystem::GetFileInfo
  • ARROW-11925 - [R] Add between method for arrow_dplyr_query
  • ARROW-11927 - [Rust][DataFusion] Support Limit push down optimization
  • ARROW-11931 - [Go] bump to go1.15
  • ARROW-11935 - [C++] Add push generator
  • ARROW-11944 - [Developer] Fix archery's comparison of cached benchmark runs
  • ARROW-11949 - [Ruby] Accept raw Ruby objects as sort key and options
  • ARROW-11951 - [Rust] Remove OffsetSize::prefix
  • ARROW-11952 - [Rust] Make ArrayData --> GenericListArray fallible instead of panic!
  • ARROW-11954 - [C++] arrow/util/io_util.cc does not compile on Solaris
  • ARROW-11955 - [Rust][DataFusion] Support Union
  • ARROW-11958 - [GLib] Add garrow_chunked_array_combine()
  • ARROW-11959 - [Rust][DataFusion] Fix log line
  • ARROW-11962 - [Rust][DataFusion] Improve DataFusion docs
  • ARROW-11969 - [Rust][DataFusion] Improve Examples in documentation
  • ARROW-11972 - [C++][R][Python][Dataset] Extract IPC/Parquet fragment scan options
  • ARROW-11973 - [Rust][DataFusion] Boolean kleene kernels
  • ARROW-11977 - [Rust] Add documentation examples for sort kernel
  • ARROW-11982 - [Rust] Donate Ballista Distributed Compute Platform
  • ARROW-11984 - [C++][Gandiva] Implement SHA1 and SHA256 functions
  • ARROW-11987 - [C++][Gandiva] Implement trigonometric functions
  • ARROW-11988 - [C++][Gandiva] Implements last_day function
  • ARROW-11992 - [Rust][Parquet] Add upgrade notes on 4.0 rename of LogicalType
  • ARROW-11993 - [C++] Don't download xsimd if ARROW_SIMD_LEVEL=NONE
  • ARROW-11996 - [R] Make r/configure run successfully on Solaris
  • ARROW-11999 - [Java] Support parallel vector element search with user-specified comparator
  • ARROW-12000 - [Documentation] Add note about deviation from style guide on struct/classes
  • ARROW-12005 - [R] Fix a bash typo in configure
  • ARROW-12017 - [R][Documentation] Make proper developing arrow docs
  • ARROW-12019 - [Rust][Parquet] Update README for 2.6.0 support
  • ARROW-12020 - [Rust][DataFusion] Adding SHOW TABLES and SHOW COLUMNS + partial information_schema support to DataFusion
  • ARROW-12031 - [C++][CSV] infer CSV timestamps columns with fractional seconds
  • ARROW-12032 - [Rust] Optimize comparison kernels
  • ARROW-12034 - [Developer Tools] Formalize Minor PRs
  • ARROW-12037 - [Rust][DataFusion] Support catalogs and schemas for table namespacing
  • ARROW-12038 - [Rust][DataFusion] Upgrade hashbrown to 0.11
  • ARROW-12039 - [Nightly][Gandiva] Fix gandiva-jar-ubuntu nightly build failure
  • ARROW-12040 - [C++] Fix potential deadlock in recursive S3 walks
  • ARROW-12043 - [Rust][Parquet] Write FSB arrays
  • ARROW-12045 - [Go][Parquet] Initial Chunk of Parquet port to Go
  • ARROW-12047 - [Rust][Parquet] Cleanup clippy
  • ARROW-12048 - [Rust][DataFusion] Support Common Table Expressions
  • ARROW-12052 - [Rust] Add Child Data to Arrow's C FFI implementation. …
  • ARROW-12056 - [C++] Create sequencing AsyncGenerator
  • ARROW-12058 - [Python] Enable arithmetic operations on Expressions
  • ARROW-12068 - [Python] Stop using distutils
  • ARROW-12069 - [C++][Gandiva] Implement IN expressions for Decimal type
  • ARROW-12070 - [GLib] Drop support for GNU Autotools
  • ARROW-12071 - [GLib] Keep input stream reference of GArrowJSONReader
  • ARROW-12075 - [Rust][DataFusion] Add CTE + UNION ALL to supported list of SQL features
  • ARROW-12081 - [R] Bindings for utf8_length
  • ARROW-12082 - [R][Dataset] Allow create dataset from vector of file paths
  • ARROW-12094 - [C++][R] Fix re2 building on clang/libc++
  • ARROW-12097 - [C++] Modify BackgroundGenerator so it creates fewer threads
  • ARROW-12098 - [R] Catch cpp build failures on linux
  • ARROW-12104 - [Go][Parquet] Second chunk of Ported Go Parquet code
  • ARROW-12106 - [Rust][DataFusion] Support SELECT * from information_schema.tables
  • ARROW-12107 - [Rust][DataFusion] Support SELECT * from information_schema.columns
  • ARROW-12108 - [Rust][DataFusion] Implement SHOW TABLES
  • ARROW-12109 - [Rust][DataFusion] Implement SHOW COLUMNS
  • ARROW-12110 - [Java] Implement ZSTD compression
  • ARROW-12111 - [Java] Generate flatbuffer files using flatc 1.12.0
  • ARROW-12116 - [Rust] Fix and ignore 1.51 clippy lints
  • ARROW-12119 - [Rust][DataFusion] Improve performance of to_array_of_size for primitives
  • ARROW-12120 - [Rust] Generate random arrays and batches
  • ARROW-12121 - [Rust][Parquet] Arrow writer benchmarks
  • ARROW-12123 - [Rust][DataFusion] Use smallvec for indices for better join performance
  • ARROW-12128 - [CI][Crossbow] Remove test-ubuntu-16.04-cpp job
  • ARROW-12131 - [CI][GLib] Ensure upgrading MSYS2
  • ARROW-12133 - [C++][Gandiva] Add option to disable targeting host cpu during llvm ir compilation
  • ARROW-12134 - [C++] Add match_substring_regex kernel
  • ARROW-12136 - [Rust][DataFusion] Reduce default batch_size to 8192
  • ARROW-12139 - [Python][Packaging] Use vcpkg to build macOS wheels
  • ARROW-12141 - [R] Bindings for grepl
  • ARROW-12143 - [CI] R builds should timeout and fail after some threshold and dump the output.
  • ARROW-12146 - [C++][Gandiva] Implement CONVERT_FROM(expression, replacement char) function
  • ARROW-12151 - [Docs] Add Jira component + summary conventions to the docs
  • ARROW-12153 - [Rust][Parquet] Return file stats after writing file
  • ARROW-12160 - [Rust] Add into_inner() to StreamWriter
  • ARROW-12164 - [Java] Make BaseAllocator.Config public
  • ARROW-12165 - [Rust] inline append functions of builders
  • ARROW-12168 - [Go][IPC] Implement Compression handling for Arrow IPC
  • ARROW-12170 - [Rust][DataFusion] Introduce repartition optimization
  • ARROW-12173 - [GLib] Remove #include <config.h>
  • ARROW-12176 - [C++] Fix some typos of cpp examples
  • ARROW-12187 - [C++][FlightRPC] Add compression benchmark for stream writing
  • ARROW-12188 - [Docs] Switch to pydata-sphinx-theme for the main sphinx docs
  • ARROW-12190 - [Rust][DataFusion] Implement parallel / partitioned hash join
  • ARROW-12192 - [Website] Use downloadable URL for archive download
  • ARROW-12193 - [Dev][Release] Use downloadable URL for archive download
  • ARROW-12194 - [Rust][Parquet] Bump zstd to v0.7
  • ARROW-12197 - [R] dplyr bindings for cast, dictionary_encode
  • ARROW-12200 - [R] Export and document list_compute_functions
  • ARROW-12204 - [Rust][CI] Reduce size of Rust build artifacts in integration test
  • ARROW-12206 - [Python][Docs] Fix Table docstrings
  • ARROW-12208 - [C++] Add the ability to run async tasks without using the CPU thread pool
  • ARROW-12210 - [Rust][DataFusion] Document SHOW TABLES / SHOW COLUMNS / Information Schema
  • ARROW-12214 - [Rust][DataFusion] Add tests for limit
  • ARROW-12215 - [C++] Allow null values in fixed-size binary columns read from CSV
  • ARROW-12217 - [C++] Cleanup cpp examples source files naming
  • ARROW-12222 - [Dev][Packaging] Include build url in the crossbow console report
  • ARROW-12224 - [Rust] Use stable rust for no default test, clean up CI tests
  • ARROW-12228 - [CI] Create base image for conda environments
  • ARROW-12236 - [R][CI] Add check that all docs pages are listed in _pkgdown.yml
  • ARROW-12237 - [Packaging][Debian] Add support for bullseye
  • ARROW-12238 - [JS] Remove trailing spaces and consistently add space after //
  • ARROW-12239 - [JS] Switch to yarn
  • ARROW-12242 - [Python][Doc] Tweak nightly build instructions
  • ARROW-12246 - [CI] Sync conda recipes with upstream feedstock
  • ARROW-12248 - [C++] Avoid looking up ARROW_DEFAULT_MEMORY_POOL environment variable too late
  • ARROW-12249 - [R][CI] Fix test-r-install-local nightlies
  • ARROW-12251 - [Rust] Add Ballista to CI
  • ARROW-12263 - [Dev][Packaging] Move Crossbow to Archery
  • ARROW-12269 - [JS] Move to eslint
  • ARROW-12274 - [JS] Document how to run tests without building bundles
  • ARROW-12277 - [Rust][DataFusion] Implement Sum/Count/Min/Max aggregates for Timestamp(,)
  • ARROW-12278 - [Rust][DataFusion] Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type
  • ARROW-12280 - [Developer] Remove @-mentions from commit messages in merge tool
  • ARROW-12281 - [JS] Remove shx, trash, and rimraf and update lerna for yarn
  • ARROW-12283 - [R] Bindings for basic type convert functions in dplyr verbs
  • ARROW-12286 - [C++] Create AsyncGenerator from Future<AsyncGenerator<T>>
  • ARROW-12287 - [C++] Create enumerating generator
  • ARROW-12288 - [C++] Create Scanner interface
  • ARROW-12289 - [C++] Create basic AsyncScanner implementation
  • ARROW-12303 - [JS] Use iterator instead of yield
  • ARROW-12304 - [R] Update news and polish docs for 4.0
  • ARROW-12305 - [JS] Update generate.py to python3 and new versions of pyarrow
  • ARROW-12309 - [JS] Make es2015 bundles the default
  • ARROW-12316 - [C++] Prefer mimalloc on Apple
  • ARROW-12317 - [Rust] JSON writer support for time, duration and date
  • ARROW-12320 - [CI] REPO arg missing from conda-cpp-valgrind
  • ARROW-12323 - [C++][Gandiva] Implement castTIME(timestamp) function
  • ARROW-12325 - [C++][CI] Nightly gandiva build failing due to failure of compiler to move return value
  • ARROW-12326 - [C++] Avoid needless c-ares detection
  • ARROW-12328 - [Rust][Ballista] Fix formatting
  • ARROW-12329 - [Rust][Ballista] Add Ballista README
  • ARROW-12332 - [Rust][Ballista] Add simple api server in scheduler
  • ARROW-12333 - [JS] Remove jest-environment-node-debug and do not emit from typescript by default
  • ARROW-12335 - [Rust][Ballista] Use latest DataFusion
  • ARROW-12337 - [Rust] add DoubleEndedIterator and ExactSizeIterator traits
  • ARROW-12351 - [CI][Ruby] Use ruby/setup-ruby instead of actions/setup-ruby
  • ARROW-12352 - [CI][R][Windows] Remove needless workaround for MSYS2
  • ARROW-12353 - [Packaging][deb] Rename -archive-keyring to -apt-source
  • ARROW-12354 - [Packaging][RPM] Use apache.jfrog.io/artifactory/ instead of apache.bintray.com/
  • ARROW-12356 - [Website] Update install page instructions to point to artifactory
  • ARROW-12361 - [Rust][DataFusion] Allow users to override physical optimization rules
  • ARROW-12367 - [C++] Stop producing when PushGenerator was destroyed
  • ARROW-12370 - [R] Bindings for power kernel
  • ARROW-12374 - [CI][C++][cron] Use Ubuntu 20.04 instead of 16.04
  • ARROW-12375 - [Release] Remove rebase post-release scripts
  • ARROW-12376 - [Dev] Log traceback for unexpected exceptions in archery trigger-bot
  • ARROW-12380 - [Rust][Ballista] Basic scheduler ui
  • ARROW-12381 - [Packaging][Python] macOS wheels are built with wrong package kind
  • ARROW-12383 - [JS] Upgrade dependencies
  • ARROW-12384 - [JS] Use let/const and clean up eslint rules
  • ARROW-12389 - [R][Docs] Add note about autocasting
  • ARROW-12395 - Create RunInSerialExecutor benchmark
  • ARROW-12396 - [Python][Docs] Clarify serialization/filesystem docstrings about deprecated status
  • ARROW-12397 - [Rust][DataFusion] Simplify readme example
  • ARROW-12398 - [Rust] remove redundant bound check in iterators
  • ARROW-12400 - [Rust] Re-enable tests in arrow::array::transform
  • ARROW-12402 - [Rust][DataFusion] Implement SQL metrics example
  • ARROW-12406 - [R] Fix checkbashism violation in configure
  • ARROW-12409 - [R] Remove LazyData from DESCRIPTION
  • ARROW-12419 - [Java] Remove download of flatc binary for s390x
  • ARROW-12420 - [C++/Dataset] Reading null columns as dictionary no longer possible
  • ARROW-12423 - [Docs] Remove Codecov badge
  • ARROW-12425 - [Rust] Fix new_null_array dictionary creation
  • ARROW-12432 - [Rust][DataFusion] Add metrics to SortExec
  • ARROW-12436 - [Rust][Ballista] Add watch capabilities to config backend trait
  • ARROW-12467 - [C++][Gandiva] Add support for LLVM12
  • ARROW-12477 - [Release] Download aarch64 miniforge
  • ARROW-12485 - [C++] Use mimalloc as the default memory allocator on macOS
  • ARROW-12488 - [GLib] Use g_memdup2() with GLib 2.68 or later
  • ARROW-12494 - [C++] ORC adapter fails to compile on GCC 4.8
  • ARROW-12506 - [Python] Improve modularity of pyarrow codebase to speedup compile time
  • ARROW-12652 - disable conda arm64 in nightly
  • PARQUET-1846 - [C++] Remove deprecated IO classes
  • PARQUET-1899 - [C++] Deprecated ReadBatchSpaced
  • PARQUET-1990 - [C++] Refuse to write ConvertedType::NA
  • PARQUET-1993 - [C++] expose way to wait for I/O to complete
kszucs
published 3.0.0 •

Changelog

Source

Apache Arrow 3.0.0 (2021-01-25)

New Features and Improvements

  • ARROW-1846 - [C++][Compute] Implement "any" reduction kernel for boolean data
  • ARROW-4193 - [Rust] Add support for decimal data type
  • ARROW-4544 - [Rust] JSON nested struct reader
  • ARROW-4804 - [Rust] Parse Date32 and Date64 in CSV reader
  • ARROW-4960 - [R] Build r-arrow conda package in crossbow
  • ARROW-4970 - [C++][Parquet] Implement parquet::FileMetaData::Equals
  • ARROW-5336 - [C++] Implement arrow::Concatenate for dictionary-encoded arrays with unequal dictionaries
  • ARROW-5350 - [Rust] Allow filtering on simple lists
  • ARROW-5394 - [C++][Benchmark] IsIn and IndexIn benchmark for integer and string types
  • ARROW-5679 - [Python][CI] Remove Python 3.5 support
  • ARROW-5950 - [Rust][DataFusion] Add logger
  • ARROW-6071 - [C++] Generic binary-to-binary casts
  • ARROW-6697 - [Rust] [DataFusion] Validate that all parquet partitions have the same schema
  • ARROW-6715 - [Website] Describe "non-free" component is needed for Plasma packages in install page
  • ARROW-6883 - [C++][Python] Allow writing dictionary deltas
  • ARROW-6995 - [Packaging][Crossbow] The windows conda artifacts are not uploaded to GitHub releases
  • ARROW-7531 - [C++] Reduce header inclusion cost slightly
  • ARROW-7800 - [Python] implement iter_batches() method for ParquetFile and ParquetReader
  • ARROW-7842 - [Rust][Parquet] Arrow list reader
  • ARROW-8113 - [C++] Lighter weight variant<>
  • ARROW-8199 - [C++] Add support for multi-column sort indices on Table
  • ARROW-8289 - [Rust] Parquet Arrow writer with nested support
  • ARROW-8423 - [Rust][Parquet] Serialize Arrow schema metadata
  • ARROW-8425 - [Rust][Parquet] Correct temporal IO
  • ARROW-8426 - [Rust][Parquet] - Add more support for converting Dicts
  • ARROW-8426 - [Rust][Parquet] Add support for writing dictionary types
  • ARROW-8853 - [Rust][Integration Testing] Enable Flight tests
  • ARROW-8876 - [C++] Implement casts from date types to Timestamp
  • ARROW-8883 - [Rust][Integration] Enable more tests
  • ARROW-9001 - [R] Box outputs as correct type in call_function
  • ARROW-9164 - [C++] Add embedded documentation to compute functions
  • ARROW-9187 - [R] Add bindings for arithmetic kernels
  • ARROW-9296 - [Rust][DataFusion] Address clippy errors clippy::unnecessary_unwrap, clippy::useless_format,
  • ARROW-9304 - [C++] Add "AppendEmpty" builder APIs for use inside StructBuilder::AppendNull
  • ARROW-9361 - [Rust] Move array types into their own modules
  • ARROW-9367 - [Python] Sorting on pyarrow data structures?
  • ARROW-9400 - [Python] Do not depend on conda-forge static libraries in Windows wheel builds
  • ARROW-9475 - [Java] Clean up usages of BaseAllocator, use BufferAllocator in…
  • ARROW-9489 - [C++][string][string] )
  • ARROW-9555 - [Rust][DataFusion] Implement physical node for inner join
  • ARROW-9564 - [Packaging] Vendor r-arrow-feedstock conda-forge recipe
  • ARROW-9674 - [Rust] Make the parquet read and writers Send
  • ARROW-9704 - [Java] TestEndianness.testLittleEndian supports little- and big-endian platforms
  • ARROW-9707 - [Rust] [DataFusion] Re-implement threading model
  • ARROW-9709 - [Java] Test cases in arrow-vector takes care of endianness
  • ARROW-9728 - [Rust][Parquet] Nested definition & repetition for structs
  • ARROW-9747 - [Java][C++] Initial Support for 256-bit Decimals
  • ARROW-9771 - [Rust][DataFusion] treat predicates separated by AND separately in predicate pushdown
  • ARROW-9803 - [Go] Add initial support for s390x
  • ARROW-9804 - [FlightRPC] Flight auth redesign
  • ARROW-9828 - [Rust][DataFusion] Support filter pushdown optimisation for TableProvider implementations
  • ARROW-9861 - [Java] Support big-endian in DecimalVector
  • ARROW-9862 - [Java] Enable UnsafeDirectLittleEndian on a big-endian platform
  • ARROW-9911 - [Rust][DataFusion] SELECT <expression> with no FROM clause should produce a single row of output
  • ARROW-9945 - [C++][Dataset] Refactor Expression::Assume to return a Result
  • ARROW-9991 - [C++] Split kernels for strings/binary
  • ARROW-10002 - [Rust] Remove trait specialization from arrow crate
  • ARROW-10021 - [C++][Compute] Return top-n modes in mode kernel
  • ARROW-10032 - [Documentation] update C++ windows docs
  • ARROW-10079 - [Rust] Benchmark and improve count bits
  • ARROW-10095 - [Rust] Update rust-parquet-arrow-writer branch's encode_arrow_schema with ipc changes
  • ARROW-10097 - [C++] Persist SetLookupState in between usages of IsIn when filtering dataset batches
  • ARROW-10106 - [FlightRPC][Java] Expose onIsReady() callback
  • ARROW-10108 - [Rust] [Parquet] Fix compiler warning about unused return value
  • ARROW-10109 - [Rust] Add support to the C data interface for primitive types and utf8
  • ARROW-10110 - [Rust] Add support to consume C Data Interface
  • ARROW-10131 - [C++][Dataset][Python] Lazily parse parquet metadata
  • ARROW-10135 - [Rust][Parquet] Refactor file module to help adding sources
  • ARROW-10143 - [C++] Rewrite Array(Range)Equals
  • ARROW-10144 - [Flight] Add support for using the TLS_SNI extension
  • ARROW-10149 - [Rust] Add support to external release of un-owned buffers
  • ARROW-10163 - [Rust][DataFusion] Add DictionaryArray coercion support
  • ARROW-10168 - [Rust][Parquet] Schema roundtrip - use Arrow schema from Parquet metadata when available
  • ARROW-10173 - [Rust][DataFusion] Implement support for direct comparison to scalar values
  • ARROW-10180 - [C++][Doc] Update dependency management docs
  • ARROW-10182 - [C++] Add basic continuation support to Future
  • ARROW-10191 - [Rust][Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes
  • ARROW-10197 - [Python][Gandiva] Execute expression on filtered data
  • ARROW-10203 - [Doc] Give guidance on big-endian support in the contributors docs
  • ARROW-10207 - [C++] Allow precomputing output string/list offsets in kernels
  • ARROW-10208 - [C++] Fix split string kernels on sliced input
  • ARROW-10216 - [Rust] Simd implementation for primitive min/max kernels
  • ARROW-10224 - [Python] Add support for Python 3.9 except macOS wheel and Windows wheel
  • ARROW-10225 - [Rust][Parquet] Fix null comparison in roundtrip
  • ARROW-10228 - [Julia] Contribute Julia implementation
  • ARROW-10236 - [Rust] Add can_cast_types to arrow cast kernel, use in DataFusion
  • ARROW-10241 - [C++][Compute] Add variance kernel benchmark
  • ARROW-10249 - [Rust] Support nested dictionaries inside list arrays
  • ARROW-10259 - [Rust] Add custom metadata to Field
  • ARROW-10261 - [Rust][Breaking] Change List datatype to Box<Field>
  • ARROW-10263 - [C++][Compute] Improve variance kernel numerical stability
  • ARROW-10268 - [Rust] Write out non-nested dictionaries in the IPC format
  • ARROW-10269 - [Rust] Update to 2020-11-14 nightly
  • ARROW-10277 - [C++] Support comparing scalars approximately
  • ARROW-10289 - [Rust] Read dictionaries in IPC streams
  • ARROW-10292 - [Rust][DataFusion] Simplify merge
  • ARROW-10295 - [Rust][DataFusion] Replace Rc<RefCell<>> by Box<> in accumulators.
  • ARROW-10300 - [Rust] Improve documentation for TPC-H benchmark
  • ARROW-10301 - [C++][Compute] Implement "all" reduction kernel for boolean data
  • ARROW-10302 - [Python] Don't double-package plasma-store-server
  • ARROW-10304 - [C++][Compute] Optimize variance kernel for integers
  • ARROW-10310 - [C++][Gandiva] Add single argument round() in Gandiva
  • ARROW-10311 - [Release] Update crossbow verification process
  • ARROW-10313 - [C++] Faster UTF8 validation for small strings
  • ARROW-10318 - [C++] Use pimpl idiom in CSV parser
  • ARROW-10319 - [Go][Flight] Add context to flight client auth handler
  • ARROW-10320 - [Rust][DataFusion] Migrated from batch iterators to batch streams.
  • ARROW-10322 - [C++][Dataset] Minimize Expression
  • ARROW-10323 - [Release][wheel] Add missing verification setup step
  • ARROW-10325 - [C++][Compute] Refine aggregate kernel registration
  • ARROW-10328 - [C++] Vendor fast_float number parsing library
  • ARROW-10330 - [Rust][DataFusion] Implement NULLIF() SQL function
  • ARROW-10331 - [Rust][DataFusion] Re-organize DataFusion errors
  • ARROW-10332 - [Rust] Allow CSV reader to iterate from start up to end
  • ARROW-10334 - [Rust][Parquet] NullArray roundtrip
  • ARROW-10336 - [Rust] Added FromIter and ToIter for string arrays
  • ARROW-10337 - [C++] More liberal parsing of ISO8601 timestamps with fractional seconds
  • ARROW-10338 - [Rust] Use const fn for applicable methods
  • ARROW-10340 - [Packaging][deb][RPM] Use Python 3.8 for pygit2
  • ARROW-10356 - [Rust][DataFusion] Add support for is_in
  • ARROW-10363 - [Python] Remove CMake bug workaround in manylinux
  • ARROW-10366 - [Rust][DataFusion] Do not buffer intermediate results in merge or HashAggregate
  • ARROW-10375 - [Rust] Removed PrimitiveArrayOps
  • ARROW-10378 - [Rust] Update take() kernel with support for LargeList.
  • ARROW-10381 - [Rust] Generalized Ordering for inter-array comparisons
  • ARROW-10382 - [Rust] Fix typos
  • ARROW-10383 - [Doc] fix typos
  • ARROW-10384 - [C++] Fix typos
  • ARROW-10385 - [C++][Gandiva] Add support for LLVM 11
  • ARROW-10389 - [Rust][DataFusion] Make the custom source implementation API more explicit
  • ARROW-10392 - [C++][Gandiva] Avoid string copy while evaluating IN expression
  • ARROW-10396 - [Rust][Parquet] Publicly export SliceableCursor and FileSource
  • ARROW-10398 - [Rust][Parquet] Re-Export parquet::record::api::Field
  • ARROW-10400 - [C++] Propagate TLS client peer_identity when using mutual TLS
  • ARROW-10402 - [Rust] Refactor array equality
  • ARROW-10407 - [C++] Add BasicDecimal256 division Support
  • ARROW-10408 - [Java] Bump Avro to 1.10.0
  • ARROW-10410 - [Rust] Some refactorings
  • ARROW-10416 - [R] Support Tables in Flight
  • ARROW-10422 - [Rust] Removed unused trait BinaryArrayBuilder
  • ARROW-10424 - [Rust] Minor simplification to the generic impl PrimitiveArray
  • ARROW-10428 - [FlightRPC][Java] Add support for HTTP cookies
  • ARROW-10445 - [Rust] Added doubleEnded iterator to PrimitiveArrayIter
  • ARROW-10449 - [Rust] Make Dictionary::keys be an array
  • ARROW-10454 - [Rust][DataFusion] support creating ParquetExec from filelist and schema
  • ARROW-10455 - [Rust][CI] Fixed error in caching files
  • ARROW-10458 - [Rust][DataFusion] create_logical_plan should not require mutable reference
  • ARROW-10464 - [Rust][DataFusion] Add utility to convert TPC-H data from tbl to CSV and Parquet
  • ARROW-10466 - [Rust] [Website] Update implementation status page
  • ARROW-10467 - [FlightRPC][Java] Add the ability to pass arbitrary client headers.
  • ARROW-10468 - [C++][Compute] Provide KernelExecutor instead of FunctionExecutor
  • ARROW-10476 - [Rust] Allow string arrays to be built from Option<&str> or Option<String>
  • ARROW-10477 - [Rust] Add iterator support for Binary arrays.
  • ARROW-10478 - [Dev][Release] Correct Java versions to 3.0.0-SNAPSHOT
  • ARROW-10481 - [R] Bindings to add, remove, replace Table columns
  • ARROW-10483 - [C++] Move Executor into a separate header
  • ARROW-10484 - [C++] Make Future<> more generic
  • ARROW-10487 - [FlightRPC][C++] Header-based auth in clients
  • ARROW-10490 - [C++][GLib] Fix range-loop-analysis warnings
  • ARROW-10492 - [Java][JDBC] Allow users to config the mapping between SQL types and Arrow types
  • ARROW-10504 - [C++] Suppress UBSAN pointer-overflow warning in RapidJSON
  • ARROW-10510 - [Rust][DataFusion] Benchmark COUNT(DISTINCT) queries.
  • ARROW-10515 - [Julia][Doc] Update lists of supported languages to include Julia
  • ARROW-10522 - [R] Allow rename Table and RecordBatch columns with names()
  • ARROW-10526 - [FlightRPC][C++] Client cookie middleware
  • ARROW-10530 - [R] Optionally use distro package in linuxlibs.R
  • ARROW-10531 - [Rust][DataFusion] Add schema and graphviz formatting for LogicalPlans and a PlanVisitor
  • ARROW-10539 - [Packaging][Python] Use GitHub Actions to build wheels for Windows
  • ARROW-10540 - [Rust] Extended filter kernel to all types and improved performance
  • ARROW-10541 - [C++] Add re2 library to core arrow / ARROW_WITH_RE2
  • ARROW-10542 - [C#][Flight] Add beginning on flight code for net core
  • ARROW-10543 - [Developer] Add a note about being patient after gitbox is enabled
  • ARROW-10552 - [Rust] Removed un-used Result
  • ARROW-10559 - [Rust][DataFusion] Split up logical_plan/mod.rs into sub modules
  • ARROW-10561 - [Rust] Simplified Buffer's write and write_bytes and fixed undefined behavior
  • ARROW-10562 - [Rust] Potential UB on unsafe code
  • ARROW-10566 - [C++] Allow validating ArrayData directly
  • ARROW-10567 - [C++] Add multiple perf runs options for higher precision reporting
  • ARROW-10572 - [Rust][DataFusion] Use aHash instead of FnvHashMap
  • ARROW-10574 - [Python][Parquet] Allow collections for 'in' / 'not in' filter (in addition to sets)
  • ARROW-10575 - [Rust] Rename union.rs to be consistent with other arrays
  • ARROW-10581 - [Doc] IPC dictionary reference to relevant section
  • ARROW-10582 - [Rust][DataFusion] Implement "repartition" operator
  • ARROW-10584 - [Rust][DataFusion] Add SQL support for JOIN ON syntax
  • ARROW-10585 - [Rust][DataFusion] Add join support to DataFrame and LogicalPlan
  • ARROW-10586 - [Rust] [DataFusion] Add join support to query planner
  • ARROW-10589 - [Rust] Implement AVX-512 bit and operation
  • ARROW-10590 - [Rust] Remove Date32(Millisecond) from casts
  • ARROW-10591 - [Rust] Add support for StructArray to MutableArrayData
  • ARROW-10595 - [Rust] Simplify inner loop of min/max kernels for non-null case
  • ARROW-10596 - [Rust] Improve take benchmark
  • ARROW-10598 - [C++] Separate out bit-packing in internal::GenerateBitsUnrolled for better performance
  • ARROW-10604 - [GLib][Ruby] Add support for 256-bit decimal
  • ARROW-10607 - [C++][Parquet] Add parquet support for decimal256.
  • ARROW-10609 - [Rust] Optimize min/max of non null strings
  • ARROW-10628 - [Rust] flag clippy warnings as errors
  • ARROW-10633 - [Rust][DataFusion] Dependency version updates
  • ARROW-10634 - [C#][CI] Change the build version from 2.2 to 3.1 in CI
  • ARROW-10636 - [Rust][Parquet] Switch to Rust Stable by removing specialization in parquet
  • ARROW-10637 - [Rust] Added examples to some boolean kernels.
  • ARROW-10638 - [Rust] Improved tests of boolean kernel.
  • ARROW-10639 - [Rust] Added examples to is_null kernel and simplified signature.
  • ARROW-10644 - [Python] Consolidate path/filesystem handling in pyarrow.dataset and pyarrow.fs
  • ARROW-10646 - [C++][FlightRPC] Disable flaky Flight test on Windows
  • ARROW-10648 - [Java] Prepare Java codebase for source release without requiring any git tags to be created or pushed
  • ARROW-10651 - [C++] Fix alloc-dealloc-mismatch in S3-related factory
  • ARROW-10652 - [C++][Gandiva] Make gandiva cache size configurable
  • ARROW-10653 - [Rust] Update toolchain nightly
  • ARROW-10654 - [Rust] Specialize parsing of floats / bools in CSV Reader
  • ARROW-10660 - [Rust] Implement AVX-512 bit or operation
  • ARROW-10665 - [Rust] like/nlike utf8 scalar fast paths, bug fixes in like/nlike
  • ARROW-10666 - [Rust][DataFusion] Support nested SELECT statements.
  • ARROW-10669 - [C++][Compute] Support scalar arguments to Boolean compute functions
  • ARROW-10672 - [Rust][DataFusion] Made Limit be computed on the stream.
  • ARROW-10673 - [Rust][DataFusion] Made sort not collect on execute.
  • ARROW-10674 - [Rust] Fix BigDecimal to be little endian; Add IPC Reader/Writer for Decimal type to allow integration tests
  • ARROW-10677 - [Rust] Fix CSV Boolean parsing + add tests to demonstrate supported csv parsing
  • ARROW-10679 - [Rust][DataFusion] Implement CASE WHEN physical expression
  • ARROW-10680 - [Rust][DataFusion] Add partial support for TPC-H query 12
  • ARROW-10682 - [Rust] Improve sort kernel performance by enabling inlining of is_valid calls
  • ARROW-10685 - [Rust][DataFusion] Added support for Join on filter-pushdown optimizer.
  • ARROW-10688 - [Rust][DataFusion] Implement CASE WHEN logical plan
  • ARROW-10689 - [Rust][DataFusion] Add SQL support for CASE WHEN
  • ARROW-10693 - [Rust][DataFusion] Add support to left join
  • ARROW-10696 - [C++] Add SetBitRunReader
  • ARROW-10697 - [C++] Add notes about bitmap readers
  • ARROW-10703 - [Rust][DataFusion] Compute build-side of hash join once
  • ARROW-10704 - [Rust][DataFusion] Remove Nested from expression enum
  • ARROW-10708 - [Packaging][deb] Add support for Ubuntu 20.10
  • ARROW-10709 - [C++][Python] Allow PyReadableFile::Read() to call pyobj.read_buffer()
  • ARROW-10712 - [Rust][DataFusion] Add tests to TPC-H benchmarks
  • ARROW-10717 - [Rust][DataFusion] Add support for right join
  • ARROW-10720 - [C++] Add Rescale support for BasicDecimal256
  • ARROW-10721 - [C#][CI] Use .NET 3.1 by default
  • ARROW-10722 - [Rust][DataFusion] Reduce overhead of some data types in aggregations / joins, improve benchmarks
  • ARROW-10723 - [Packaging][deb][RPM] Enable Parquet encryption
  • ARROW-10724 - [Dev Tools] Added labeler to PRs that need rebase.
  • ARROW-10725 - [Python][Compute] Expose sort options in Python bindings
  • ARROW-10728 - [Rust][DataFusion] Support USING in SQL
  • ARROW-10729 - [Rust][DataFusion] Add SQL support for JOIN using implicit syntax
  • ARROW-10732 - [Rust][DataFusion] Integrate DFSchema as a step towards supporting qualified column names
  • ARROW-10733 - [R] Improvements to Linux installation troubleshooting
  • ARROW-10740 - [Rust][DataFusion] Remove redundant clones found by clippy
  • ARROW-10741 - [Rust] Apply previously ignored clippy suggestions
  • ARROW-10742 - [Python] Check mask when creating array from numpy
  • ARROW-10745 - [Rust] Directly allocate padding bytes in filter context
  • ARROW-10747 - [Rust] CSV reader optimization
  • ARROW-10750 - [Rust][DataFusion] Add SQL support for LEFT and RIGHT join
  • ARROW-10752 - [GLib] Add garrow_schema_has_metadata()
  • ARROW-10754 - [GLib] Add support for metadata to GArrowField
  • ARROW-10755 - [Rust][Parquet] Add support for writing boolean type
  • ARROW-10756 - [Rust][DataFusion] Fix redundant clones
  • ARROW-10759 - [Rust][DataFusion] Implement string to date cast
  • ARROW-10763 - [Rust] Speed up take for primitive / boolean for non-null arrays
  • ARROW-10765 - [Rust] Optimize take string for non-null arrays
  • ARROW-10767 - [Rust] Speed up sum with nulls (non-simd)
  • ARROW-10770 - [Rust] JSON nested list reader
  • ARROW-10772 - [Rust] Speed up take by writing to buffer
  • ARROW-10775 - [Rust][DataFusion] Use ahash in join hashmap
  • ARROW-10776 - [C++] Allow STL iteration over concrete primitive arrays
  • ARROW-10781 - [Rust][DataFusion] add the 'Statistics' interface in data source
  • ARROW-10783 - [Rust][DataFusion] Implement Statistics for Parquet TableProvider
  • ARROW-10785 - [Rust] Optimize take string
  • ARROW-10786 - [Packaging][RPM] Drop support for CentOS 6
  • ARROW-10788 - [C++] Make S3 recursive tree walks parallel
  • ARROW-10789 - [Rust][DataFusion] Make TableProvider dynamically typed
  • ARROW-10790 - [C++] Improve ChunkedArray and Table sort_indices performance
  • ARROW-10792 - [Rust][CI] Modularize builds for faster build and smaller caches
  • ARROW-10795 - [Rust] Optimize specialization for datatypes
  • ARROW-10796 - [C++] Implement optimized RecordBatch sorting
  • ARROW-10800 - [Rust][Parquet] Provide access to the elements of parquet::record::{List, Map}
  • ARROW-10802 - [C++][NullType] in parquet column writer
  • ARROW-10808 - [Rust][DataFusion] Support nested expressions in aggregations.
  • ARROW-10809 - [C++] Use Datum for SortIndices() input
  • ARROW-10812 - [Rust] Make BooleanArray not a PrimitiveArray
  • ARROW-10813 - [Rust][DataFusion] Implement DFSchema
  • ARROW-10814 - [Packaging][deb] Remove support for Debian GNU/Linux Stretch
  • ARROW-10817 - [Rust][DataFusion] Implement TypedString and DATE coercion
  • ARROW-10820 - [Rust][DataFusion] Complete TPC-H Benchmark Queries
  • ARROW-10821 - [Rust][DataFusion] support negative expression
  • ARROW-10822 - [Rust][DataFusion] add simd feature flag to datafusion
  • ARROW-10824 - [Rust] Added PartialEq to null array
  • ARROW-10825 - [Rust] Added support for NullArray to MutableArrayData
  • ARROW-10826 - [Rust] Add support for FixedSizeBinaryArray to MutableArrayData
  • ARROW-10827 - [Rust] Move concat from builders to a compute kernel and make it faster (2-6x)
  • ARROW-10828 - [Rust][DataFusion] Address / fix clippy lints
  • ARROW-10829 - [Rust][DataFusion] Implement Into<Schema> for DFSchema
  • ARROW-10832 - [Rust][Arrow] generate src/ipc/gen/* with latest snapshot flatc.
  • ARROW-10836 - [Rust] Extend take kernel to FixedSizeListArray
  • ARROW-10838 - [Rust][CI] Add arrow build targeting wasm32
  • ARROW-10839 - [Rust][Data Fusion] Implement BETWEEN operator
  • ARROW-10843 - [C++] Add support for temporal types in sort family kernels
  • ARROW-10845 - [Python][CI] Build with nightly numpy and pandas artifacts
  • ARROW-10849 - [Python] Handle numpy deprecation warnings for builtin type aliases
  • ARROW-10851 - [C++] Reduce size of generated code for sort kernels
  • ARROW-10857 - [Packaging] Follow PowerTools repository name change on CentOS 8
  • ARROW-10858 - [C++] Add missing Boost dependency with Visual C++
  • ARROW-10861 - [Python] Update minimal NumPy version to 1.16.6
  • ARROW-10864 - [Rust] Use standard ordering for floats
  • ARROW-10865 - [Rust] Easier to use Schema -> DFSchema conversion
  • ARROW-10867 - [C++] Workaround gcc internal compiler error
  • ARROW-10869 - [GLib] Add garrow_*_sort_indices() and related options
  • ARROW-10870 - [Julia][Doc] Include Julia in project documentation
  • ARROW-10871 - [Julia][CI] Setup Julia testing via Github Actions
  • ARROW-10873 - [C++] Apple Silicon is reported as arm64 in CMake
  • ARROW-10874 - [Rust][DataFusion] Add statistics for MemTable, change statistics struct
  • ARROW-10877 - [Rust] [DataFusion] Add benchmark based on kaggle movies
  • ARROW-10878 - [Rust] Simplify extend_from_slice
  • ARROW-10879 - [Packaging][deb] Restore Debian GNU/Linux Buster support
  • ARROW-10881 - [C++] Fix EXC_BAD_ACCESS in PutSpaced
  • ARROW-10885 - [Rust][DataFusion] Optimize hash join build vs probe order based on number of rows
  • ARROW-10887 - [Doc][C++] Document C++ IPC API
  • ARROW-10889 - [Rust][Proposal] Add guidelines about usage of unsafe
  • ARROW-10890 - [Rust] [DataFusion] JOIN support
  • ARROW-10891 - [Rust][DataFusion] Enable / fix clone_on_copy, map_clone, or_fun_call
  • ARROW-10893 - [Rust][DataFusion] More clippy lints
  • ARROW-10896 - [C++][CMake] Rename internal RE2 package name to "re2" from "RE2"
  • ARROW-10900 - [Rust][DataFusion] Resolve TableScan provider eagerly
  • ARROW-10904 - [Python][CI][Packaging] Add support for Python 3.9 macOS wheels
  • ARROW-10905 - [Python] Add support for Python 3.9 windows wheels
  • ARROW-10908 - [Rust][DataFusion] Update relevant tpch-queries with BETWEEN
  • ARROW-10917 - [Doc] Update feature matrix for Rust
  • ARROW-10918 - [Doc][C++] Document supported Parquet features
  • ARROW-10927 - [Rust][Parquet] Add Decimal to ArrayBuilderReader
  • ARROW-10927 - [Rust][Parquet][REVERT]
  • ARROW-10927 - [Rust][Parquet] Add Decimal to ArrayBuilderReader
  • ARROW-10929 - [Rust] Change CI to use Stable Rust
  • ARROW-10933 - [Rust] Update readme files in regard to nightly rust
  • ARROW-10934 - [Python] Skip filesystem tests for in-memory fs for fsspec 0.8.5
  • ARROW-10938 - [Rust] upgrade dependency "flatbuffers" to 0.8
  • ARROW-10940 - [Rust] Extend sort kernel to ListArray
  • ARROW-10941 - [Doc] Document supported Parquet encryption features
  • ARROW-10944 - [Rust] Implement min/max aggregate kernels for BooleanArray
  • ARROW-10946 - [Rust] Simplified bit chunk iterator
  • ARROW-10947 - [Rust][DataFusion] Optimize UTF8 to Date32 Conversion
  • ARROW-10948 - [C++] Always use GTestConfig.cmake
  • ARROW-10949 - [Rust] Removed un-needed clone
  • ARROW-10951 - [Python][CI] Fix nightly pandas builds (pytest monkeypatch issue)
  • ARROW-10952 - [Rust] Add pre-commit hook
  • ARROW-10966 - [C++] Use FnOnce for ThreadPool's tasks instead of std::function
  • ARROW-10968 - [Rust][DataFusion] Don't build hash table for right side of join
  • ARROW-10969 - [Rust][DataFusion] Implement basic String ANSI SQL Functions
  • ARROW-10985 - [Rust] Update unsafe guidelines for adding JIRA references
  • ARROW-10986 - [Rust][DataFusion] Add average stats to TPC-H benchmarks
  • ARROW-10988 - [C++] Require CMake 3.5 or later
  • ARROW-10989 - [Rust] Iterate primitive buffers by slice
  • ARROW-10993 - [CI][macOS] Fix Python 3.9 installation by Homebrew
  • ARROW-10995 - [Rust][DataFusion] Limit ParquetExec concurrency when reading large number of files
  • ARROW-11004 - [FlightRPC][Python] Header-based auth in clients
  • ARROW-11005 - [Rust] Remove indirection from take kernel
  • ARROW-11008 - [Rust][DataFusion] Simplify count accumulator
  • ARROW-11009 - [C++] Allow changing default memory pool with an environment variable
  • ARROW-11010 - [Python] `np.float` deprecation warning in `_pandas_logical_type_map`
  • ARROW-11012 - [Rust][DataFusion] Make write_csv and write_parquet concurrent
  • ARROW-11015 - [CI][Gandiva] Move gandiva nightly build from travis to github action
  • ARROW-11018 - [Rust][DataFusion] Add support for column-level statistics, null count.
  • ARROW-11026 - [Rust] Run tests without requiring environment variables
  • ARROW-11028 - [Rust] Make a few pattern matches more idiomatic
  • ARROW-11029 - [Rust][DataFusion] Add documentation for code that determines number of rows per operator
  • ARROW-11032 - [C++][FlightRPC] Benchmark unix socket RPC
  • ARROW-11033 - [Rust] Csv writing performance improvements
  • ARROW-11034 - [Rust] remove rustfmt ignore list, fix format
  • ARROW-11035 - [Rust] Improved performance of casting to utf8
  • ARROW-11037 - [Rust] Optimized creation of string array from iterator.
  • ARROW-11038 - [Rust] Removed unused trait and Result.
  • ARROW-11039 - [Rust] Performance improvement for utf-8 to float cast
  • ARROW-11040 - [Rust] Simplified builders
  • ARROW-11042 - [Rust][DataFusion] Increase default batch size
  • ARROW-11043 - [C++] Add "is_nan" kernel
  • ARROW-11046 - [Rust][DataFusion] Support count_distinct in DataFrame API
  • ARROW-11049 - [Python] Expose alternate memory pools
  • ARROW-11052 - [Rust][DataFusion] Implement metrics for HashJoinExec
  • ARROW-11053 - [Rust] [DataFusion] Optimize joins with dynamic capacity for output batches
  • ARROW-11054 - [Rust][DataFusion] Move to sqlparser 0.7.0
  • ARROW-11055 - [Rust][DataFusion] Support date_trunc function
  • ARROW-11058 - [Rust][DataFusion] Implement coalesce batches operator
  • ARROW-11063 - [Rust][Breaking] Validate null counts when building arrays
  • ARROW-11064 - [Rust][DataFusion] Speed up hash join on smaller batches
  • ARROW-11072 - [Rust][Parquet] Support reading decimal from physical int types
  • ARROW-11076 - [Rust][DataFusion] Refactor usage of right indices in hash join
  • ARROW-11079 - [R] Catch up on changelog since 2.0
  • ARROW-11080 - [C++][Dataset] Improvements to implicit casting
  • ARROW-11082 - [Rust] C data interface to largeUTF8
  • ARROW-11086 - [Rust] Extend take implementation to more index types
  • ARROW-11091 - [Rust][DataFusion] Fix new clippy linting errors
  • ARROW-11095 - [Python] access pyarrow.RecordBatch field() and column() by string name
  • ARROW-11096 - [Rust][Large] binary
  • ARROW-11097 - [Rust] Minor simplification of some tests.
  • ARROW-11099 - [Rust] Remove unsafe value_slice and raw_values methods from primitive and boolean arrays
  • ARROW-11100 - [Rust] Speed up numeric to string cast using lexical_core
  • ARROW-11101 - [Rust] rewrite pre-commit hook
  • ARROW-11104 - [GLib] Add append_null/append_nulls to GArrowArrayBuilder and use them
  • ARROW-11105 - [Rust] Migrated MutableBuffer::freeze to From<MutableBuffer> for Buffer
  • ARROW-11109 - [GLib] Add garrow_array_builder_append_empty_value() and values()
  • ARROW-11110 - [Rust][Datafusion] ExecutionContext.table should take immutable reference
  • ARROW-11111 - [GLib] Add GArrowFixedSizeBinaryArrayBuilder
  • ARROW-11121 - [Developer] Use pull_request_target for PR JIRA integration
  • ARROW-11122 - [Rust] Added FFI support for date and time.
  • ARROW-11124 - [Doc] Update status matrix for Decimal256
  • ARROW-11125 - [Rust] Logical equality for list arrays
  • ARROW-11126 - [Rust] Document and test ARROW-10656
  • ARROW-11127 - [C++] ifdef unused cpu_info on non-x86 platforms
  • ARROW-11129 - [Rust][DataFusion] Use tokio for loading parquet
  • ARROW-11130 - [Website][CentOS 8][RHEL 8] Enable all required repositories by default
  • ARROW-11131 - [Rust] Improve performance of boolean_equal
  • ARROW-11136 - [R] Bindings for is.nan
  • ARROW-11137 - [Rust][DataFusion] Clippy needless_range_loop,needless_lifetimes
  • ARROW-11138 - [Rust][DataFusion] Add ltrim, rtrim to built-in functions
  • ARROW-11139 - [GLib] Add support for extension type
  • ARROW-11155 - [C++][Packaging] Move gandiva crossbow jobs off of Travis-CI
  • ARROW-11158 - [Julia] Implement Decimal256 support for Julia
  • ARROW-11159 - [Developer] Consolidate pull request related jobs
  • ARROW-11165 - [Rust][DataFusion] Document Postgres as standard SQL dialect
  • ARROW-11168 - [Rust][Doc] Fix cargo doc warnings
  • ARROW-11169 - [Rust] Add a comment explaining where float total_order algorithm came from
  • ARROW-11175 - [R] Small docs fixes
  • ARROW-11176 - [R] Expose memory pool name and document setting it
  • ARROW-11187 - [Rust][Parquet] Fix Build error by Pin specific parquet-format-rs version
  • ARROW-11188 - [Rust] Support crypto functions from PostgreSQL dialect
  • ARROW-11193 - [Java][Documentation] Add Java ListVector Documentation
  • ARROW-11194 - [Rust] Enable packed_simd for aarch64
  • ARROW-11195 - [Rust] [DataFusion] Built-in table providers should expose relevant fields
  • ARROW-11196 - [GLib] Add support for mock, HDFS and S3 file systems with factory function
  • ARROW-11198 - [Packaging][Python] Ensure setuptools version during build supports markdown
  • ARROW-11200 - [Rust][DataFusion] Physical operators and expressions should have public accessor methods
  • ARROW-11201 - [Rust][DataFusion] create_batch_empty - support more types
  • ARROW-11203 - [Developer][Website] Enable JIRA and pull request integration
  • ARROW-11204 - [C++] Fix build failures with bundled gRPC and Protobuf
  • ARROW-11205 - [GLib][Dataset] Add GADFileFormat and its family
  • ARROW-11209 - [Rust] DF - Better error message on unsupported GROUP BY
  • ARROW-11210 - [CI] Restore workflows that had been blocked by INFRA
  • ARROW-11212 - [Packaging][Python] Use vcpkg as dependency source for manylinux and windows wheels
  • ARROW-11213 - [Packaging][Python] Dockerize wheel building on windows
  • ARROW-11215 - [CI] Use named volumes by default for caching in docker-compose
  • ARROW-11218 - [R] Make SubTreeFileSystem print method more informative
  • ARROW-11219 - [CI][Ruby][MinGW] Reduce CI time
  • ARROW-11221 - [Rust] DF Implement GROUP BY support for Float32/Float64
  • ARROW-11231 - [Packaging][deb][RPM] Add support for mimalloc
  • ARROW-11234 - [CI][Ruby][macOS] Reduce CI time
  • ARROW-11236 - Bump Jackson to 2.11.4
  • ARROW-11240 - [Packaging][R] Add mimalloc to R packaging
  • ARROW-11242 - [CI] Remove CMake 3.2 job
  • ARROW-11245 - [C++][Gandiva] Add support for LLVM 11.1
  • ARROW-11247 - [C++] Infer date32 columns in CSV
  • ARROW-11256 - [Packaging][Linux] Don't buffer packaging output
  • ARROW-11272 - [Release][wheel] Remove unsupported Python 3.5 and manylinux1
  • ARROW-11273 - [Release][deb] Remove unsupported Debian GNU/Linux stretch
  • ARROW-11278 - [Release][NodeJS] Don't touch ~/.bash_profile
  • ARROW-11280 - [Release][APT] Fix minimal build example check
  • ARROW-11281 - [C++] Remove needless runtime RapidJSON dependency
  • ARROW-11282 - [Packaging][deb] Add missing libgflags-dev dependency
  • ARROW-11285 - [Release][APT] Add support for Ubuntu Groovy
  • ARROW-11292 - [Release][JS] Use Node.JS LTS
  • ARROW-11293 - [C++] Don't require Boost and gflags with find_package(Arrow)
  • ARROW-11307 - [Release][Ubuntu][20.10] Add workaround for dependency issue
  • ARROW-11454 - [Website] [Rust] 3.0.0 Blog Post
  • PARQUET-1566 - [C++] Indicate if null count, distinct count are present in column statistics

Bug Fixes

  • ARROW-2616 - [Python] Cross-compiling Pyarrow
  • ARROW-6582 - [R] Arrow to R fails with embedded nuls in strings
  • ARROW-7363 - [Python] add combine_chunks method to ChunkedArray
  • ARROW-7909 - [Website] Add how to install on Red Hat Enterprise Linux
  • ARROW-8258 - [Rust] [Parquet] ArrowReader fails on some timestamp types
  • ARROW-9027 - [Python][Testing] Split parquet tests into multiple files + clean-up
  • ARROW-9479 - [JS] Fix Table.from for zero-item serialized tables, Table.empty for schemas containing compound types (List, FixedSizeList, Map)
  • ARROW-9636 - [Python] Update documentation about 'LZO' compression in parquet.write_table
  • ARROW-9690 - [Go] tests failing on s390x
  • ARROW-9776 - [R] read_feather causes segfault in R if file doesn't exist
  • ARROW-9897 - [C++][Gandiva] Added to_date function
  • ARROW-9897 - [C++][Gandiva] Revert - to_date function
  • ARROW-9898 - [C++][Gandiva] Fix linking issue with castINT/FLOAT functions
  • ARROW-9903 - [R] open_dataset freezes opening feather files on Windows
  • ARROW-9963 - [Python] Recognize datetime.timezone.utc as UTC on conversion python->pyarrow
  • ARROW-10039 - [Rust] Do not require memory alignment of buffers
  • ARROW-10042 - [Rust] Fix tests involving ArrayData/Buffer equality
  • ARROW-10080 - [R] Call gc() and try again in MemoryPool
  • ARROW-10122 - [Python] Fix to_pandas conversion with subset of columns and MultiIndex
  • ARROW-10145 - [C++][Dataset] Assert integer overflow in partitioning falls back to string
  • ARROW-10146 - [Python] Fix parquet FileMetadata.to_dict in case statistics is not set
  • ARROW-10174 - [Java] Fix reading/writing dict structs
  • ARROW-10177 - [CI][Gandiva] Nightly gandiva-jar-xenial fails
  • ARROW-10186 - [Rust] Tests fail when following instructions in README
  • ARROW-10247 - [C++][Dataset] Support writing datasets partitioned on dictionary columns
  • ARROW-10264 - [Python] Fix failing hdfs test
  • ARROW-10270 - [R] Fix CSV timestamp_parsers test on R-devel
  • ARROW-10283 - [Python] Define PY_SSIZE_T_CLEAN to deal with Python deprecation warning
  • ARROW-10293 - [Rust][DataFusion] Fixed benchmarks
  • ARROW-10294 - [Java] Resolve problems of DecimalVector APIs on ArrowBufs
  • ARROW-10298 - [Rust] Incorrect offset handling in iterator over dictionary keys
  • ARROW-10321 - [C++] Use check_cxx_source_compiles for AVX512 detect in compiler
  • ARROW-10333 - [Java] Get rid of org.apache.arrow.util in vector
  • ARROW-10345 - [C++][Compute] Fix NaN handling in sorting and topn kernels
  • ARROW-10346 - [Python] Ensure tests aren't affected by user-supplied AWS config
  • ARROW-10348 - [C++] Fix crash on invalid Parquet data
  • ARROW-10350 - [Rust] Fixes to publication metadata in Cargo.toml
  • ARROW-10353 - [C++] Fix handling of compression in Parquet data pages v2
  • ARROW-10358 - [R] Followups to 2.0.0 release
  • ARROW-10365 - [R] Remove duplicate setting of S3 flag on macOS
  • ARROW-10369 - [Dev] Fix archery release utility test cases
  • ARROW-10371 - [R] Linux system requirements check needs to support older cmake versions
  • ARROW-10386 - [R] List column class attributes not preserved in roundtrip
  • ARROW-10388 - [Java] Fix Spark integration build failure
  • ARROW-10390 - [Rust][Parquet] Ensure it is possible to create custom parquet writers
  • ARROW-10393 - [Rust] Apply fix for null reading in json reader for nested
  • ARROW-10394 - [Rust][Large] BinaryArray creation
  • ARROW-10397 - [C++] Update comment to match change made in b1a7a73ff2
  • ARROW-10399 - [R] Fix performance regression from cpp11::r_string
  • ARROW-10411 - [C++] Fix incorrect child array lengths for Concatenate of FixedSizeList
  • ARROW-10412 - [C++] Improve grpc_cpp_plugin detection
  • ARROW-10413 - [Rust][Parquet] Unignore some tests that are passing now
  • ARROW-10414 - [R] open_dataset doesn't work with absolute/expanded paths on Windows
  • ARROW-10426 - [C++] Allow writing large strings to Parquet
  • ARROW-10433 - [Python] Swapped the conditions for checking for fsspec filesystems
  • ARROW-10434 - [Rust] Fix debug formatting for arrays with lengths between 10 and 20.
  • ARROW-10441 - [Java] Prevent closure of shared channels for FlightClient
  • ARROW-10446 - [C++][Python] Roundtrip Timestamp ns with TzInfo correctly
  • ARROW-10448 - [Rust] Remove PrimitiveArray::new that can cause UB
  • ARROW-10453 - [Rust] [DataFusion] Performance degradation after removing specialization
  • ARROW-10461 - [Rust] Fix offset bug in remainder bits
  • ARROW-10462 - [Python] Fix usage of fsspec in ParquetDataset causing path issue on Windows
  • ARROW-10463 - [R] Better messaging for currently unsupported CSV options in open_dataset
  • ARROW-10470 - [R] Fix missing file error causing NYC taxi example to fail
  • ARROW-10471 - [CI][Python] Ensure we have tests with s3fs and run those on CI
  • ARROW-10472 - [Python] Test to confirm casting timestamp scalars to date type works
  • ARROW-10475 - [C++][FlightRPC] handle IPv6 hosts
  • ARROW-10480 - [Python] don't infer compression by extension for Parquet
  • ARROW-10482 - [Python] Fix compression per column in Parquet writing
  • ARROW-10491 - [FlightRPC][Java] Fix NPE when using makeContext
  • ARROW-10493 - [C++][Parquet] Fix offset lost in MaybeReplaceValidity
  • ARROW-10495 - [Packaging][deb] Move FindRE2.cmake to libarrow-dev
  • ARROW-10496 - [R][CI] Fix conda-r job
  • ARROW-10499 - [C++][Java] Fix ORC Java JNI Crash
  • ARROW-10502 - [C++/Python] CUDA detection messes up nightly conda-win builds
  • ARROW-10503 - [C++] Uriparser will not compile using Intel compiler
  • ARROW-10508 - [Java] Allow FixedSizeListVector to have empty children
  • ARROW-10509 - [C++] Define operator<<(ostream, ParquetException) for clang+Windows
  • ARROW-10511 - [Python] Fix to_pandas() conversion in case of metadata mismatch about timezone
  • ARROW-10518 - [C++][Gandiva] Adding NativeFunction::kCanReturnErrors to cast function in gandiva
  • ARROW-10519 - [Python] Fix deadlock when importing pandas from several threads
  • ARROW-10525 - [C++] Fix crash on unsupported IPC stream
  • ARROW-10532 - [Python] Fix metadata in Table.from_pandas conversion with specified schema with different column order
  • ARROW-10545 - [C++] Fix crash on invalid Parquet file (OSS-Fuzz)
  • ARROW-10546 - [Python] Deprecate DaskFileSystem/S3FSWrapper + stop using it internally
  • ARROW-10547 - [Rust][DataFusion] Do not lose Filters with UserDefined plan nodes
  • ARROW-10551 - [Rust] Fix unreproducible benches by seeding random number generator
  • ARROW-10558 - [Python] Fix python S3 filesystem tests interdependence
  • ARROW-10560 - [Python] Fix crash when creating array from huge string
  • ARROW-10563 - [Packaging][deb][RPM] Add missing dev package dependencies
  • ARROW-10565 - [Python] Table.from_batches and Table.from_pandas have argument Schema_schema in documentation instead of schema
  • ARROW-10568 - [C++][Parquet] Avoid crashing when OutputStream::Tell fails
  • ARROW-10569 - [C++] Improve table filtering performance
  • ARROW-10577 - [Rust][DataFusion] HashAggregator stream finishes unexpectedly after going to Pending state - tests
  • ARROW-10578 - [C++] Comparison kernels crashing for string array with null string scalar
  • ARROW-10610 - [C++] Updated vendored fast_float version to latest
  • ARROW-10616 - [Developer] Expand PR labeler to all supported languages
  • ARROW-10617 - [Python] Fix RecordBatchStreamReader iteration with Python 3.8
  • ARROW-10619 - [C++] Fix IPC validation regressions
  • ARROW-10620 - [Rust][Parquet] move column chunk range logic to metadata.rs
  • ARROW-10621 - [Java] Put required libraries into the common directory
  • ARROW-10622 - [R] Nameof should not use "void" as the crib
  • ARROW-10623 - [CI][R] Version 1.0.1 breaks data.frame attributes when reading file written by 2.0.0
  • ARROW-10624 - [R] Proactively remove "problems" attributes
  • ARROW-10627 - [Rust] Loosen cfg restrictions for wasm32
  • ARROW-10629 - [CI] Fix MinGW Github Actions jobs
  • ARROW-10631 - [Rust] Fixed error in computing equality of fixed-sized binary.
  • ARROW-10642 - [R] Can't get Table from RecordBatchReader with 0 batches
  • ARROW-10656 - [Rust] Allow schema validation to ignore field names and only check data types on new batch
  • ARROW-10656 - [Rust] Use DataType comparison without values
  • ARROW-10661 - [C#] Fix benchmarking project
  • ARROW-10662 - [Java] Avoid integer overflow for Json file reader
  • ARROW-10663 - [C++] Fix is_in and index_in behaviour
  • ARROW-10667 - [Rust][Parquet] Add a convenience type for writing Parquet to memory
  • ARROW-10668 - [R] Support for the .data pronoun
  • ARROW-10681 - [Rust] [DataFusion] TPC-H Query 12 fails with scheduler error
  • ARROW-10684 - [Rust] Inherit struct nulls in child null equality
  • ARROW-10690 - [Java] Fix ComplexCopier bug for list vector
  • ARROW-10692 - [Rust] Removed undefined behavior derived from null pointers
  • ARROW-10694 - [Python] ds.write_dataset() generates empty files for each final partition
  • ARROW-10699 - [C++] Fix BitmapUInt64Reader on big endian
  • ARROW-10701 - [Rust] Fix sort_limit_query_sql benchmark
  • ARROW-10705 - [Rust] Loosen restrictions on some lifetime annotations
  • ARROW-10710 - [Rust] Revert tokio upgrade, go back to 0.2
  • ARROW-10711 - [CI] Remove set-env from auto-tune to work with new GHA settings
  • ARROW-10719 - [C#] ArrowStreamWriter doesn't write schema metadata
  • ARROW-10746 - [C++] Bump gtest version + use GTEST_SKIP in tests
  • ARROW-10748 - [Java][JDBC] Support consuming timestamp data when time zone is not available
  • ARROW-10749 - [C++] Incorrect string format for Datum with the collection type
  • ARROW-10751 - [C++] Add RE2 to minimal build example
  • ARROW-10753 - [Rust][DataFusion] Fix parsing of negative numbers in DataFusion
  • ARROW-10757 - [Rust][CI] Fix CI failures
  • ARROW-10760 - [Rust][DataFusion] Fixed error in filter push down over joins
  • ARROW-10769 - [Rust][Rust] Use DataType comparison without values"
  • ARROW-10774 - [R] Set minimum cpp11 version
  • ARROW-10777 - [Packaging][Python] Build sdist by Crossbow
  • ARROW-10778 - [Python] Fix RowGroupInfo.statistics for empty row groups
  • ARROW-10779 - [Java] Fix writeNull method in UnionListWriter
  • ARROW-10780 - [R] Update known R installation issues for CentOS 7
  • ARROW-10791 - [Rust] StreamReader, read_dictionary duplicating schema info
  • ARROW-10801 - [Rust][Flight] Support sending FlightData for Dictionaries with that of a RecordBatch
  • ARROW-10803 - Support R >= 3.3 and add CI
  • ARROW-10804 - [Rust] Removed some unsafe code from the parquet crate
  • ARROW-10807 - [Rust][DataFusion] Avoid double hashing
  • ARROW-10810 - [Rust] Improve comparison kernels performance
  • ARROW-10811 - [R][CI] Remove nightly centos6 build
  • ARROW-10823 - [Rust] Fixed error in MutableArrayData
  • ARROW-10830 - [Rust] avoid hard crash in json reader
  • ARROW-10833 - [Python] Allow pyarrow to be compiled on NumPy <1.16.6 and work on 1.20+
  • ARROW-10834 - [R] Fix print method for SubTreeFileSystem
  • ARROW-10837 - [Rust][DataFusion] Use Vec<u8> for hash keys
  • ARROW-10840 - [C++] FileMetaData does not have key_value_metadata when built from FileMetaDataBuilder
  • ARROW-10842 - [Rust] decouple IO from json reader, fix crash during json schema inference with invalid json
  • ARROW-10844 - [Rust][DataFusion] Allow joins after a table registration
  • ARROW-10850 - [R] Unrecognized compression type: LZ4
  • ARROW-10852 - [C++] AssertTablesEqual(verbose=true) segfaults if the le…
  • ARROW-10854 - [Rust][DataFusion] Simplify logical plan scans
  • ARROW-10855 - [Python][Numpy] ArrowTypeError after upgrading NumPy to 1.20.0rc1
  • ARROW-10856 - [R] CC and CXX environment variables passing to cmake
  • ARROW-10859 - [Rust][DataFusion] Made collect not require ExecutionContext
  • ARROW-10860 - [Java] Avoid integer overflow for generated classes in Vector
  • ARROW-10863 - [Python] Fix pandas skip in ExtensionArray.to_pandas test
  • ARROW-10863 - [Python] Fix ExtensionArray.to_pandas to use underlying storage array
  • ARROW-10875 - [Rust] simplify simd cfg check with cfg_aliases
  • ARROW-10876 - [Rust] validate row value type in json reader
  • ARROW-10897 - [Rust] Removed level of indirection.
  • ARROW-10907 - [Rust] Fix Cast UTF8 to Date64
  • ARROW-10913 - [Python][Doc] Code block typo in filesystems docs
  • ARROW-10914 - [Rust] Refactor simd arithmetic kernels to use chunked iteration
  • ARROW-10915 - [Rust] README.md: set the Env vars as absolute dirs; several minor fixes.
  • ARROW-10921 - `TypeError: 'coroutine' object is not iterable` when reading parquet partitions via s3fs >= 0.5 with pyarrow
  • ARROW-10930 - [Python] Add value_field property to LargeListType / FixedSizeListType
  • ARROW-10932 - [C++] BinaryMemoTable::CopyOffsets access out-of-bound address when data is empty
  • ARROW-10942 - [C++] Fix S3FileSystem::Impl::IsEmptyDirectory on Amazon
  • ARROW-10943 - [Rust][Parquet] Always init new RleDecoder
  • ARROW-10954 - [C++][Doc] PlasmaClient is threadSafe now
  • ARROW-10955 - [C++] Fix JSON reading of list(null) values
  • ARROW-10960 - [C++][FlightRPC] Default to empty buffer instead of null
  • ARROW-10962 - [FlightRPC][Java] fill in empty body buffer if needed
  • ARROW-10967 - [Rust] Add functions for test data to mod arrow::util::test_util
  • ARROW-10990 - [Rust] Refactor simd comparison kernels to avoid out of bounds reads
  • ARROW-10994 - [Rust][DataFusion] Add support for compression when writing Parquet files
  • ARROW-10996 - [Rust][Parquet] change return value type of get_arrow_schema_from_metadata()
  • ARROW-10999 - [Rust][Benchmarks] Use signed ints for TPC-H schema
  • ARROW-11014 - [Rust][DataFusion] Use correct statistics for ParquetExec
  • ARROW-11023 - [C++][CMake] Fix gRPC build issue
  • ARROW-11024 - [Python] Add test for List<Struct> data Parquet roundtrip
  • ARROW-11025 - [Rust] Fixed bench for binary boolean kernels
  • ARROW-11030 - [Rust][DataFusion] Concatenate left side batches to single batch in HashJoinExec
  • ARROW-11048 - [Rust] Add bench to MutableBuffer
  • ARROW-11050 - [R] Handle RecordBatch in write_parquet()
  • ARROW-11067 - [C++] Fix CSV null detection on large values
  • ARROW-11069 - [C++] Parquet writer incorrect data being written when data type is struct
  • ARROW-11073 - [Rust] fix lint error in /arrow/rust/arrow/src/ipc/reader.rs
  • ARROW-11083 - [CI] Ensure using Ubuntu 20.04 for dev.yml:release job
  • ARROW-11084 - [Rust] Fixed clippy
  • ARROW-11085 - [Rust] Migrated from action-rs to shell in github actions.
  • ARROW-11092 - [CI] (Temporarily) move offending workflows to separate files
  • ARROW-11102 - [Rust][DataFusion] fmt::Debug for ScalarValue(Utf8) is always quoted
  • ARROW-11113 - [Rust] support as_struct_array cast
  • ARROW-11114 - [Java] Fix Schema and Field metadata JSON serialization
  • ARROW-11132 - [CI] Use pip to install crossbow's dependencies for the comment bot
  • ARROW-11144 - [CI][C++][Python] Move to newer Hadoop version
  • ARROW-11152 - [CI][C++] Fix Homebrew numpy installation on macOS builds
  • ARROW-11162 - [C++][Parquet] Fix invalid cast on Decimal256 Parquet data
  • ARROW-11163 - [C++] Fix reading of compressed IPC/Feather files written with Arrow 0.17
  • ARROW-11166 - [Python] Add binding for ProjectOptions
  • ARROW-11171 - [Go] Fix building on s390x with noasm
  • ARROW-11189 - [Developer] support benchmark diff between JSONs
  • ARROW-11190 - [C++] Clean up compiler warnings
  • ARROW-11202 - [R][CI] Nightly builds not happening (or artifacts not exported)
  • ARROW-11224 - [R] don't test metadata serialization on old R versions
  • ARROW-11226 - [Python] Skip/workaround failing filesystem test with s3fs 0.5
  • ARROW-11227 - [Python] Fix to_pandas with ExtensionArray tests for pandas 0.24
  • ARROW-11229 - [C++][Dataset] Fix static build failure
  • ARROW-11230 - [R] Fix build failures on Windows when multiple libarrow binaries found
  • ARROW-11232 - [C++] Make Table::CombineChunks() handle table with zero column correctly
  • ARROW-11233 - [C++][Flight] Fix link error with bundled gRPC and Abseil
  • ARROW-11237 - [C++] Restore DCHECK definitions after GLog
  • ARROW-11250 - [Python] Inconsistent behavior calling ds.dataset()
  • ARROW-11251 - [CI] Make sure that devtoolset-8 is really installed + being used
  • ARROW-11253 - [R] Make sure that large metadata tests are reproducible
  • ARROW-11255 - [Packaging][Conda][macOS] Fix Python version
  • ARROW-11257 - [C++][Parquet] PyArrow Table contains different data after writing and reloading from Parquet
  • ARROW-11271 - [Rust][Parquet] Fix parquet list schema null conversion
  • ARROW-11274 - [Packaging][wheel][Windows] Fix wheels path for Gemfury
  • ARROW-11275 - [Packaging][wheel][Linux] Fix paths for Gemfury
  • ARROW-11283 - [Julia] Update Julia install link for 3.0 release
  • ARROW-11286 - [Release][Yum] Fix minimal build example check
  • ARROW-11287 - [Packaging][RPM] Add missing dependencies
  • ARROW-11301 - [C++] Fix reading Parquet LZ4-compressed files produced by Hadoop
  • ARROW-11302 - [Release][Python] Remove verification of python 3.5 wheel on macOS
  • ARROW-11306 - [Packaging][Ubuntu][16.04] Add missing libprotobuf-dev dependency
  • ARROW-11363 - C++ Library Build Failure with gRPC 1.34+
  • ARROW-11390 - [Python] pyarrow 3.0 issues with turbodbc
  • ARROW-11445 - Type conversion failure on numpy 1.20
  • ARROW-11450 - [Python] pyarrow<3 incompatible with numpy>=1.20.0
  • ARROW-11487 - [Python] Can't create array from Categorical with numpy 1.20
  • ARROW-11835 - [Python] PyArrow 3.0/Pip installation errors on Big Sur.
  • ARROW-12399 - Unable to load libhdfs
  • PARQUET-1935 - [C++] Fix bug in WriteBatchSpaced
kszucs
published 2.0.0 •

Changelog

Apache Arrow 2.0.0 (2020-10-19)

Bug Fixes

  • ARROW-2367 - [Python] ListArray has trouble with sizes greater than kMaximumCapacity
  • ARROW-4189 - [Rust] Added coverage report.
  • ARROW-4917 - [C++] orc_ep fails in cpp-alpine docker
  • ARROW-5578 - [C++][Flight] Flight does not build out of the box on Alpine Linux
  • ARROW-7226 - [Python][Doc] Add note re: JSON format support
  • ARROW-7384 - [Website] Fix search indexing warning reported by Google
  • ARROW-7517 - [C++] Builder does not honour dictionary type provided during initialization
  • ARROW-7663 - [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table
  • ARROW-7903 - [Rust][DataFusion] Migrated to sqlparser 0.6.1
  • ARROW-7957 - [Python] Handle new FileSystem in ParquetDataset by automatically using new implementation
  • ARROW-8265 - [Rust] [DataFusion] Table API collect() should not require context
  • ARROW-8394 - [JS] Upgrade to TypeScript 4.0.2, fix typings for TS 3.9+
  • ARROW-8735 - [Rust][Parquet] Allow arm 32 to use soft hash implementation
  • ARROW-8749 - [C++] IpcFormatWriter writes dictionary batches with wrong ID
  • ARROW-8773 - [Python] Preserve nullability of fields in schema.empty_table()
  • ARROW-9028 - [R] Should be able to convert an empty table
  • ARROW-9096 - [Python] Pandas roundtrip with dtype="object" underlying numeric column index
  • ARROW-9177 - [C++][Parquet] Tracking issue for cross-implementation LZ4 Parquet compression compatibility
  • ARROW-9414 - [Packaging][deb][RPM] Enable S3
  • ARROW-9462 - [Go] The Indentation after the first Record in arrjson writer is incorrect
  • ARROW-9463 - [Go] Make arrjson Writer close idempotent
  • ARROW-9490 - [Python][C++] Bug in pa.array when input mixes int8 with float
  • ARROW-9495 - [C++] Equality assertions don't handle Inf / -Inf properly
  • ARROW-9520 - [Rust][DataFusion] Add support for aliased aggregate exprs
  • ARROW-9528 - [Python] Honor tzinfo when converting from datetime
  • ARROW-9532 - [Python][Doc] Use Python3_EXECUTABLE instead of PYTHON_EXECUTABLE for finding Python executable
  • ARROW-9535 - [Python] Remove symlink fixes from conda recipe
  • ARROW-9536 - [Java] Missing parameters in PlasmaOutOfMemoryException.java
  • ARROW-9541 - [C++] CMakeLists requires UTF8PROC_STATIC when building static library
  • ARROW-9544 - [R] Fix version argument of write_parquet()
  • ARROW-9546 - [Python] Clean up Pandas Metadata Conversion test
  • ARROW-9548 - [Go] Test output files are not removed correctly
  • ARROW-9549 - [Rust] Fixed version in dependency in parquet.
  • ARROW-9554 - [Java] FixedWidthInPlaceVectorSorter sometimes produces wrong result
  • ARROW-9556 - [Python][C++] Segfaults in UnionArray with null values
  • ARROW-9560 - [Packaging] Add required conda-forge.yml
  • ARROW-9569 - [CI][R] Fix rtools35 builds for msys2 key change
  • ARROW-9570 - [Doc] Clean up sphinx sidebar
  • ARROW-9573 - [Python][Dataset] Provide read_table(ignore_prefixes=)
  • ARROW-9574 - [R] Cleanups for CRAN 1.0.0 release
  • ARROW-9575 - [R] gcc-UBSAN failure on CRAN
  • ARROW-9577 - [C++] Ignore EBADF error in posix_madvise()
  • ARROW-9583 - [Rust] Fix offsets in result of arithmetic kernels
  • ARROW-9588 - [C++] Partially support building with clang in an MSVC setting
  • ARROW-9589 - [C++/R] Forward declare structs as structs
  • ARROW-9592 - [CI] Update homebrew before calling brew bundle
  • ARROW-9596 - [CI][Crossbow] Fix homebrew-cpp again, again
  • ARROW-9597 - [C++] AddAlias in compute::FunctionRegistry should be synchronized
  • ARROW-9598 - [C++][Parquet] Fix writing nullable structs
  • ARROW-9599 - [CI] Appveyor toolchain build fails because CMake detects different C and C++ compilers
  • ARROW-9600 - [Rust] pin proc macro
  • ARROW-9600 - [Rust][Arrow] pin older version of proc-macro2 during build
  • ARROW-9602 - [R] Improve cmake detection in Linux build
  • ARROW-9603 - [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs
  • ARROW-9606 - [C++][Dataset] Support "a"_.In(<>).Assume(<compound>)
  • ARROW-9609 - [C++][Dataset] CsvFileFormat reads all virtual columns as null
  • ARROW-9621 - [Python] Skip test_move_file for in-memory fsspec filesystem
  • ARROW-9622 - [Java] Fixed UnsupportedOperationException in complexcopier with null value in unionvector inside st…
  • ARROW-9628 - [Rust] Disable artifact caching for Mac OSX builds
  • ARROW-9629 - [Python] Fix kartothek integration tests by fixing dependencies
  • ARROW-9631 - [Rust] Make arrow not depend on flight
  • ARROW-9631 - [Rust] flight should depend on arrow, not the other way around
  • ARROW-9642 - [C++] Let MakeBuilder refer to DictionaryType's index_type for deciding the starting bit width of the indices
  • ARROW-9643 - [C++] Only register the SIMD variants when it's supported.
  • ARROW-9644 - [C++][Dataset] Don't apply ignore_prefixes to partition base_dir
  • ARROW-9652 - [Rust][DataFusion] Error message rather than panic for external csv tables with no column defs
  • ARROW-9653 - [Rust][DataFusion] Do not error in planner when SQL has multiple group by expressions
  • ARROW-9659 - [C++] Fix RecordBatchStreamReader when source is CudaBufferReader
  • ARROW-9660 - [C++] Revamp dictionary association in IPC
  • ARROW-9666 - [Python][wheel][Windows] Fix wheel build for Windows
  • ARROW-9670 - [C++][FlightRPC] don't hang if Close and Read called simultaneously
  • ARROW-9676 - [R] Error converting Table with nested structs
  • ARROW-9684 - [C++] Fix undefined behaviour on invalid IPC / Parquet input
  • ARROW-9692 - [Python] Fix distutils-related warning
  • ARROW-9693 - [CI][Docs] Nightly docs build fails
  • ARROW-9696 - [Rust][DataFusion] fix nested binary expressions
  • ARROW-9698 - [C++] Remove -DNDEBUG flag leak in .pc file
  • ARROW-9700 - [Python] fix create_library_symlinks for macos
  • ARROW-9712 - [Rust][DataFusion] Fix parquet error handling and general code improvements
  • ARROW-9714 - [Rust][DataFusion] Implement type coercion rule for limit and sort
  • ARROW-9716 - [Rust][DataFusion] Implement limit on concurrent threads in MergeExec
  • ARROW-9726 - [Rust][DataFusion] Do not create parquet reader thread until execute is called
  • ARROW-9727 - [C++] Fix crashes on invalid IPC input (OSS-Fuzz)
  • ARROW-9729 - [Java] Disable Error Prone when project is imported into …
  • ARROW-9733 - [Rust][DataFusion] Added support for COUNT/MIN/MAX on string columns
  • ARROW-9734 - [Rust][DataFusion] TableProvider.scan now returns partitions instead of iterators
  • ARROW-9741 - [Rust] [DataFusion] Incorrect count in TPC-H query 1 result set
  • ARROW-9743 - [R] Sanitize paths in open_dataset
  • ARROW-9744 - [Python] Fix build failure on aarch64
  • ARROW-9764 - [CI][Java] Fix wrong image name for push
  • ARROW-9768 - [Python] Check overflow in conversion of datetime objects to nanosecond timestamps
  • ARROW-9768 - [Rust][DataFusion] Rename PhysicalPlannerImpl to DefaultPhysicalPlanner
  • ARROW-9778 - [Rust][DataFusion] Implement Expr.nullable() and make consistent between logical and physical plans
  • ARROW-9783 - [Rust][DataFusion] Remove aggregate expression data type
  • ARROW-9785 - [Python] Fix excessively slow S3 options test
  • ARROW-9789 - [C++] Don't install jemalloc in parallel
  • ARROW-9790 - [Rust][Parquet] Increase test coverage in arrow_reader.rs
  • ARROW-9790 - [Rust][Parquet] Fix PrimitiveArrayReader boundary conditions
  • ARROW-9793 - [Rust][DataFusion] Fixed unit tests
  • ARROW-9797 - [Rust] AMD64 Conda Integration Tests is failing for the Master branch
  • ARROW-9799 - [Rust] [DataFusion] Implementation of physical binary expression get_type method is incorrect
  • ARROW-9800 - [Rust][Parquet] Remove println! when writing column statistics
  • ARROW-9801 - DictionaryArray with non-unique values are silently corrupted when written to a Parquet file
  • ARROW-9809 - [Rust][DataFusion] Fixed type coercion, supertypes and type checking.
  • ARROW-9814 - [Python] Fix crash in test_parquet::test_read_partitioned_directory_s3fs
  • ARROW-9815 - [Rust][DataFusion] Remove the use of Arc/Mutex to protect plan time structures
  • ARROW-9815 - [Rust][DataFusion] Add a trait for looking up scalar functions by name
  • ARROW-9815 - [Rust][DataFusion] Fixed deadlock caused by accessing the scalar functions' registry.
  • ARROW-9816 - [C++] Escape quotes in config.h
  • ARROW-9827 - [C++][Dataset] Skip parsing RowGroup metadata statistics when there is no filter
  • ARROW-9831 - [Rust][DataFusion] Fixed compilation error
  • ARROW-9840 - [Python] fs documentation out of date with code (FileStats -> FileInfo)
  • ARROW-9846 - [Rust] Master branch broken build
  • ARROW-9851 - [C++] Disable AVX512 runtime paths with Valgrind
  • ARROW-9852 - [C++] Add more IPC fuzz regression files
  • ARROW-9852 - [C++] Validate dictionaries fully when combining deltas
  • ARROW-9855 - [R] Fix bad merge/Rcpp conflict
  • ARROW-9859 - [C++] Decode username and password in URIs
  • ARROW-9864 - [Python] Support pathlib.Path in pq.write_to_dataset
  • ARROW-9874 - [C++] Add sink-owning version of IPC writers
  • ARROW-9876 - [C++] Faster ARM build on Travis-CI
  • ARROW-9877 - [C++] Fix homebrew-cpp build fail on AVX512
  • ARROW-9879 - [Python] Add support for numpy scalars to ChunkedArray.__getitem__
  • ARROW-9882 - [C++/Python] Update OSX build to conda-forge-ci-setup=3
  • ARROW-9883 - [R] Fix linuxlibs.R install script for R < 3.6
  • ARROW-9888 - [Rust][DataFusion] Allow ExecutionContext to be shared between threads (again)
  • ARROW-9889 - [Rust][DataFusion] Implement physical plan for EmptyRelation
  • ARROW-9906 - [C++] Keep S3 filesystem alive through open file objects
  • ARROW-9913 - [C++] Make outputs of Decimal128::FromString independent of the presence of one another.
  • ARROW-9920 - [Python] Validate input to pa.concat_arrays() to avoid segfault
  • ARROW-9922 - [Rust] Add StructArray::TryFrom (+40%)
  • ARROW-9924 - [C++][Dataset] Enable per-column parallelism for single ParquetFileFragment scans
  • ARROW-9931 - [C++] Fix undefined behaviour on invalid IPC input
  • ARROW-9932 - [R] Arrow 1.0.1 R package fails to install on R3.4 over linux
  • ARROW-9936 - [Python] Fix / test relative file paths in pyarrow.parquet
  • ARROW-9937 - [Rust][DataFusion] Improved aggregations
  • ARROW-9943 - [C++] Recursively apply Arrow metadata when reading from Parquet
  • ARROW-9946 - [R] Check sink argument class in ParquetFileWriter
  • ARROW-9953 - [R] Declare minimum version for bit64
  • ARROW-9962 - [Python] Fix conversion to_pandas with tz-aware index column and fixed offset timezones
  • ARROW-9968 - [C++] Fix UBSAN build
  • ARROW-9969 - [C++] Fix RecordBatchBuilder with dictionary types
  • ARROW-9970 - [Go] fix checkptr failure in sum methods
  • ARROW-9972 - [CI] Work around grpc-re2 clash on Homebrew
  • ARROW-9973 - [Java] JDBC DateConsumer does not allow dates before epoch
  • ARROW-9976 - [Python] ArrowCapacityError when doing Table.from_pandas with large dataframe
  • ARROW-9990 - [Rust][DataFusion] Fixed the NOT operator
  • ARROW-9993 - [Python] Tzinfo - string roundtrip fails on pytz.StaticTzInfo objects
  • ARROW-9994 - [C++][Python] Auto chunking nested array containing binary-like fields results in malformed output
  • ARROW-9996 - [C++] Dictionary is unset when calling DictionaryArray.GetScalar for null values
  • ARROW-10003 - [C++] Create parent dir for any destination fs in CopyFiles
  • ARROW-10008 - [C++][Dataset] Fix filtering/row group statistics of dict columns
  • ARROW-10011 - [C++] Make FindRE2.cmake re-entrant
  • ARROW-10012 - [C++] Make MockFileSystem thread-safe
  • ARROW-10013 - [FlightRPC][C++] fix setting generic client options
  • ARROW-10017 - [Java] Fix LargeMemoryUtil long conversion
  • ARROW-10022 - [C++] Fix divide by zero and overflow error for scalar arithmetic benchmark
  • ARROW-10027 - [C++] Fix Take array kernel for NullType
  • ARROW-10034 - [Rust] Fix Rust build on master
  • ARROW-10041 - [Rust] Added check of data type to GenericString::from.
  • ARROW-10047 - [CI] Conda integration tests failing with cmake error
  • ARROW-10048 - [Rust] Fixed error in computing min/max with null entries.
  • ARROW-10049 - [C++/Python] Sync conda recipe with conda-forge
  • ARROW-10060 - [Rust][DataFusion] Fixed error on which Err were discarded in MergeExec.
  • ARROW-10062 - [Rust] Fix for null elems at key position in dictionary arrays
  • ARROW-10073 - [Python] Don't rely on dict item order in test_parquet_nested_storage
  • ARROW-10081 - [C++/Python] Fix bash syntax in drone.io conda builds
  • ARROW-10085 - [C++] Fix S3 region resolution on Windows
  • ARROW-10087 - [CI] Fix nightly docs job
  • ARROW-10098 - [R][Doc] Fix copy_files doc mismatch
  • ARROW-10104 - [Python] Separate tests into its own conda package
  • ARROW-10114 - [R] Segfault in to_dataframe_parallel with deeply nested structs
  • ARROW-10116 - [Python][Packaging] Fix gRPC linking error in macOS wheels builds
  • ARROW-10119 - [C++] Fix Parquet crashes on invalid input
  • ARROW-10121 - [C++] Fix emission of new dictionaries in IPC writer
  • ARROW-10124 - [C++] Don't restrict permissions when creating files
  • ARROW-10125 - [R] Int64 downcast check doesn't consider all chunks
  • ARROW-10130 - [C++][Dataset] Ensure ParquetFileFragment::SplitByRowGroup preserves the 'has_complete_metadata' status
  • ARROW-10136 - [Rust] Fix null handling in StringArray and BinaryArray filtering, add BinaryArray::from_opt_vec
  • ARROW-10137 - [C++][R] Move nameof.h into R subproject
  • ARROW-10147 - [Python] Pandas metadata fails if index name not JSON-serializable
  • ARROW-10150 - [C++] Fix crashes on invalid Parquet file
  • ARROW-10169 - [Rust] Pretty print null PrimitiveTypes as empty strings
  • ARROW-10175 - [CI] Fix nightly HDFS integration tests (ensure to use legacy dataset)
  • ARROW-10176 - [C++] Avoid using unformattable types for test parameters
  • ARROW-10178 - [CI] Remove patch to fix Spark master build
  • ARROW-10179 - [Rust] Fixed error in labeler
  • ARROW-10181 - [Rust] Skip compiling one test on 32 bit ARM architecture
  • ARROW-10188 - [Rust][DataFusion] Fixed DataFusion examples.
  • ARROW-10189 - [Doc] Fixed typo in C-Data interface example
  • ARROW-10192 - [Python] Always decode inner dictionaries when converting array to Pandas
  • ARROW-10193 - [Python] Segfault when converting to fixed size binary array
  • ARROW-10200 - [CI][Java] Fix a job failure for s390x Java on TravisCI
  • ARROW-10204 - [Rust] Filter kernel should only count bits in valid range
  • ARROW-10214 - [Python] Allow printing undecodable schema metadata
  • ARROW-10226 - [Rust] [Parquet] Parquet reader reading wrong columns in some batches within a parquet file
  • ARROW-10230 - [JS][Doc] JavaScript documentation fails to build
  • ARROW-10232 - FixedSizeListArray is incorrectly written/read to/from parquet
  • ARROW-10234 - [C++][Gandiva] Fix logic of round() for floats/decimals in Gandiva
  • ARROW-10237 - [C++] Duplicate dict values cause corrupt parquet
  • ARROW-10238 - [C#] List<Struct> is broken
  • ARROW-10239 - [C++] Add missing zlib dependency to aws-sdk-cpp
  • ARROW-10244 - [Python] Document pyarrow.dataset.parquet_dataset
  • ARROW-10248 - [Python][Dataset] Always apply Python's default write properties
  • ARROW-10262 - [C++] Fix TypeClass for BinaryScalar and LargeBinaryScalar
  • ARROW-10271 - [Rust] Update dependencies
  • ARROW-10279 - [Release][Python] Fix verification script to align with the new macos wheel platform tags
  • ARROW-10280 - [Packaging][Python] Fix macOS wheel artifact patterns
  • ARROW-10281 - [Python] Fix warnings when running tests
  • ARROW-10284 - [Python] Correctly suppress warning about legacy filesystem on import
  • ARROW-10285 - [Python] Fix usage of deprecated num_children in pyarrow.orc submodule
  • ARROW-10286 - [C++][FlightRPC] Make CMake output less confusing
  • ARROW-10288 - [C++] Fix compilation errors on 32-bit x86
  • ARROW-10290 - [C++] List POP_BACK is not available in older CMake versions
  • ARROW-10296 - [R] Data saved as integer64 loaded as integer
  • ARROW-10517 - [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob
  • ARROW-11062 - [Java] When writing to flight stream, Spark's mapPartitions is not working

New Features and Improvements

  • ARROW-983 - [C++] Implement InputStream and OutputStream classes for interacting with socket connections
  • ARROW-1509 - [Python] Write serialized object as a stream of encapsulated IPC messages
  • ARROW-1644 - [C++][Parquet] Read and write nested Parquet data with a mix of struct and list nesting levels
  • ARROW-1669 - [C++] Consider adding Abseil (Google C++11 standard library extensions) to toolchain
  • ARROW-1797 - [C++] Implement binary arithmetic kernels for numeric arrays
  • ARROW-2164 - [C++] Clean up unnecessary decimal module refs
  • ARROW-3080 - [Python] Unify Arrow to Python object conversion paths
  • ARROW-3757 - [R] R bindings for Flight RPC client
  • ARROW-3850 - [Python] Support MapType and StructType for enhanced PySpark integration
  • ARROW-3872 - [R] Add ad hoc test of feather compatibility
  • ARROW-4046 - [Python/CI] Exercise large memory tests
  • ARROW-4248 - [C++][Plasma] Build on Windows / Visual Studio
  • ARROW-4685 - [C++] Update Boost to 1.69 in manylinux1 docker image
  • ARROW-4927 - [Rust] Update top level README to describe current functionality
  • ARROW-4957 - [Rust] [DataFusion] Implement get_supertype correctly
  • ARROW-4965 - [Python] Timestamp array type detection should use tzname of datetime.datetime objects
  • ARROW-5034 - [C#] ArrowStreamWriter and ArrowFileWriter implement sync WriteRecordBatch
  • ARROW-5123 - [Rust] Parquet derive for simple structs
  • ARROW-6075 - [FlightRPC] Handle uncaught exceptions in middleware
  • ARROW-6281 - [Python] Produce chunked arrays for nested types in pyarrow.array
  • ARROW-6282 - [Format] Support lossy compression
  • ARROW-6437 - [R] Add AWS SDK to system dependencies for macOS and Windows
  • ARROW-6535 - [C++] Status::WithMessage should accept variadic parameters
  • ARROW-6537 - [R] Pass column_types to CSV reader
  • ARROW-6972 - [C#] Support for StructArrays
  • ARROW-6982 - [R] Add bindings for compare and boolean kernels
  • ARROW-7136 - [Rust] Added caching to the docker image
  • ARROW-7218 - [Python] Conversion from boolean numpy scalars not working
  • ARROW-7302 - [C++] CSV: allow dictionary types in explicit column types
  • ARROW-7372 - [C++] Allow creating dictionary array from simple JSON
  • ARROW-7871 - [Python] Expose more compute kernels
  • ARROW-7960 - [C++] Add support for reading additional types
  • ARROW-8001 - [R][Dataset] Bindings for dataset writing
  • ARROW-8002 - [C++][Dataset][R] Support partitioned dataset writing
  • ARROW-8048 - [Python] Run memory leak tests nightly as follow up to ARROW-4120
  • ARROW-8172 - [C++] ArrayFromJSON for dictionary arrays
  • ARROW-8205 - [Rust][DataFusion] Added check to uniqueness of column names.
  • ARROW-8253 - [Rust] [DataFusion] Improve ergonomics of registering UDFs
  • ARROW-8262 - [Rust] [DataFusion] Add example that uses LogicalPlanBuilder
  • ARROW-8296 - [C++][Dataset] Add IpcFileWriteOptions
  • ARROW-8355 - [Python] Remove hard pandas dependency from FeatherDataset and minimize pandas dependency in test_feather.py
  • ARROW-8359 - [C++/Python] Enable linux-aarch64 builds
  • ARROW-8383 - [Rust] Allow easier access to keys array of a dictionary array
  • ARROW-8402 - [Java] Support ValidateFull methods in Java
  • ARROW-8493 - [C++][Parquet] Start populating repeated ancestor definition
  • ARROW-8494 - [C++][Parquet] Full support for reading mixed list and structs
  • ARROW-8581 - [C#] Accept and return DateTime from DateXXArray
  • ARROW-8601 - [Go][FOLLOWUP] Fix RAT violations related to Flight in Go
  • ARROW-8601 - [Go][Flight] Implement Flight RPC server and client
  • ARROW-8618 - [C++] Clean up some redundant std::move()s
  • ARROW-8678 - [C++/Python][Parquet] Remove old writer code path
  • ARROW-8712 - [R] Expose strptime timestamp parsing in read_csv conversion options
  • ARROW-8774 - [Rust] [DataFusion] Improve threading model
  • ARROW-8810 - [R] Add documentation about Parquet format, appending to stream format
  • ARROW-8824 - [Rust] [DataFusion] Implement new SQL parser
  • ARROW-8828 - [Rust] Implement SQL tokenizer
  • ARROW-8829 - [Rust] Implement SQL parser
  • ARROW-9010 - [Java] Framework and interface changes for RecordBatch IPC buffer compression
  • ARROW-9065 - [C++] Support parsing date32 in dataset partition folders
  • ARROW-9068 - [C++][Dataset] Simplify partitioning interface
  • ARROW-9078 - [C++] Parquet read / write extension type with nested storage type
  • ARROW-9104 - [C++] Parquet encryption tests should write files to a temporary directory instead of the testing submodule's directory
  • ARROW-9107 - [C++][Dataset] Support temporal partitioning fields
  • ARROW-9147 - [C++][Dataset] Support projection from null->any type
  • ARROW-9205 - [Documentation] Fix typos
  • ARROW-9266 - [Python][Packaging] Enable S3 support in macOS wheels
  • ARROW-9271 - [R] Preserve data frame metadata in round trip
  • ARROW-9286 - [C++] Add function "aliases" to compute::FunctionRegistry
  • ARROW-9328 - [C++][Gandiva] Add LTRIM, RTRIM, BTRIM functions for string
  • ARROW-9338 - [Rust] Add clippy instructions
  • ARROW-9344 - [C++][Flight] Measure latency quantiles
  • ARROW-9358 - [Integration] remove generated_large_batch.json
  • ARROW-9371 - [Java] Run vector tests for both allocators
  • ARROW-9377 - [Java] Support unsigned dictionary indices
  • ARROW-9387 - [R] Use new C++ table select method
  • ARROW-9388 - [C++] Division kernels
  • ARROW-9394 - [Python] Support pickling of Scalars
  • ARROW-9398 - [C++] Register SIMD sum variants to function instance.
  • ARROW-9402 - [C++] Rework portable wrappers for checked integer arithmetic
  • ARROW-9405 - [R] Switch to cpp11
  • ARROW-9412 - [C++] Add non-bundled dependencies to INTERFACE_LINK_LIBRARIES of static libarrow
  • ARROW-9429 - [Python] ChunkedArray.to_numpy
  • ARROW-9454 - [GLib] Add binding of some dictionary builders
  • ARROW-9465 - [Python] Improve ergonomics of compute module
  • ARROW-9469 - [Python] Make more objects weakrefable
  • ARROW-9487 - [Developer] Cover the archery release utilities with unittests
  • ARROW-9488 - [Release] Use the new changelog generation when updating the website
  • ARROW-9507 - [Rust][DataFusion] Implement Display for PhysicalExpr
  • ARROW-9508 - [Release][APT][Yum] Enable verification for arm64 binaries
  • ARROW-9516 - [Rust][DataFusion] refactor of column names
  • ARROW-9517 - [C++/Python] Add support for temporary credentials to S3Options
  • ARROW-9518 - [Python] Deprecate pyarrow serialization
  • ARROW-9521 - [Rust][DataFusion] Handle custom CSV file extensions
  • ARROW-9523 - [Rust] Improve filter kernel performance
  • ARROW-9534 - [Rust][DataFusion] Added support for lit to all supported rust types.
  • ARROW-9550 - [Rust] [DataFusion] Remove Rc<RefCell<_>> from hash aggregate operator
  • ARROW-9553 - [Rust] Release script doesn't bump parquet crate's arrow dependency version
  • ARROW-9557 - [R] Iterating over parquet columns is slow in R
  • ARROW-9559 - [Rust][DataFusion] Made function public
  • ARROW-9563 - [Dev][Release] Use archery's changelog generator when creating release notes for the website
  • ARROW-9568 - [CI][C++] Use msys2/setup-msys2
  • ARROW-9576 - [Python][Doc] Fix error in example code for extension types
  • ARROW-9580 - [JS][Doc] Fix syntax error in example code
  • ARROW-9581 - [Dev][Release] Bump next snapshot versions to 2.0.0
  • ARROW-9582 - [Rust] Implement memory size methods
  • ARROW-9585 - [Rust][DataFusion] Remove duplicated to-do line
  • ARROW-9587 - [FlightRPC][Java] clean up FlightStream/DoPut
  • ARROW-9593 - [Python] Add custom pickle reducers for DictionaryScalar
  • ARROW-9604 - [C++] Add aggregate min/max benchmark
  • ARROW-9605 - [C++] Speed up aggregate min/max compute kernels on integer types
  • ARROW-9607 - [C++][Gandiva] Add bitwise_and(), bitwise_or() and bitwise_not() functions for integers
  • ARROW-9608 - [Rust] Remove arrow flight from parquet's feature gating
  • ARROW-9615 - [Rust] Added kernel to compute length of a string.
  • ARROW-9617 - [Rust][DataFusion] Add length of string array
  • ARROW-9618 - [Rust][DataFusion] Made it easier to write optimizers
  • ARROW-9619 - [Rust][DataFusion] Add predicate push-down
  • ARROW-9632 - [Rust] add a func "new" for ExecutionContextSchemaProvider
  • ARROW-9638 - [C++][Compute] Implement mode kernel
  • ARROW-9639 - [Ruby] Add dependency version check
  • ARROW-9640 - [C++][Gandiva] Implement round() for integers and long integers
  • ARROW-9641 - [C++][Gandiva] Implement round() for floating point and double floating point numbers
  • ARROW-9645 - [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs
  • ARROW-9646 - [C++][Dataset] Support writing with ParquetFileFormat
  • ARROW-9650 - [Packaging][APT] Drop support for Ubuntu 19.10
  • ARROW-9654 - [Rust][DataFusion] Add EXPLAIN <SQL> statement
  • ARROW-9656 - [Rust][DataFusion] Better error messages for unsupported EXTERNAL TABLE types
  • ARROW-9658 - [Python] Python bindings for dataset writing
  • ARROW-9665 - [R] head/tail/take for Datasets
  • ARROW-9667 - [CI][Crossbow] Segfault in 2 nightly R builds
  • ARROW-9671 - [C++] Fix a bug in BasicDecimal128 constructor that interprets uint64_t integers with highest bit set as negative.
  • ARROW-9673 - [Rust][DataFusion] Add a param "dialect" for DFParser::parse_sql
  • ARROW-9678 - [Rust][DataFusion] Improve projection push down to remove unused columns
  • ARROW-9679 - [Rust][DataFusion] More efficient creation of final batch from HashAggregateExec
  • ARROW-9681 - [Java] Fix test failures of Arrow Memory - Core on big-endian platform
  • ARROW-9683 - [Rust][DataFusion] Add debug printing to physical plans and associated types
  • ARROW-9691 - [Rust][DataFusion] Make sql_statement_to_plan method public
  • ARROW-9695 - [Rust] Improve comments on LogicalPlan enum variants
  • ARROW-9699 - [C++][Compute] Optimize mode kernel for small integer types
  • ARROW-9701 - [CI][Java] Add a job for s390x Java on TravisCI
  • ARROW-9702 - [C++] Register bpacking SIMD to runtime path.
  • ARROW-9703 - [Developer][Archery] Restartable cherry-picking process for creating maintenance branches
  • ARROW-9706 - [Java] Tests of TestLargeListVector correctly read offset
  • ARROW-9710 - [C++] Improve performance of Decimal128::ToString by 10x, and make the implementation reusable for Decimal256.
  • ARROW-9711 - [Rust] Add new benchmark derived from TPC-H
  • ARROW-9713 - [Rust][DataFusion] Remove explicit panics
  • ARROW-9715 - [R] changelog/doc updates for 1.0.1
  • ARROW-9718 - [Python] ParquetWriter to work with new FileSystem API
  • ARROW-9721 - [Packaging][Python] Update wheel dependency files
  • ARROW-9722 - [Rust] Shorten key lifetime for dict lookup key
  • ARROW-9723 - [C++][Compute] Count NaN in mode kernel
  • ARROW-9725 - [Rust][DataFusion] SortExec and LimitExec re-use MergeExec
  • ARROW-9737 - [C++][Gandiva] Add bitwise_xor() for integers
  • ARROW-9739 - [CI][Ruby] Don't install gem documents
  • ARROW-9742 - [Rust][DataFusion] Improved DataFrame trait (formerly known as the Table trait)
  • ARROW-9751 - [Rust][DataFusion] Allow UDFs to accept multiple data types per argument
  • ARROW-9752 - [Rust][DataFusion] Add support for User-Defined Aggregate Functions.
  • ARROW-9753 - [Rust][DataFusion] Replaced Arc<Mutex<>> by Box<>
  • ARROW-9754 - [Rust][DataFusion] Implement async in ExecutionPlan trait
  • ARROW-9757 - [Rust][DataFusion] Add prelude.rs
  • ARROW-9758 - [Rust][DataFusion] Allow physical planner to be replaced
  • ARROW-9759 - [Rust][DataFusion] Implement DataFrame.sort()
  • ARROW-9760 - [Rust][DataFusion] Added DataFrame::explain
  • ARROW-9761 - [C/C++] Add experimental C stream interface
  • ARROW-9762 - [Rust][DataFusion] ExecutionContext::sql now returns DataFrame
  • ARROW-9769 - [Python] Un-skip tests with fsspec in-memory filesystems
  • ARROW-9775 - [C++] Automatic S3 region selection
  • ARROW-9781 - [C++] Fix valgrind uninitialized value warnings
  • ARROW-9782 - [C++][Dataset] More configurable Dataset writing
  • ARROW-9784 - [Rust][DataFusion] Make running TPCH benchmark repeatable
  • ARROW-9786 - [R] Unvendor cpp11 before release
  • ARROW-9788 - [Rust][DataFusion] Rename SelectionExec to FilterExec
  • ARROW-9792 - [Rust][DataFusion] Aggregate expression functions should not return result
  • ARROW-9794 - [C++] Add IsVendor API for CpuInfo
  • ARROW-9795 - [C++][Gandiva] Implement castTIMESTAMP(int64) in Gandiva
  • ARROW-9806 - [R] More compute kernel bindings
  • ARROW-9807 - [R] News update/version bump post-1.0.1
  • ARROW-9808 - [Python] Update read_table doc string
  • ARROW-9811 - [C++] Unchecked floating point division by 0 should succeed
  • ARROW-9813 - [C++] Disable semantic interposition
  • ARROW-9819 - [C++] Bump mimalloc to 1.6.4
  • ARROW-9821 - [Rust][DataFusion] Support for User Defined ExtensionNodes in the LogicalPlan
  • ARROW-9821 - [Rust][DataFusion] Make crate::logical_plan and crate::physical_plan modules
  • ARROW-9823 - [CI][C++][MinGW] Enable S3
  • ARROW-9832 - [Rust] [DataFusion] Refactor PhysicalPlan to remove Partition
  • ARROW-9833 - [Rust][DataFusion] TableProvider.scan now returns ExecutionPlan
  • ARROW-9834 - [Rust] [DataFusion] Remove Partition trait
  • ARROW-9835 - [Rust][DataFusion] Removed FunctionMeta and FunctionType
  • ARROW-9836 - [Rust][DataFusion] Improve API for usage of UDFs
  • ARROW-9837 - [Rust][DataFusion] Added provider for variable
  • ARROW-9838 - [Rust] [DataFusion] DefaultPhysicalPlanner should insert explicit MergeExec nodes
  • ARROW-9839 - [Rust][DataFusion] Implement ExecutionPlan.as_any
  • ARROW-9841 - [Rust] Update checked-in fbs files
  • ARROW-9844 - [CI] Add Go build job on s390x
  • ARROW-9845 - [Rust][Parquet] Move serde_json dependency to dev-dependencies as it is only used in tests
  • ARROW-9848 - [Rust] Implement 0.15 IPC alignment
  • ARROW-9849 - [Rust][DataFusion] Simplified argument types of ScalarFunctions.
  • ARROW-9850 - [Go] Defer should not be used inside a loop
  • ARROW-9853 - [Rust] Implement take kernel for dictionary arrays
  • ARROW-9854 - [R] Support reading/writing data to/from S3
  • ARROW-9858 - [Python][Docs] Add user guide for filesystems interface
  • ARROW-9863 - [C++][Parquet] Compile regexes only once
  • ARROW-9867 - [C++][Dataset] Add FileSystemDataset::filesystem property
  • ARROW-9868 - [C++][R] Provide CopyFiles for copying files between FileSystems
  • ARROW-9869 - [R] Implement full S3FileSystem/S3Options constructor
  • ARROW-9870 - [R] Friendly interface for filesystems (S3)
  • ARROW-9871 - [C++] Add uppercase to ARROW_USER_SIMD_LEVEL
  • ARROW-9873 - [C++][Compute] Optimize mode kernel for integers in small value range
  • ARROW-9875 - [Python] Let FileSystem.get_file_info accept a single path
  • ARROW-9884 - [R] Bindings for writing datasets to Parquet
  • ARROW-9885 - [Rust][DataFusion] Minor code simplification
  • ARROW-9886 - [Rust][DataFusion] Parameterized testing of physical cast.
  • ARROW-9887 - [Rust][DataFusion] Added support for complex return types for built-in functions
  • ARROW-9890 - [R] Add zstandard compression codec in macOS build
  • ARROW-9891 - [Rust][DataFusion] Made math functions accept f32.
  • ARROW-9892 - [Rust][DataFusion] Added concat of utf8
  • ARROW-9893 - [Python] Support parquet options in dataset writing
  • ARROW-9895 - [Rust] Improve sorting kernels
  • ARROW-9899 - [Rust][DataFusion] Switch from Box<Schema> --> SchemaRef (Arc<Schema>) to be consistent with the rest of Arrow
  • ARROW-9900 - [Rust][DataFusion] Switch from Box -> Arc in LogicalPlanNode
  • ARROW-9901 - [C++] Add hand-crafted Parquet to Arrow reconstruction tests
  • ARROW-9902 - [Rust][DataFusion] Add array() built-in function
  • ARROW-9904 - [C++] Unroll the loop of CountSetBits.
  • ARROW-9908 - [Rust] Add support for temporal types in JSON reader
  • ARROW-9910 - [Rust][DataFusion] Fixed error in type coercion of Variadic.
  • ARROW-9914 - [Rust][DataFusion] Document SQL Type --> Arrow type mapping
  • ARROW-9916 - [Rust] Avoid cloning array data
  • ARROW-9917 - [Python][Compute] Bindings for mode kernel
  • ARROW-9919 - [Rust][DataFusion] Speedup math operations by 15%+
  • ARROW-9921 - [Rust] Replace TryFrom by From in StringArray from Vec<Option<&str>> (+50%)
  • ARROW-9925 - [GLib] Add low level value readers for GArrowListArray family
  • ARROW-9926 - [GLib] Use placement new for GArrowRecordBatchFileReader
  • ARROW-9928 - [C++] Speed up integer parsing slightly
  • ARROW-9929 - [Dev] Autotune cmake-format
  • ARROW-9933 - [Developer] Add drone as a CI provider for crossbow
  • ARROW-9934 - [Rust] Shape and stride check in tensor
  • ARROW-9941 - [Python] Better string representation for extension types
  • ARROW-9944 - [Rust][DataFusion] Implement to_timestamp function
  • ARROW-9949 - [C++] Improve performance of Decimal128::FromString by 46%, and make the implementation reusable for Decimal256.
  • ARROW-9950 - [Rust][DataFusion] Made UDFs usable without a registry
  • ARROW-9952 - [Python] Optionally use pyarrow.dataset in parquet.write_to_dataset
  • ARROW-9954 - [Rust][DataFusion] Made aggregates support the same signatures as functions.
  • ARROW-9956 - [C++][Gandiva] Implementation of binary_string function in gandiva
  • ARROW-9957 - [Rust] Replace tempdir with tempfile
  • ARROW-9961 - [Rust][DataFusion] Make to_timestamp function parse timestamps without timezone offset as local
  • ARROW-9964 - [C++] Allow reading date types from CSV data
  • ARROW-9965 - [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
  • ARROW-9966 - [Rust] Speedup kernels for sum,min,max by 10%-60%
  • ARROW-9967 - [Python] Add compute module docs + expose more option classes
  • ARROW-9971 - [Rust] Improve speed of take by 2x-3x (change scaling with batch size)
  • ARROW-9977 - [Rust][Large] StringArray
  • ARROW-9979 - [Rust] Fix arrow crate clippy lints
  • ARROW-9980 - [Rust][Parquet] Fix clippy lints
  • ARROW-9981 - [Rust][Flight] Expose IpcWriteOptions on utils
  • ARROW-9983 - [C++][Dataset][Python] Use larger default batch size than 32K for Datasets API
  • ARROW-9984 - [Rust][DataFusion] Minor cleanup DRY
  • ARROW-9986 - [Rust] allow to_timestamp to parse local times without fractional seconds
  • ARROW-9987 - [Rust][DataFusion] Improved docs for Expr
  • ARROW-9988 - [Rust][DataFusion] Added +-/* as operators to logical expressions.
  • ARROW-9992 - [C++][Python] Refactor python to arrow conversions based on a reusable conversion API
  • ARROW-9998 - [Python] Support pickling DictionaryScalar
  • ARROW-9999 - [Python] Support constructing dictionary array directly through pa.array()
  • ARROW-10000 - [C++][Python] Support constructing StructArray from list of key-value pairs
  • ARROW-10001 - [Rust][DataFusion] Added developer guide to README.
  • ARROW-10010 - [Rust] Speedup arithmetic (1.3-1.9x)
  • ARROW-10015 - [Rust] Simd aggregate kernels
  • ARROW-10016 - [Rust] Implement is null / is not null kernels
  • ARROW-10018 - [CI] Disable Sphinx and API documentation build on master
  • ARROW-10019 - [Rust] Add substring kernel
  • ARROW-10023 - [C++][Gandiva] Implement split_part function in gandiva
  • ARROW-10024 - [C++][Parquet] Create nested reading benchmarks
  • ARROW-10028 - [Rust] Simplified macro
  • ARROW-10030 - [Rust] Add support for FromIter and IntoIter for primitive types
  • ARROW-10035 - [C++] Update vendored libraries
  • ARROW-10037 - [C++] Workaround to force find AWS SDK to look for shared libraries
  • ARROW-10040 - [Rust] Iterate over and combine boolean buffers with arbitrary offsets
  • ARROW-10043 - [Rust][DataFusion] Implement COUNT(DISTINCT col)
  • ARROW-10044 - [Rust] Improved Arrow's README.
  • ARROW-10046 - [Rust][DataFusion] Made RecordBatchReader implement Iterator
  • ARROW-10050 - [C++][Gandiva] Implement concat() in Gandiva for up to 10 arguments
  • ARROW-10051 - [C++][Compute] Move kernel state when merging
  • ARROW-10054 - [Python] don't crash when slice offset > length
  • ARROW-10055 - [Rust] DoubleEndedIterator implementation for NullableIter
  • ARROW-10057 - [C++] Add hand-written Parquet nested tests
  • ARROW-10058 - [C++] Improve repeated levels conversion without BMI2
  • ARROW-10059 - [R][Doc] Give more advice on how to set up C++ build
  • ARROW-10063 - [Archery][CI] Fetch main branch in archery build only when it is a pull request
  • ARROW-10064 - [C++] Resolve compile warnings on Apple Clang 12
  • ARROW-10065 - [Rust] Simplify code (+500, -1k)
  • ARROW-10066 - [C++] Make sure default AWS region selection algorithm is used
  • ARROW-10068 - [C++] Add bundled external project for aws-sdk-cpp
  • ARROW-10069 - [Java] Support running Java benchmarks from command line
  • ARROW-10070 - [C++][Compute] Implement var and std aggregate kernel
  • ARROW-10071 - [R] segfault with ArrowObject from previous session, or saved
  • ARROW-10074 - [C++] Use string constructor instead of string_view.to_string
  • ARROW-10075 - [C++] Use nullopt from arrow::util instead of vendored namespace
  • ARROW-10076 - [C++] Use temporary directory facility in all unit tests
  • ARROW-10077 - [C++] Fix possible integer multiplication overflow
  • ARROW-10083 - [C++] Improve Parquet fuzz seed corpus
  • ARROW-10084 - [Rust][DataFusion] Added length of LargeStringArray and fixed undefined behavior.
  • ARROW-10086 - [Rust] Renamed min/max_large_string kernels
  • ARROW-10090 - [C++][Compute] Improve mode kernel
  • ARROW-10092 - [Dev][Go] Add grpc generated go files to rat exclusion list
  • ARROW-10093 - [R] Add ability to opt-out of int64 -> int demotion
  • ARROW-10096 - [Rust][DataFusion] Removed unused code
  • ARROW-10099 - [C++][Dataset] Simplify type inference for partition columns
  • ARROW-10100 - [C++][Python][Dataset] Add ParquetFileFragment::Subset method
  • ARROW-10102 - [C++] Refactor BasicDecimal128 Multiplication to use unsigned helper
  • ARROW-10103 - [Rust] Add contains kernel
  • ARROW-10105 - [FlightRPC] Add client option to disable certificate validation with TLS
  • ARROW-10120 - [C++] Add two-level nested Parquet read to Arrow benchmarks
  • ARROW-10127 - Update specification for Decimal to allow for 256-bits
  • ARROW-10129 - [Rust] Cargo build is rebuilding dependencies on arrow changes
  • ARROW-10134 - [Python][Dataset] Add ParquetFileFragment.num_row_groups
  • ARROW-10139 - [C++] Add support for building arrow_testing without building tests
  • ARROW-10148 - [Rust] Improved rust/lib.rs that is shown in docs.rs
  • ARROW-10151 - [Python] Add support for MapArray conversion to Pandas
  • ARROW-10155 - [Rust][DataFusion] Improved lib.rs docs
  • ARROW-10156 - [Rust] Added github action to label PRs for rust.
  • ARROW-10157 - [Rust] Add an example to the take kernel
  • ARROW-10160 - [Rust] Improve DictionaryType documentation (clarify which type is which)
  • ARROW-10161 - [Rust][DataFusion] DRYed code in tests
  • ARROW-10162 - [Rust] Add pretty print support for DictionaryArray
  • ARROW-10164 - [Rust] Add support for DictionaryArray to cast kernel
  • ARROW-10167 - [Rust][DataFusion] Support DictionaryArray in sql.rs tests, by using standard pretty printer
  • ARROW-10171 - [Rust][DataFusion] Added ExecutionContext::From<ExecutionContextState>
  • ARROW-10190 - [Website] Add Jorge to list of committers
  • ARROW-10196 - [C++] Add Future::DeferNotOk
  • ARROW-10199 - [Rust][Parquet] Release Parquet at crates.io to remove debug prints
  • ARROW-10201 - [C++][CI] Disable S3 in arm64 job on Travis CI
  • ARROW-10202 - [CI][Windows] Use sf.net mirror for MSYS2
  • ARROW-10205 - [Java][FlightRPC] Allow disabling server validation
  • ARROW-10206 - [C++][Python][FlightRPC] Allow disabling server validation
  • ARROW-10215 - [Rust][DataFusion] Renamed Source to SendableRecordBatchReader.
  • ARROW-10217 - [CI] Run fewer GitHub Actions jobs
  • ARROW-10227 - [Ruby] Use a table size as the default for parquet chunk_size
  • ARROW-10229 - [C++] Remove errant log line
  • ARROW-10231 - [CI] Unable to download minio in arm32v7 docker image
  • ARROW-10233 - [Rust] Make array_value_to_string available in all Arrow builds
  • ARROW-10235 - [Rust][DataFusion] Improve documentation for type coercion
  • ARROW-10240 - [Rust] Optionally load data into memory before running benchmark query
  • ARROW-10251 - [Rust][DataFusion] MemTable::load() now loads partitions in parallel
  • ARROW-10252 - [Python] Add option to skip inclusion of Arrow headers in Python installation
  • ARROW-10256 - [C++][Flight] Disable -Werror carefully
  • ARROW-10257 - [R] Prepare news/docs for 2.0 release
  • ARROW-10260 - [Python] Missing MapType in to_pandas_dtype()
  • ARROW-10265 - [CI] Use smaller build when cache doesn't exist on Travis CI
  • ARROW-10266 - [CI][macOS] Ensure using Python 3.8 with Homebrew
  • ARROW-10267 - [Python] Skip flight test if disable_server_verification feature is not available
  • ARROW-10272 - [Packaging][Python] Pin newer multibuild version to avoid updating homebrew
  • ARROW-10273 - [CI][Homebrew] Fix "brew audit" usage
  • ARROW-10287 - [C++] Avoid std::random_device
  • PARQUET-1845 - [C++] Add expected results of Int96 in big-endian
  • PARQUET-1878 - [C++] lz4 codec is not compatible with Hadoop Lz4Codec
  • PARQUET-1904 - [C++] Export file_offset in RowGroupMetaData
kszucs
published 0.17.0 •

Changelog

Apache Arrow 0.17.0 (2020-04-20)

Bug Fixes

  • ARROW-1907 - [C++/Python] Feather format cannot accommodate string columns containing more than a total of 2GB of data
  • ARROW-2255 - [C++][Developer][Integration] Serialize custom field/schema metadata
  • ARROW-2587 - [Python][Parquet] Verify nested data can be written
  • ARROW-3004 - [Documentation] Builds docs for master rather than a pinned commit
  • ARROW-3543 - [R] Better support for timestamp format and time zones in R
  • ARROW-5265 - [Python][CI] Add integration test with kartothek
  • ARROW-5473 - [C++] Fix googletest_ep build failure on windows+ninja
  • ARROW-5981 - [C++] Propagate errors from MemoTable to DictionaryBuilder
  • ARROW-6528 - [C++] Spurious Flight test failures (port allocation failure)
  • ARROW-6547 - [C++] valgrind errors in diff-test
  • ARROW-6738 - [Java] Fix problems with current union comparison logic
  • ARROW-6757 - [Release] Use same CMake generator for C++ and Python when verifying RC, remove Python 3.5 from wheel verification
  • ARROW-6871 - [Java] Enhance TransferPair related parameters check and tests
  • ARROW-6872 - [Python] Fix empty table creation from schema with dictionary field
  • ARROW-6890 - [Rust][Parquet] ArrowReader fails with segfault
  • ARROW-6895 - [C++][Parquet] Do not reset dictionary in ByteArrayDictionaryRecordReader during incremental reads
  • ARROW-7008 - [C++] Check binary offsets and data buffers for nullness in validation. Produce valid arrays in DictionaryEncode on zero-length arrays
  • ARROW-7049 - [C++] Fix MinGW64 warning in FieldRef::Get
  • ARROW-7301 - [Java] Sql type DATE should correspond to DateDayVector
  • ARROW-7335 - [C++][Gandiva] Add day_time_interval functions: castBIGINT, extractDay
  • ARROW-7390 - [C++][Dataset] Fix RecordBatchProjector race
  • ARROW-7405 - [Java] ListVector isEmpty API is incorrect
  • ARROW-7466 - [CI][Java] Fix gandiva-jar-osx nightly build failure
  • ARROW-7467 - [Java] ComplexCopier does incorrect copy for Map nullable info
  • ARROW-7507 - [Rust] Bump Thrift version to 0.13 in parquet-format and parquet
  • ARROW-7520 - [R] Writing many batches causes a crash
  • ARROW-7546 - [Java] Use new implementation to concatenate vector values in batch
  • ARROW-7624 - [Rust] Soundness issues via Buffer methods
  • ARROW-7628 - [Python] Clarify docs of csv reader skip_rows and nulls in strings
  • ARROW-7631 - [C++][Gandiva] return zero if there is an overflow while downscaling a decimal
  • ARROW-7672 - [C++] NULL pointer dereference bug
  • ARROW-7680 - [C++] Fix dataset.factory(...) with Windows paths
  • ARROW-7701 - [FlightRPC][C++] disable flaky MacOS test
  • ARROW-7713 - [Java] TastLeak was put at the wrong location
  • ARROW-7722 - [FlightRPC][Java] disable flaky Flight auth test
  • ARROW-7734 - [C++] check status details for nullptr in equality
  • ARROW-7740 - [C++] Fix StructArray::Flatten corruption
  • ARROW-7755 - [Python] Windows wheel cannot be installed on Python 3.8
  • ARROW-7758 - [Python] Safe cast to nanosecond timestamps in to_pandas conversion
  • ARROW-7760 - [Release] Fix verify-release-candidate.sh since pip3 seems to no longer be in miniconda, install miniconda unconditionally
  • ARROW-7762 - [Python] Do not ignore exception for invalid version in ParquetWriter
  • ARROW-7766 - [Python][Packaging] Windows py38 wheels are built with wrong ABI tag
  • ARROW-7772 - [R][C++][Dataset] Unable to filter on date32 object with date64 scalar
  • ARROW-7775 - [Rust] fix: Don't let safe code arbitrarily transmute readers and writers
  • ARROW-7777 - [Go] Fix StructBuilder and ListBuilder panics on index out of range
  • ARROW-7780 - [Release] Fix Windows wheel RC verification script given lack of "m" ABI tag in Python 3.8
  • ARROW-7781 - [C++] Improve message when referencing a missing field
  • ARROW-7783 - [C++] Set ARROW_COMPUTE=ON if ARROW_DATASET=ON
  • ARROW-7785 - [C++] Improve compilation performance of sparse tensor related code
  • ARROW-7786 - [R] Wire up check_metadata in Table.Equals method
  • ARROW-7789 - [R] Can't initialize arrow objects when R.oo package is loaded
  • ARROW-7791 - [C++][Parquet] Fix building error "cannot bind lvalue"
  • ARROW-7792 - [R] read_* functions should close connection to file
  • ARROW-7793 - [Java] Release accounted-for reservation memory to parent in case of leak
  • ARROW-7794 - [Rust][Flight] Remove hard-coded relative path to Flight.proto
  • ARROW-7794 - [Rust] Support releasing arrow-flight
  • ARROW-7797 - [Release][Rust] Fix arrow-flight's version in datafusion crate
  • ARROW-7802 - [C++][Python] Support LargeBinary and LargeString in the hash kernel
  • ARROW-7806 - [Python] Support LargeListArray and list<LargeBinaryArray> conversion to pandas.
  • ARROW-7807 - [R] Installation on RHEL 7 Cannot call io___MemoryMappedFile__Open()
  • ARROW-7809 - [R] Vignette does not run on Windows 10 or Ubuntu
  • ARROW-7813 - [Rust] Remove and fix unsafe code
  • ARROW-7815 - [C++] Improve input validation
  • ARROW-7827 - [Python] conda-forge pyarrow package does not have s3 enabled
  • ARROW-7832 - [R] Patches to 0.16.0 release
  • ARROW-7836 - [Rust] "allocate_aligned"/"reallocate" need to initialize memory to avoid UB
  • ARROW-7837 - [Java] copyFromSafe fails due to a bug in handleSafe
  • ARROW-7838 - [C++] Only link Boost libraries with tests, not libarrow.so
  • ARROW-7841 - [C++] Use ${HADOOP_HOME}/lib/native/ to find libhdfs.so again
  • ARROW-7844 - [R] Converter_List is not thread-safe
  • ARROW-7848 - [C++][Python][Doc] Add MapType API doc
  • ARROW-7852 - [Python] 0.16.0 wheels not compatible with older numpy
  • ARROW-7857 - [Python] Revert temporary changes to pandas extension array tests
  • ARROW-7861 - [C++][Parquet] Add fuzz regression corpus for parquet reader
  • ARROW-7884 - [C++] Relax concurrency rules around GetSize()
  • ARROW-7887 - [Rust] Add date/time/duration/timestamp types to filter kernel
  • ARROW-7889 - [Rust] Add support to datafusion-cli for parquet files.
  • ARROW-7899 - [Integration][Java] Fix Flight integration test client to verify each batch
  • ARROW-7908 - [R] Can't install package without setting LIBARROW_DOWNLOAD=true
  • ARROW-7922 - [CI][Crossbow] Nightly macOS wheel builds fail (brew bundle edition)
  • ARROW-7923 - [CI][Crossbow] macOS autobrew fails on homebrew-versions
  • ARROW-7926 - [Dev] Improve "archery lint" UI
  • ARROW-7928 - [Python] Update Python flight server and client examples for latest API
  • ARROW-7931 - [C++] Fix crash on corrupt Map array input (OSS-Fuzz)
  • ARROW-7936 - [Python] Fix and exercise tests on Python 3.5
  • ARROW-7940 - [C++] Remove ARROW_USE_CLCACHE handling
  • ARROW-7944 - [Python] Test failures without Pandas
  • ARROW-7956 - [Python] Memory leak in pyarrow functions .ipc.serialize_pandas/deserialize_pandas
  • ARROW-7958 - [Java] Update Avro to version 1.9.2
  • ARROW-7962 - [R][Dataset] Followup to "Consolidate Source and Dataset classes"
  • ARROW-7968 - [C++] orc_ep build fails on 64-bit Raspbian
  • ARROW-7973 - [Developer][C++] ResourceWarnings in run_cpplint.py
  • ARROW-7974 - [C++][Developer] Fix linter warnings when PYTHONDEVMODE enabled
  • ARROW-7975 - [C++] Preserve intended buffer size by default when writing to IPC format
  • ARROW-7978 - [Dev] Do not run IWYU in Github Actions "lint" workflow
  • ARROW-7980 - [Python] Fix creation of tz-aware datetime dtype on first pandas import
  • ARROW-7981 - [C++][Dataset] Fix compilation on gcc 5.4
  • ARROW-7985 - [C++] Fix builder capacity check
  • ARROW-7990 - [Developer][C++] Add option to run "archery lint --iwyu" on all C++ files, not just the ones that you changed. Add "match" option to iwyu.sh
  • ARROW-7992 - [C++] Fix MSVC warning (#6525)
  • ARROW-7996 - [Python] Error serializing empty pandas DataFrame with pyarrow
  • ARROW-7997 - [Python] Schema equals method with inconsistent docs in pyarrow
  • ARROW-7999 - [C++] Fix crash on corrupt List / Map array input
  • ARROW-8000 - [C++] Fix compilation on gcc 4.8
  • ARROW-8003 - [C++] Use CMAKE_C_COMPILER when building bundled bzip2
  • ARROW-8006 - [C++] Initialize spaced data when reading nulls from Parquet
  • ARROW-8007 - [Python] Remove unused and defunct assert_get_object_equal in plasma tests
  • ARROW-8008 - [C++/Python] Set Python3_FIND_FRAMEWORK=LAST
  • ARROW-8009 - [Java] Fix the hash code methods for BitVector
  • ARROW-8011 - [C++] Fix buffer size when reading Parquet data to Arrow
  • ARROW-8013 - [Python][Packaging] Fix building manylinux wheels
  • ARROW-8021 - [Python] Install test requirements including pandas in Appveyor
  • ARROW-8029 - [R] rstudio/r-base:3.6-centos7 GHA build failing on master
  • ARROW-8036 - [C++] Avoid gtest 1.10 deprecation warnings
  • ARROW-8042 - [Python] Clean up docstring and error message when creating ChunkedArray with no chunks
  • ARROW-8057 - [Python] Do not compare schema metadata in Schema.equals and Table.equals by default
  • ARROW-8070 - [C++] Cast segfaults on unsupported cast from list<binary> to utf8
  • ARROW-8071 - [GLib] Fix build error with configure
  • ARROW-8075 - [R] Loading R.utils after arrow breaks some arrow functions
  • ARROW-8088 - [C++][Dataset] Support dictionary partition columns
  • ARROW-8091 - [CI][Crossbow] Fix nightly homebrew and R failures
  • ARROW-8092 - [CI][Crossbow] OSX wheels fail on bundled bzip2
  • ARROW-8094 - [CI][Crossbow] Nightly valgrind test fails
  • ARROW-8095 - [C++] Add support for string dictionary value with length
  • ARROW-8098 - [Go] Avoid unsafe unsafe.Pointer usage
  • ARROW-8099 - [Integration] archery integration --with-LANG flags don't work
  • ARROW-8101 - [FlightRPC][Java] Fix null arrays in Flight with no buffers
  • ARROW-8102 - [Dev] Crossbow's version detection doesn't work in the comment bot's scenario
  • ARROW-8105 - [Python] Fix segfault when shrunken masked array is passed to pyarrow.array
  • ARROW-8106 - [Python] Ensure extension array conversion tests pass with latest pandas
  • ARROW-8110 - [C#] BuildArrays fails if NestedType is included
  • ARROW-8112 - [FlightRPC][C++] make sure status codes round-trip through gRPC
  • ARROW-8119 - [Dev] Make Yaml optional dependency for archery
  • ARROW-8122 - [Python] Empty numpy arrays with shape cannot be deserialized
  • ARROW-8125 - [C++] Restore link between tests created with add_arrow_test and arrow-tests target
  • ARROW-8127 - [C++][Parquet] Incorrect column chunk metadata for multipage batch writes
  • ARROW-8128 - [C#] NestedType children serialized on wrong length
  • ARROW-8132 - [C++] Fix S3FileSystem tests on Windows
  • ARROW-8133 - [CI] Github Actions sometimes fail to checkout Arrow
  • ARROW-8136 - [Python] More robust inference of local relative path in dataset
  • ARROW-8136 - [Python] Restore creating a dataset from a relative path
  • ARROW-8138 - [C++] parquet::arrow::FileReader cannot read multiple RowGroup
  • ARROW-8139 - [C++] FileSystem enum causes attributes warning
  • ARROW-8142 - [C++][Compute] Explicit no chunks case for WrapDatumsLike
  • ARROW-8144 - [CI] Cmake 3.2 nightly build fails
  • ARROW-8154 - [Python] HDFS Filesystem does not set environment variables in pyarrow 0.16.0 release
  • ARROW-8159 - [Python] Support pandas.ExtensionDtype in Schema.from_pandas
  • ARROW-8166 - [C++] Fix AVX512 intrinsics failure with clang-8
  • ARROW-8176 - [FlightRPC] bind to a free port for integration tests
  • ARROW-8186 - [Python] Fix dataset expression operation with invalid scalar
  • ARROW-8188 - [R] Adapt to latest checks in R-devel
  • ARROW-8193 - [C++] Fix gcc 4.8 compilation error with non-copyable types in Iterator<T>::ToVector
  • ARROW-8197 - [Rust][DataFusion] Fix schema returned by physical plan
  • ARROW-8206 - [R] Minor fix for backwards compatibility on Linux installation
  • ARROW-8209 - [Python] Improve error message when trying to access duplicate Table column
  • ARROW-8213 - [Python][Dataset] Opening a dataset with a local incorrect path gives confusing error message
  • ARROW-8216 - [C++][Compute] Filter out nulls by default
  • ARROW-8217 - [R] Unskip previously failing test on Win32 in test-dataset.R from ARROW-7979
  • ARROW-8219 - [Rust] sqlparser crate needs to be bumped to version 0.2.5
  • ARROW-8223 - [Python] Schema.from_pandas breaks with pandas nullable integer dtype
  • ARROW-8233 - [CI][GLib][R] Fix timeout on MinGW
  • ARROW-8234 - [CI] Build timeouts on "AMD64 Windows RTools 35"
  • ARROW-8236 - [Rust] Linting GitHub Actions task failing
  • ARROW-8237 - [Python][Documentation] Minor corrections to python minimal build documentation
  • ARROW-8237 - [Python][Documentation] Review Python developer documentation, add Dockerfile showing minimal source build with conda and pip/virtualenv
  • ARROW-8238 - [C++] Fix FieldPath type definition
  • ARROW-8239 - [Java] fix param checks in splitAndTransfer method
  • ARROW-8245 - [Python][Parquet] Skip hidden directories when reading partitioned parquet files
  • ARROW-8254 - [Rust][DataFusion] CLI is not working as expected
  • ARROW-8255 - [Rust][DataFusion] Bug fix for COUNT(*)
  • ARROW-8259 - [Rust][DataFusion] ProjectionPushDown now respects LIMIT
  • ARROW-8268 - [CI][Ruby] Enable Zstandard on Ubuntu 16.04
  • ARROW-8269 - [Python] Add pandas mark to test_parquet_row_group_fragments to fix nopandas build
  • ARROW-8270 - [Python][Flight] Update Python server example to support TLS
  • ARROW-8272 - [CI][Python] Fix test failure on Python 3.5
  • ARROW-8274 - [C++] Use LZ4 frame format for "LZ4" compression in IPC
  • ARROW-8276 - [C++][Dataset] Use Scanner for Fragment.to_table
  • ARROW-8280 - [C++] Use c-ares_INCLUDE_DIR
  • ARROW-8286 - [Python] Ensure a FileSystemDataset is created when passing a pathlib path
  • ARROW-8298 - [C++][MinGW] Fix gRPC detection
  • ARROW-8303 - [Python] Fix test failure on Python 3.5 caused by non-deterministic dict key ordering
  • ARROW-8304 - [Flight][Python] Fix client example with TLS
  • ARROW-8305 - [Java] ExtensionTypeVector should make sure underlyingVector not null
  • ARROW-8310 - [C++] Improve auto-retry in S3 tests
  • ARROW-8315 - [Python] Fix dataset tests on Python 3.5
  • ARROW-8323 - [C++] Add pragmas wrapping proto_utils.h to disable conversion warnings
  • ARROW-8326 - [C++] Use TYPED_TEST_SUITE instead of deprecated TYPED_TEST_CASE
  • ARROW-8327 - [FlightRPC][Java] check gRPC trailers for null
  • ARROW-8331 - [C++] Fix filter_benchmark.cc compilation
  • ARROW-8333 - [C++] Compile benchmarks in at least one C++ CI entry
  • ARROW-8334 - [C++][Gandiva] Missing DATE32 in LLVM Types
  • ARROW-8342 - [Python] Continue to return dict from "metadata" properties accessing KeyValueMetadata
  • ARROW-8345 - [Python] Ensure feather read/write can work without pandas installed
  • ARROW-8346 - [CI][GLib] Follow pkg-config change in Homebrew
  • ARROW-8349 - [CI][NIGHTLY:gandiva-jar-osx] Use latest pygit2
  • ARROW-8353 - [C++] Fix some compiler warnings in release builds
  • ARROW-8354 - [R] Fix segfault in Table to Array conversion
  • ARROW-8357 - [Rust][DataFusion] Add format dir to dockerfile for CLI
  • ARROW-8358 - [C++] Fix some clang-11 compiler warnings
  • ARROW-8365 - [C++] Error when writing files to S3 larger than 5 GB
  • ARROW-8366 - [Rust] Support releasing arrow-flight
  • ARROW-8369 - [CI] Fix crossbow wildcard groups
  • ARROW-8373 - [CI][GLib] Find gio-2.0 manually on macOS
  • ARROW-8380 - Export StringDictionaryBuilder from arrow::array crate
  • ARROW-8384 - [Python][C++] Allow configuring Kerberos ticket cache path
  • ARROW-8386 - [Python] Fix error when pyarrow.jvm gets an empty vector
  • ARROW-8388 - [C++][CI] Ensure Arrow compiles with GCC 4.8
  • ARROW-8397 - [C++] Fail to compile aggregate_test.cc on Ubuntu 16.04
  • ARROW-8406 - [C++][Python] Fix file URI handling
  • ARROW-8410 - [C++] Fix compilation errors on modest ARMv8 platforms (rockpro64, rpi4)
  • ARROW-8414 - [Python] Fix non-deterministic row order failure in parquet tests
  • ARROW-8415 - [C++][Packaging] Fix gandiva linux job
  • ARROW-8416 - [Python] Add feather alias for ipc format in dataset API
  • ARROW-8420 - [C++] Distinguish ARMv7 from ARMv8 in SetupCxxFlags.cmake
  • ARROW-8427 - [C++][Dataset] Only apply ignore_prefixes to selector results
  • ARROW-8428 - [C++] GCC 4.8 Implicit move-on-return failure in C++ tests
  • ARROW-8429 - [C++] Implement missing checks in IPC MessageDecoder
  • ARROW-8432 - [CI] Don't depend on a single apache mirror for dependencies
  • ARROW-8437 - [C++] Remove std::move return value from MakeRandomNullBitmap test utility
  • ARROW-8438 - [C++] Fix crash in io-memory-benchmark
  • ARROW-8439 - [Python] Update options usage in S3FileSystem docs
  • ARROW-8441 - [C++] Check invalid input in ipc::MessageDecoder
  • ARROW-8442 - [Python] Change NullType.to_pandas_dtype to return object instead of float64
  • ARROW-8460 - [Packaging][deb] Reduce disk usage on building packages
  • ARROW-8465 - [Packaging][Python] Windows py35 wheel build fails because of boost
  • ARROW-8466 - [Packaging] The python unittests are not running in the windows wheel builds
  • ARROW-8468 - [C++][Documentation] Fix the incorrect null bits description
  • ARROW-8469 - [Dev] Fix nightly docker tests on azure
  • ARROW-8478 - [Java] Revert "ARROW-7534"
  • ARROW-8498 - [Python] Schema.from_pandas fails on extension type, while Table.from_pandas works
  • PARQUET-1780 - [C++] Set ColumnMetadata.encoding_stats field
  • PARQUET-1788 - Remove UBSan errors when rep/def levels are null
  • PARQUET-1797 - [C++] Fix fuzzer issues
  • PARQUET-1799 - [C++] Stream API: Relax schema checking when reading
  • PARQUET-1810 - [C++] Fix undefined behaviour on invalid enum values (OSS-Fuzz)
  • PARQUET-1813 - [C++] Remove debug print statement from parquet-arrow-schema-test
  • PARQUET-1819 - [C++] Refactor decoding
  • PARQUET-1819 - [C++] Fix crashes on invalid input
  • PARQUET-1823 - [C++] Invalid RowGroup returned by parquet::arrow::FileReader
  • PARQUET-1824 - [C++] Fix crashes and undefined behaviour on invalid input
  • PARQUET-1829 - [C++] Fix crashes on invalid input (OSS-Fuzz)
  • PARQUET-1831 - [C++] Fix crashes on invalid input (OSS-Fuzz)
  • PARQUET-1835 - [C++] Fix crashes on invalid input

New Features and Improvements

  • ARROW-590 - [Integration][C++] Implement union types
  • ARROW-1470 - [C++] Add BufferAllocator abstract interface
  • ARROW-1560 - [C++] Kernel implementations for "match" function
  • ARROW-1571 - [C++][Compute] Optimize sorting integers in small value range
  • ARROW-1581 - [Packaging] Tooling to make nightly wheels available for install
  • ARROW-1582 - [Python] Set up + document nightly conda builds for macOS
  • ARROW-1636 - [C++][Integration] Implement integration test parsing in C++ for null type, add integration test data generation
  • ARROW-2447 - [C++] Device and MemoryManager API
  • ARROW-2882 - [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets
  • ARROW-3054 - [Packaging] Tooling to enable nightly conda packages to be updated to some anaconda.org channel
  • ARROW-3410 - [C++][Python] Add streaming CSV reader.
  • ARROW-3750 - [R] Pass various wrapped Arrow objects created in Python into R with zero copy via reticulate
  • ARROW-4120 - [Python] Testing utility for checking for "macro" memory leaks detectable with psutil.Process
  • ARROW-4226 - [C++] Add sparse CSF tensor support
  • ARROW-4286 - [C++/R] Namespace vendored Boost
  • ARROW-4304 - [Rust] Enhance documentation for arrow
  • ARROW-4428 - [R] Feature flags for R build
  • ARROW-4482 - [Website] Add blog archive page
  • ARROW-4815 - [Rust][DataFusion] Add support for SQL wildcard operator
  • ARROW-5357 - [Rust] Change Buffer::len to represent total bytes instead of used bytes
  • ARROW-5405 - [Documentation] Move integration testing documentation to Sphinx docs, add instructions for JavaScript
  • ARROW-5497 - [Release] Build and publish R/Java/JS docs
  • ARROW-5501 - [R] Reorganize read/write file/stream functions
  • ARROW-5510 - [C++][Python][R][GLib] Implement Feather "V2" using Arrow IPC file format
  • ARROW-5563 - [Format] Update integration test JSON format documentation
  • ARROW-5585 - [Go] Rename TypeEquals to TypeEqual
  • ARROW-5742 - [CI][C++] Add nightly Valgrind build
  • ARROW-5757 - [Python] Remove Python 2.7 support
  • ARROW-5949 - [Rust] Implement Dictionary Array
  • ARROW-6165 - [Integration] Run integration tests on multiple cores
  • ARROW-6176 - [Python] Basic implementation of arrow_ext_class, in pure Python
  • ARROW-6275 - [C++] Deprecate RecordBatchReader::ReadNext
  • ARROW-6393 - [C++] Add EqualOptions support in SparseTensor::Equals
  • ARROW-6479 - [C++] Inline errors from externalprojects on failure
  • ARROW-6510 - [Python][Filesystem] Expose nanosecond resolution mtime
  • ARROW-6666 - [Rust] Datafusion parquet string literal support
  • ARROW-6724 - [C++] Allow simpler BufferOutputStream creation
  • ARROW-6821 - [C++][Parquet] Do not require Thrift compiler when building (but still require library)
  • ARROW-6823 - [C++][Python][R] Support metadata in the feather format?
  • ARROW-6829 - [Docs] Migrate integration test docs to Sphinx, fix instructions after ARROW-6466
  • ARROW-6837 - [C++] Add APIs to read and write "custom_metadata" field of IPC file footer
  • ARROW-6841 - [C++] Migrate to LLVM 8
  • ARROW-6875 - [FlightRPC] implement criteria for ListFlights
  • ARROW-6915 - [Developer] Do not overwrite point release fix versions with merge tool
  • ARROW-6947 - [Rust][DataFusion] Scalar UDF support
  • ARROW-6996 - [Python] Expose boolean filter kernel on ChunkedArray/RecordBatch/Table
  • ARROW-7044 - [Release] Create a post release script for the home-brew formulas
  • ARROW-7048 - [Java] Support for combining multiple vectors under VectorSchemaRoot
  • ARROW-7063 - [C++][Python] Add metadata output and toggle in PrettyPrint, add pyarrow.Schema.to_string, disable metadata output by default
  • ARROW-7073 - [Java] Support concatenating vector values in batch
  • ARROW-7080 - [C++][Parquet] Read and write "field_id" attribute in Parquet files, propagate to Arrow field metadata. Assorted additional changes
  • ARROW-7091 - [C++] Move DataType factory decls to type_fwd.h
  • ARROW-7119 - [C++][CI] Show automatic backtraces
  • ARROW-7201 - [GLib][Gandiva] Add support for BooleanNode
  • ARROW-7202 - [R][CI] Improve rwinlib building on CI to stop re-downloading dependencies
  • ARROW-7222 - [Python][Release] Wipe any existing generated Python API documentation when updating website
  • ARROW-7233 - [C++] Use Result<T> in remaining value-returning IPC APIs
  • ARROW-7256 - [C++] Remove ARROW_MEMORY_POOL_DEFAULT macro
  • ARROW-7330 - [C++] Migrate Arrow Cuda to Result<T>
  • ARROW-7332 - [C++][Python] Propagate Arrow Status through Parquet errors
  • ARROW-7336 - [C++][Compute] fix minmax kernel options
  • ARROW-7338 - [C++] Improve InMemoryDataSource to support generator instead of static list
  • ARROW-7365 - [Python] Convert FixedSizeList in to_pandas
  • ARROW-7373 - [C++][Dataset] Remove FileSource
  • ARROW-7400 - [Java] Avoid the worst case for quick sort
  • ARROW-7412 - [C++][Dataset] Provide FieldRef to disambiguate field references
  • ARROW-7419 - [Python] Support SparseCSCMatrix
  • ARROW-7427 - [Python] Support SparseCSFTensor
  • ARROW-7428 - [Format][C++] Add serialization for CSF sparse tensors
  • ARROW-7444 - [GLib] Add LocalFileSystem support
  • ARROW-7462 - [C++] Add CpuInfo detection for Arm64 Architecture
  • ARROW-7491 - [Java] Improve the performance of aligning
  • ARROW-7499 - [C++] CMake should collect libs when making static build
  • ARROW-7501 - [C++] CMake build_thrift should build flex and bison if necessary
  • ARROW-7515 - [C++] Rename nonexistent and non_existent to not_found
  • ARROW-7524 - [C++][CI] Enable Parquet in the VS2019 GHA job
  • ARROW-7530 - [Developer] Do not include list of PR commits in commit message when using PR merge tool
  • ARROW-7534 - [Java] Create a new java/contrib module
  • ARROW-7547 - [C++][Dataset][Python] Add ParquetFileFormat options
  • ARROW-7555 - [Python] Drop support for python 2.7
  • ARROW-7587 - [C++][Compute] Implement nth_to_indices kernel
  • ARROW-7608 - [C++][Dataset] Add the ability to list files in FileSystemSource
  • ARROW-7615 - [CI][Gandiva] Ensure gandiva_jni library has only a whitelisted set of shared dependencies
  • ARROW-7616 - [Java] Support comparing value ranges for dense union vector
  • ARROW-7625 - [Parquet][GLib] Add support for writer properties
  • ARROW-7641 - [R] Make dataset vignette have executable code
  • ARROW-7662 - [R] Support creating ListArray from R list
  • ARROW-7664 - [C++] Rework FileSystemFromUri
  • ARROW-7675 - [R][CI] Move Windows CI from Appveyor to GHA
  • ARROW-7679 - [R] Cleaner interface for creating UnionDataset
  • ARROW-7684 - [Rust] Example Flight client and server for DataFusion
  • ARROW-7685 - [Developer] Add support for GitHub Actions to Crossbow
  • ARROW-7691 - [C++] Check non-scalar Flatbuffers fields are not null
  • ARROW-7708 - [Developer][Release] Include PARQUET issues in release changelogs by scraping git history
  • ARROW-7712 - [CI][Crossbow] Delete fuzzit jobs
  • ARROW-7720 - [C++][Python] Add check_metadata argument to Table.equals
  • ARROW-7725 - [C++] Add infrastructure for unity builds and precompiled headers
  • ARROW-7726 - [CI][C++] Use boost binaries on Windows GHA build
  • ARROW-7729 - [Python][CI] Pin pandas version to 0.25 in the dask integration test
  • ARROW-7733 - [Developer] Download new enough Go locally in release verification script
  • ARROW-7735 - [Release][Python] Use pip to install dependencies for wheel verification
  • ARROW-7736 - [Release] Retry binary download on transient error
  • ARROW-7739 - [GLib] Use placement new to initialize shared_ptr object in private structs
  • ARROW-7741 - [C++] Add Parquet write support for nested types
  • ARROW-7742 - [GLib] Add support for MapArray
  • ARROW-7745 - [Doc][C++] Update Parquet documentation
  • ARROW-7749 - [C++] Link more tests together
  • ARROW-7750 - [Release] Make the source release verification script restartable
  • ARROW-7751 - [Release] macOS wheel verification also needs arrow-testing
  • ARROW-7752 - [Release] Enable and test dataset in the verification script
  • ARROW-7754 - [C++] Make Result<> faster
  • ARROW-7761 - [C++][Python] Support S3 URIs
  • ARROW-7764 - [C++] Don't keep a null bitmap in ArrayData if null_count == 0
  • ARROW-7771 - [Developer] Use ARROW_TMPDIR environment variable in the verification scripts instead of TMPDIR
  • ARROW-7774 - [Packaging][Python] Update macos and windows wheel filenames
  • ARROW-7787 - [Rust] Added .collect to Table API
  • ARROW-7788 - [C++][Parquet] Enable Arrow Schema to Parquet Schema for missing types
  • ARROW-7790 - [Website] Update how to install Linux packages
  • ARROW-7795 - [Rust] Added support for NOT
  • ARROW-7796 - [R] write_* functions should invisibly return their inputs
  • ARROW-7799 - [R][CI] Remove flatbuffers from homebrew formulae
  • ARROW-7804 - [C++][R] Compile error on macOS 10.11
  • ARROW-7812 - [Packaging][Python] Use LLVM 8 in manylinux1 wheels
  • ARROW-7817 - [CI] macOS R autobrew nightly failed on installing dependency from source
  • ARROW-7819 - [C++][Gandiva] Add DumpIR to Filter/Projector object
  • ARROW-7824 - [C++][Dataset] WriteFragments to disk
  • ARROW-7828 - [Release] Remove SSH keys for internal use
  • ARROW-7829 - [R] Test R bindings on clang
  • ARROW-7833 - [R] Make install_arrow() actually install arrow
  • ARROW-7834 - [Release] Post release task for updating the documentations
  • ARROW-7839 - [Python][Dataset] Expose IPC format in python bindings
  • ARROW-7846 - [Python][Dev] Remove dependencies on six
  • ARROW-7847 - [Website] Write a blog post about fuzzing
  • ARROW-7849 - [Packaging][Python] Remove the remaining py27 crossbow wheel tasks from the nightlies
  • ARROW-7858 - [C++][Python] Support casting from ExtensionArray
  • ARROW-7859 - [R] Minor patches for CRAN submission 0.16.0.2
  • ARROW-7860 - [C++] Support cast to/from halffloat
  • ARROW-7862 - [R] Linux installation should run quieter by default
  • ARROW-7863 - [C++][Python][CI] Ensure running HDFS related tests
  • ARROW-7864 - [R] Make sure bundled installation works even if there are system packages
  • ARROW-7865 - [R] Test builds on latest Linux versions
  • ARROW-7868 - [Crossbow] Reduce GitHub API query parallelism
  • ARROW-7869 - [Python] Remove boost::system and boost::filesystem from Python wheels
  • ARROW-7872 - [C++][Python] Support conversion of list of structs to pandas
  • ARROW-7874 - [Python][Archery] Validate docstrings with numpydoc
  • ARROW-7876 - [R] Installation fails in the documentation generation image
  • ARROW-7877 - [Packaging] Fix crossbow deployment to github artifacts
  • ARROW-7879 - [C++][Doc] Add doc for the Device API
  • ARROW-7880 - [CI][R] R sanitizer job is not really working
  • ARROW-7881 - [C++] Fix -Wpedantic warnings
  • ARROW-7882 - [C++][Gandiva] Optimise like function for substring pattern
  • ARROW-7886 - [C++][Dataset][Python][R] Consolidate Source and Dataset classes
  • ARROW-7888 - [Python] Update pyarrow.jvm to support jpype 0.7+
  • ARROW-7890 - [C++] Add Future implementation
  • ARROW-7891 - [C++][GLib][Python][R] Make uniform use of check_metadata=false default. Add Py/R/GLib bindings for RecordBatch::Equals with check_metadata
  • ARROW-7892 - [Python] Add FileSystemDataset.format attribute
  • ARROW-7895 - [Python] Remove more python 2.7 cruft
  • ARROW-7896 - [C++] Refactor from #include guards to #pragma once
  • ARROW-7897 - [Packaging] Temporarily disable artifact uploading until we fix the deployment issues
  • ARROW-7898 - [Python] Reduce the number of docstring violations using numpydoc
  • ARROW-7904 - [C++][Python] Revamp metadata display, change show_metadata to verbose_metadata
  • ARROW-7907 - [Python] Add test case for previously failing code involving slicing a 0-length ChunkedArray
  • ARROW-7912 - [Format] C data interface
  • ARROW-7913 - [C++][Python][R] C++ implementation of C data interface
  • ARROW-7915 - [CI][Python] Enable development mode in tests
  • ARROW-7916 - [C++] Project IPC batches to materialized fields only
  • ARROW-7917 - [C++] Find Python 3 in CMake configuration
  • ARROW-7919 - [R] install_arrow() should conda install if appropriate
  • ARROW-7920 - [R] Fill in some missing input validation
  • ARROW-7921 - [Go] Add Reset method to various components and clean up comments.
  • ARROW-7927 - [C++] Fix 'cpu_info.cc' compilation warning.
  • ARROW-7929 - [C++] Align CMake target names to upstreams
  • ARROW-7930 - [CI][Python] Test jpype integration
  • ARROW-7932 - [Rust] implement array_reader for temporal types
  • ARROW-7934 - [C++] Fix UriEscape for empty string
  • ARROW-7935 - [Java] Remove Netty dependency for BufferAllocator and ReferenceManager
  • ARROW-7937 - [Python][Packaging] Remove boost from the macos wheels
  • ARROW-7941 - [Rust][DataFusion] Add support for named columns in logical plan
  • ARROW-7943 - [C++][Parquet] Add code to generate rep/def levels for nested arrays
  • ARROW-7947 - [Rust][Flight][DataFusion] Implement get_schema example
  • ARROW-7949 - [Git] Ignore macOS specific file: 'Brewfile.lock.json'
  • ARROW-7951 - [Python] Expose BYTE_STREAM_SPLIT in pyarrow
  • ARROW-7959 - [Ruby] Add support for Ruby 2.3 again
  • ARROW-7963 - [C++][Dataset][Python] Expose Dataset Fragments to Python
  • ARROW-7965 - [Python] Refine higher level dataset API
  • ARROW-7966 - [FlightRPC][C++] Validate individual batches in integration
  • ARROW-7969 - [Packaging] Use cURL to upload artifacts
  • ARROW-7970 - [Packaging][Python] Use system boost to build the macOS wheels
  • ARROW-7971 - [Rust] Create rowcount utility
  • ARROW-7977 - [C++] Rename fs::FileStats to fs::FileInfo
  • ARROW-7979 - [C++] Add experimental buffer compression to IPC write path. Add "field" selection to read path. Migrate some APIs to Result<T>. Read/write Message metadata
  • ARROW-7982 - [C++] Add function VisitArrayDataInline() helper
  • ARROW-7983 - [CI][R] Nightly builds should be more verbose when they fail
  • ARROW-7984 - [R] Check for valid inputs in more places
  • ARROW-7986 - [Python] pa.Array.from_pandas cannot convert pandas.Series containing pyspark.ml.linalg.SparseVector
  • ARROW-7987 - [CI][R] Fix for verbose nightly builds
  • ARROW-7988 - [R] Fix on.exit calls in reticulate bindings
  • ARROW-7991 - [C++][Plasma] Allow option for evicting if full when creating an object
  • ARROW-7993 - [Java] Support decimal type in ComplexCopier
  • ARROW-7994 - [CI][C++][GLib][Ruby] Move MinGW CI to GitHub Actions from AppVeyor
  • ARROW-7995 - [C++] Add facility to coalesce and cache reads
  • ARROW-7998 - [C++][Plasma] Make Seal requests synchronous
  • ARROW-8005 - [Tools] Update apache mirror links
  • ARROW-8014 - [C++] Provide CMake targets exercising tests with a label
  • ARROW-8016 - [Developer] Fix jira-python deprecation warning in merge_arrow_pr.py
  • ARROW-8018 - [C++][Parquet] Parquet Modular Encryption
  • ARROW-8024 - [R] Bindings for BinaryType and FixedSizeBinaryType
  • ARROW-8026 - [Python] Support memoryview as a value type for creating binary-like arrays
  • ARROW-8027 - [Integration] Add test case for duplicated field names
  • ARROW-8028 - [Go] Allow duplicate field names in schemas and nested types
  • ARROW-8030 - [Plasma] Uniform comments style
  • ARROW-8035 - [Developer][Integration] Add integration tests for extension types
  • ARROW-8039 - [Python] Use dataset API in existing parquet readers and tests
  • ARROW-8044 - [CI][NIGHTLY:gandiva-jar-osx] Pin pygit2 at 1.0.3 for OSX
  • ARROW-8055 - [GLib][Ruby] Add some metadata bindings to GArrowSchema
  • ARROW-8058 - [Dataset] Relax DatasetFactory discovery validation
  • ARROW-8059 - [Python] Make FileSystem objects serializable
  • ARROW-8060 - [Python] Make dataset Expression objects serializable
  • ARROW-8061 - [C++][Dataset] Provide RowGroup fragments for ParquetFileFormat
  • ARROW-8063 - [Python][Dataset] Start user guide for pyarrow.dataset
  • ARROW-8064 - [Dev] Implement comment bot via GitHub Actions
  • ARROW-8069 - [C++] Should the default value of "check_metadata" arguments of Equals methods be "true"?
  • ARROW-8072 - [Plasma] Add const for plasma protocol
  • ARROW-8077 - [Python][Packaging] Add Windows Python 3.5 wheel build script
  • ARROW-8079 - [Python] Implement a wrapper for KeyValueMetadata, duck-typing dict where relevant
  • ARROW-8080 - [C++] Add ARROW_SIMD_LEVEL option
  • ARROW-8082 - [Plasma] Add JNI list() interface
  • ARROW-8083 - [GLib] Add support for Peek() to GIOInputStream
  • ARROW-8086 - [Java] Support writing decimal from big endian byte array in UnionListWriter
  • ARROW-8087 - [C++][Dataset] Partitioning schema fields follow paths' segment ordering
  • ARROW-8096 - [C++][Gandiva] fix TreeExprBuilder::MakeNull to create node for interval type
  • ARROW-8097 - [Dev] Comment bot's crossbow command acts on the master branch
  • ARROW-8103 - [R] Make default Linux build more minimal
  • ARROW-8104 - [C++] Don't install bundled Thrift
  • ARROW-8107 - [Packaging][APT] Use HTTPS for LLVM APT repository for Debian GNU/Linux stretch
  • ARROW-8109 - [Packaging][APT] Drop support for Ubuntu Disco
  • ARROW-8117 - [DataFusion][Rust] Allow casting SQLTimestamp to Timestamp
  • ARROW-8118 - [R] dim method for FileSystemDataset
  • ARROW-8120 - [Packaging][APT] Add support for Ubuntu Focal
  • ARROW-8123 - [Rust][DataFusion] Add LogicalPlanBuilder
  • ARROW-8124 - [Rust] Update library dependencies
  • ARROW-8126 - [C++][Compute] Add nth-to-indices kernel benchmark
  • ARROW-8129 - [C++][Compute] Refine compare sort kernel
  • ARROW-8130 - [C++][Gandiva] fix dex visitor to handle interval type
  • ARROW-8140 - [Dev] Follow class name change
  • ARROW-8141 - [C++] Speed up unpack1_32 using intrinsics API
  • ARROW-8145 - [C++] Rename FileSystem::GetTargetInfos to GetFileInfo
  • ARROW-8146 - [C++] Add per-filesystem facility to sanitize a path
  • ARROW-8150 - [Rust] Allow writing custom FileMetaData k/v pairs
  • ARROW-8151 - [Dataset][Benchmarking] benchmark S3File performance
  • ARROW-8153 - [Packaging] Update the conda feedstock files and upload artifacts to Anaconda
  • ARROW-8158 - [Java] Getting length of data buffer and base variable width vector
  • ARROW-8164 - [C++][Dataset] Provide Dataset::ReplaceSchema()
  • ARROW-8165 - [Packaging] Make nightly wheels available on a PyPI server
  • ARROW-8167 - [CI] Add support for skipping builds with skip pattern in pull request title
  • ARROW-8168 - [Java][Plasma] Improve Java Plasma client off-heap memory usage
  • ARROW-8177 - [Rust] Make schema_to_fb_offset public
  • ARROW-8178 - [C++] Update to Flatbuffers 1.12.0
  • ARROW-8179 - [R] Windows build script tweaking for nightly packaging on GHA
  • ARROW-8181 - [Java][FlightRPC] Expose transport error metadata
  • ARROW-8182 - [Packaging] Increment the version number detected from the latest git tag
  • ARROW-8183 - [C++][Python][FlightRPC] Expose transport error metadata
  • ARROW-8184 - [Packaging] Use arrow-nightlies organization name on Anaconda and Gemfury to host the nightlies
  • ARROW-8185 - [Packaging] Document the available nightly wheels and conda packages
  • ARROW-8187 - [R] Make test assertions robust to i18n
  • ARROW-8191 - [Packaging][APT] Fix cmake removal in Debian GNU/Linux Stretch
  • ARROW-8192 - [C++] script for unpack avx512 intrinsics code
  • ARROW-8194 - [CI] Run tests in parallel on GitHub Actions
  • ARROW-8195 - [CI][C++][MSVC] Use preinstalled Boost
  • ARROW-8198 - [C++] Format Diff of NullArrays
  • ARROW-8200 - [GLib] Rename garrow_file_system_target_info{,s}() to ..._file_info{,s}()
  • ARROW-8203 - [C#] Use the latest SourceLink
  • ARROW-8204 - [Rust][DataFusion] Add support for aliased expressions in SQL
  • ARROW-8207 - [Packaging][wheel] Use LLVM 8 in manylinux2010 and manylinux2014
  • ARROW-8215 - [CI][GLib] Fix install error on macOS
  • ARROW-8218 - [C++] Decompress record batch messages in parallel at field level. Only allow LZ4_FRAME, ZSTD compression
  • ARROW-8220 - [Python] Make dataset FileFormat objects serializable
  • ARROW-8222 - [C++] Use bcp to make a slim boost for bundled build
  • ARROW-8224 - [C++] Remove APIs deprecated prior to 0.16.0
  • ARROW-8225 - [Rust] Continuation marker check was in wrong location.
  • ARROW-8225 - [Rust] Rust Arrow IPC reader must respect continuation markers.
  • ARROW-8227 - [C++] Refine SIMD feature definitions
  • ARROW-8231 - [Rust] Parse parquet key_value_metadata
  • ARROW-8232 - [Python] Deprecate pyarrow.open_stream and pyarrow.open_file APIs in favor of accessing via pyarrow.ipc namespace
  • ARROW-8235 - [C++][Compute] Filter out nulls by default
  • ARROW-8241 - [Rust] Add Schema convenience methods index_of and field_with_name
  • ARROW-8242 - [C++] Flight fails to compile on GCC 4.8
  • ARROW-8243 - [Rust][DataFusion] Fix inconsistency in LogicalPlanBuilder api
  • ARROW-8244 - [Python] Fix parquet.write_to_dataset to set file path in metadata_collector
  • ARROW-8246 - [C++] Add -Wa,-mbig-obj to CXXFLAGS on MinGW if it is supported
  • ARROW-8247 - [Python] Expose Parquet writing "engine" setting in pyarrow.parquet.write_table
  • ARROW-8249 - [Rust][DataFusion] Table API now uses LogicalPlanBuilder
  • ARROW-8252 - [CI][Ruby] Add Ubuntu 20.04
  • ARROW-8256 - [Rust][DataFusion] Update CLI documentation for 0.17.0 release
  • ARROW-8264 - [Rust][DataFusion] Add utility for printing batches
  • ARROW-8266 - [C++] Provide backup mirrors for thrift externalproject
  • ARROW-8267 - [CI][GLib] Fix build error on Ubuntu 16.04
  • ARROW-8271 - [Packaging] Allow wheel upload failures to gemfury
  • ARROW-8275 - [Python] Update Feather documentation for V2, Python IPC API cleanups / deprecations
  • ARROW-8277 - [Python] Implement eq, repr, and a Take() wrapper for RecordBatch
  • ARROW-8279 - [C++] Do not export Codec implementation symbols, remove codec-specific headers
  • ARROW-8288 - [Python] Expose with_ modifiers on DataType
  • ARROW-8290 - [Python] Improve FileSystemDataset constructor
  • ARROW-8291 - [Packaging] Conda nightly builds can't locate Numpy
  • ARROW-8292 - [Python] Allow manually specifying the schema in the dataset() function
  • ARROW-8294 - [Flight] Add DoExchange to Flight.proto
  • ARROW-8295 - [C++][Dataset] Push down projection to IpcReadOptions
  • ARROW-8299 - [C++] Reusable "optional ParallelFor" function for optional use of multithreading
  • ARROW-8300 - [R] Documentation and changelog updates for 0.17
  • ARROW-8307 - [Python] Add memory_map= option to pyarrow.feather.read_table
  • ARROW-8308 - [Rust] Implement DoExchange on examples
  • ARROW-8309 - [CI] C++/Java/Rust workflows should trigger on changes to Flight.proto
  • ARROW-8311 - [C++] Add push style stream format reader
  • ARROW-8316 - [CI] Set docker-compose to use docker-cli instead of docker-py for building images
  • ARROW-8319 - [CI] Install thrift compiler in the debian build
  • ARROW-8320 - [Format] Add clarification to CDataInterface.rst regarding memory alignment of buffers
  • ARROW-8321 - [CI] Use bundled thrift in Fedora 30 build
  • ARROW-8322 - [CI] Fix C# workflow file syntax
  • ARROW-8325 - [R][CI] Stop including boost in R windows bundle
  • ARROW-8329 - [Documentation][C++] Undocumented FilterOptions argument in Filter kernel
  • ARROW-8330 - [Documentation] The post release script generates the documentation with a development version
  • ARROW-8332 - [C++] Don't require Thrift compiler for Parquet build
  • ARROW-8335 - [Release] Add crossbow jobs to run release verification
  • ARROW-8336 - [Packaging][deb] Use libthrift-dev on Debian 10 and Ubuntu 19.10 or later
  • ARROW-8341 - [Packaging][deb] Reduce disk usage on building packages
  • ARROW-8343 - [GLib] Add GArrowRecordBatchIterator
  • ARROW-8347 - [C++] Migrate Array methods to Result<T>
  • ARROW-8351 - [R][CI] Store the Rtools-built Arrow C++ library as a build artifact
  • ARROW-8352 - [R] Add install_pyarrow()
  • ARROW-8356 - [Developer] Support * wildcards with "crossbow submit" via GitHub actions
  • ARROW-8361 - [C++] Add Result<T> APIs to Buffer methods and functions
  • ARROW-8362 - [Crossbow] Ensure that the locally generated version is used in the docker tasks
  • ARROW-8367 - [C++] Deprecate Buffer::FromString(..., MemoryPool*)
  • ARROW-8368 - [C++][C Data Interface] Move several child arrays
  • ARROW-8370 - [C++] Migrate type/schema APIs to Result<T>
  • ARROW-8371 - [Crossbow] Implement and exercise sanity checks for tasks.yml
  • ARROW-8372 - [C++] Migrate Table and RecordBatch APIs to Result<T>
  • ARROW-8375 - [CI][R] Make Windows tests more verbose in case of segfault
  • ARROW-8376 - [R] Add experimental interface to ScanTask/RecordBatch iterators
  • ARROW-8387 - [Rust] Make schema_to_fb public
  • ARROW-8389 - [Integration] Run tests in parallel
  • ARROW-8390 - [R] Expose schema unification features
  • ARROW-8393 - [C++][Gandiva] Make gandiva function registry case-insensitive
  • ARROW-8396 - [Rust] Removes libc dependency
  • ARROW-8398 - [Python] Remove deprecated API usage from python tests
  • ARROW-8401 - [C++] Add byte-stream-split AVX2/AVX512 implementation
  • ARROW-8403 - [C++] Add ToString() to ChunkedArray, Table and RecordBatch
  • ARROW-8407 - [Rust] Add documentation for Dictionary data type
  • ARROW-8408 - [Python] Add memory_map argument to feather.read_feather
  • ARROW-8409 - [R] Add R wrappers for getting and setting global CPU thread pool capacity
  • ARROW-8412 - [C++][Gandiva] Fix gandiva date_diff function definitions
  • ARROW-8433 - [R] Add feather alias for ipc format in dataset API
  • ARROW-8444 - [Documentation] Fix spelling errors across the codebase
  • ARROW-8449 - [R] Use CMAKE_UNITY_BUILD everywhere
  • ARROW-8450 - [Integration][C++] Implement large offsets types
  • ARROW-8457 - [C++] Add expected results for ArrowSchema in big-endian
  • ARROW-8458 - [C++] Prefer the original mirrors for the bundled thirdparty dependencies
  • ARROW-8461 - [Packaging][deb] Use zstd package for Ubuntu Xenial
  • ARROW-8463 - [CI] Balance the nightly test builds between CircleCI, Azure and GitHub
  • ARROW-8679 - [Python] Support pandas sparse series in pyarrow
  • PARQUET-458 - [C++][Parquet] Add support for reading/writing DataPageV2 format
  • PARQUET-1663 - [C++] Provide API to check the presence of repeated fields
  • PARQUET-1716 - [C++] Add BYTE_STREAM_SPLIT encoder and decoder
  • PARQUET-1770 - [C++][CI] Add fuzz target for reading Parquet files
  • PARQUET-1785 - [C++] Implement ByteStreamSplitDecoder::DecodeArrow and refactor tests
  • PARQUET-1786 - [C++] Improve ByteStreamSplit decoder using SSE2
  • PARQUET-1806 - [C++] Improve fuzzing seed corpus
  • PARQUET-1825 - [C++] Fix compilation error in column_io_benchmark.cc
  • PARQUET-1828 - [C++] Use SSE2 for the ByteStreamSplit encoder
  • PARQUET-1840 - [C++] Stop early on DecodeSpaced
kszucs
published 0.16.0 •

Changelog

Source

Apache Arrow 0.16.0 (2020-02-07)

Bug Fixes

  • ARROW-3783 - [R] Incorrect collection of float type
  • ARROW-3962 - [Go] Handle null values in CSV
  • ARROW-4470 - [Python] pyarrow uses considerably more memory when reading partitioned Parquet files
  • ARROW-4998 - [R] R package fails to install on OSX
  • ARROW-5575 - [C++] Split Targets.cmake for each module
  • ARROW-5655 - [Python] Table.from_pydict/from_arrays not using types in specified schema correctly
  • ARROW-5680 - [Rust][DataFusion] GROUP BY sql tests are now deterministic
  • ARROW-6157 - [C++] Array data validation
  • ARROW-6195 - [C++] Detect Apache mirror without Python
  • ARROW-6298 - [Rust] [CI] Examples are not being tested in CI
  • ARROW-6320 - [C++] Arrow utilities are linked statically
  • ARROW-6429 - [CI][Crossbow] Nightly spark integration job fails
  • ARROW-6445 - [CI][Crossbow] Nightly Gandiva jar trusty job fails
  • ARROW-6567 - [Rust][DataFusion] Wrap aggregate in projection when needed
  • ARROW-6581 - [C++] Fix fuzzit job submission
  • ARROW-6704 - [C++] Check for out of bounds timestamp in unsafe cast
  • ARROW-6708 - [C++] Fix hardcoded boost library names
  • ARROW-6728 - [C#] Support reading and writing Date32 and Date64 arrays
  • ARROW-6736 - [Rust][DataFusion] Evaluate the input to the aggregate expression just once per batch
  • ARROW-6740 - [C++] Unmap MemoryMappedFile as soon as possible
  • ARROW-6745 - [Rust] Fix a variety of minor typos.
  • ARROW-6749 - [Python] Let Array.to_numpy use general conversion code with zero_copy_only=True
  • ARROW-6750 - [Python] Silence S3 error logs by default
  • ARROW-6761 - [Rust] Travis build now uses the correct Rust toolchain
  • ARROW-6762 - [C++] Support reading JSON files with no newline at end
  • ARROW-6785 - [JS] Remove superfluous child assignment
  • ARROW-6786 - [C++] arrow-dataset-file-parquet-test is slow
  • ARROW-6795 - [C#] Fix for reading large (2GB+) files
  • ARROW-6798 - [CI] [Rust] Improve build times by caching dependencies in the Docker image
  • ARROW-6801 - [Rust] Arrow source release tarball is missing benchmarks
  • ARROW-6806 - [C++][Python] Fix crash validating an IPC-originating empty array
  • ARROW-6808 - [Ruby] Ensure requiring suitable MSYS2 package
  • ARROW-6809 - [Ruby] Gem does not install on macOS due to glib2 3.3.7 compilation failure
  • ARROW-6812 - [Java] Fix License header
  • ARROW-6813 - [Ruby] Arrow::Table.load with headers=true leads to exception in Arrow 0.15
  • ARROW-6820 - [Format] Update Map type child to "entries"
  • ARROW-6834 - [C++][TRIAGE] Pin gtest version 1.8.1 to unblock Appveyor builds
  • ARROW-6835 - [Archery][CMake] Restore ARROW_LINT_ONLY cmake option
  • ARROW-6842 - [Website] Jekyll error building website
  • ARROW-6844 - [C++][Parquet] Fix regression in reading List types with item name that is not "item"
  • ARROW-6846 - [C++] Build failures with glog enabled
  • ARROW-6857 - [C++] Fix DictionaryEncode for zero-chunk ChunkedArray
  • ARROW-6859 - [CI][Nightly] Disable docker layer caching for CircleCI tasks
  • ARROW-6860 - [Python][C++] Do not link shared libraries monolithically to pyarrow.lib, add libarrow_python_flight.so
  • ARROW-6861 - [C++] Fix length/null_count/capacity accounting through Reset and AppendIndices in DictionaryBuilder
  • ARROW-6864 - [C++] Add compression-related compile definitions before adding any unit tests
  • ARROW-6867 - [FlightRPC][Java] clean up default executor
  • ARROW-6868 - [Go] Fix slicing struct arrays
  • ARROW-6869 - [C++] Do not return invalid arrays from DictionaryBuilder::Finish when reusing builder. Add "FinishDelta" method and "ResetFull" method
  • ARROW-6873 - [Python] Remove stale CColumn references
  • ARROW-6874 - [Python] Fix memory leak when converting to Pandas object data
  • ARROW-6876 - [C++][Parquet] Use shared_ptr to avoid copying ReaderContext struct, fix performance regression with reading many columns
  • ARROW-6877 - [C++] Add additional Boost versions to support 1.71 and the presumed next 2 future versions
  • ARROW-6878 - [Python] Fix creating array from list of dicts with bytes keys
  • ARROW-6882 - [C++] Ensure the DictionaryArray indices has no dictionary data
  • ARROW-6885 - [Python] Remove superfluous skipped timedelta test
  • ARROW-6886 - [C++] Fix arrow::io nvcc compiler warnings
  • ARROW-6898 - [Java][hotfix] fix ArrowWriter memory leak
  • ARROW-6898 - [Java] Fix potential memory leak in ArrowWriter and several test classes
  • ARROW-6899 - [Python] Decode dictionary-encoded List children to dense when converting to pandas
  • ARROW-6901 - [Rust][Parquet] Increment total_num_rows when closing a row group
  • ARROW-6903 - [Python] Attempt to fix Python wheels with introduction of libarrow_python_flight, disabling of pyarrow.orc
  • ARROW-6905 - [Gandiva][Crossbow] Use xcode9.4 for osx builds, do not build dataset, filesystem
  • ARROW-6910 - [C++][Python] Set jemalloc default configuration to release dirty pages back to the OS more aggressively (dirty_decay_ms and muzzy_decay_ms set to 0 by default); add a C++/Python option to configure this
  • ARROW-6913 - [R] Potential bug in compute.cc
  • ARROW-6914 - [CI] docker-clang-format nightly failing
  • ARROW-6922 - [Python] Compat with pandas for MultiIndex.levels.names
  • ARROW-6925 - [C++] Only add -stdlib flag on MacOS when using clang.
  • ARROW-6929 - [C++] Remove first offset==0 check from Validate()
  • ARROW-6937 - [Packaging][Python] Fix conda linux and OSX wheel nightly builds
  • ARROW-6938 - [Packaging][Python] Disable bz2 in Windows wheels and build ZSTD in bundled mode to triage linking issues
  • ARROW-6948 - [Rust][Parquet] Fix boolean array in arrow reader.
  • ARROW-6950 - [C++][Dataset] Add dataset benchmark example
  • ARROW-6957 - [CI][Crossbow] Nightly R with sanitizers build fails installing dependencies
  • ARROW-6962 - [C++][CI] Stop compiling with -Weverything
  • ARROW-6966 - [Go] Set a default memset for when the platform doesn't set one
  • ARROW-6977 - [C++] Disable jemalloc background_thread on macOS
  • ARROW-6983 - [C++] Fix ThreadedTaskGroup lifetime issue
  • ARROW-6989 - [Python] Check for out of range precision decimals in python conversion
  • ARROW-6992 - [C++] Undefined Behavior sanitizer build option fails with GCC
  • ARROW-6999 - [Python] Fix unnamed index when specifying schema in Table.from_pandas
  • ARROW-7013 - [C++] arrow-dataset pkgconfig is incomplete
  • ARROW-7020 - [Java] Fix the bugs when calculating vector hash code
  • ARROW-7021 - [Java] UnionFixedSizeListWriter decimal type should check writer index
  • ARROW-7022 - [Python] Fix handling of pandas Index and Period/Interval extension arrays in pa.array (with ARROW-7023)
  • ARROW-7023 - [Python] pa.array does not use "from_pandas" semantics for pd.Index
  • ARROW-7024 - [CI][R] Update R dependencies for Conda build
  • ARROW-7027 - [Python] Correctly raise error in pa.table(..) on invalid input
  • ARROW-7033 - [C++] Set SDKROOT automatically on macOS
  • ARROW-7045 - [R] Preserve factor in Parquet roundtrip
  • ARROW-7050 - [R] Fix compiler warnings in R bindings
  • ARROW-7053 - [Python] setuptools-scm produces incorrect version at apache-arrow-0.15.1 tag
  • ARROW-7056 - [Python] Fix test_fs failures when S3 not enabled
  • ARROW-7059 - [C++][Parquet] Mostly fix performance regression when reading Parquet file with many columns
  • ARROW-7074 - [C++] ASSERT_OK_AND_ASSIGN should use ASSERT_OK instead of EXPE…
  • ARROW-7077 - [C++] Casting dictionary to unrelated value type shouldn't crash
  • ARROW-7087 - [Python] Metadata disappears from pandas dataset
  • ARROW-7097 - [Rust][CI] Apply rustfmt nightly
  • ARROW-7100 - [C++][HDFS] Fix search directories for libjvm.so
  • ARROW-7105 - [CI][Crossbow] Nightly homebrew-cpp job fails
  • ARROW-7106 - [Java] Fix the problem that flight perf test hangs endlessly
  • ARROW-7117 - [C++][CI] Fix the hanging C++ tests in Windows 2019
  • ARROW-7128 - [CI] Use proper version for fedora tests in GitHub actions cron jobs
  • ARROW-7133 - [CI] Allow GH Actions to run on all branches
  • ARROW-7142 - [C++] GCC compilation failures in nightlies
  • ARROW-7152 - [Java] Delete useless class DiffFunction
  • ARROW-7157 - [R] Add validation, helpful error message to Object$new()
  • ARROW-7158 - [C++] Use compiler information provided by CMake
  • ARROW-7163 - [Doc] Fix double-and typos
  • ARROW-7164 - [CI] Dev cron github action is failing every 15 minutes
  • ARROW-7167 - [CI][Python] Add nightly tests for additional pandas versions to GitHub Actions
  • ARROW-7168 - [Python] Respect the specified dictionary type for pd.Categorical conversion
  • ARROW-7170 - [C++] Fix linking with bundled ORC
  • ARROW-7180 - [CI] Java builds are not triggered on the master branch
  • ARROW-7181 - [C++] Fix an Arrow module search bug with pkg-config
  • ARROW-7183 - [CI][Crossbow] Re-skip r-sanitizer nightly tests
  • ARROW-7187 - [C++][Doc] doxygen broken on master because of @
  • ARROW-7188 - [C++][Doc] doxygen broken on master: missing param implicit_casts
  • ARROW-7189 - [CI][Crossbow] Nightly conda osx builds fail
  • ARROW-7194 - [Rust] Fix CSV writer recursion issues
  • ARROW-7199 - [Java] Fix ConcurrentModificationException in BaseAllocator::getChildAllocators
  • ARROW-7200 - [C++][Flight] Enable the server to serve to remote clients
  • ARROW-7209 - [Python] Fix tests on pandas master related to extension dtype conversion
  • ARROW-7212 - [Go] add missing Release to benchmark code
  • ARROW-7214 - [Python] Fix pickling of DictionaryArray
  • ARROW-7217 - [CI][Python] Use correct python version in Github Actions
  • ARROW-7225 - [C++] Fix *std::move(Result<T>) for move-only T
  • ARROW-7249 - [CI] Release test fails in master due to new arrow-flight Rust crate
  • ARROW-7250 - [C++] Define constexpr symbols explicitly in StringToFloatConverter::Impl
  • ARROW-7253 - [CI] Fix failure in release test
  • ARROW-7254 - [Java] BaseVariableWidthVector#setSafe appears to make value offsets inconsistent
  • ARROW-7264 - [Java] RangeEqualsVisitor type check is not correct
  • ARROW-7266 - [C++] Fix ArrayDataVisitor on sliced binary-like array
  • ARROW-7271 - [C++][Flight] Use the single parameter version of SetTotalBytesLimit
  • ARROW-7281 - [C++] Make Adaptive builders' length match expectations
  • ARROW-7282 - [Python] IO functions should raise the right exceptions
  • ARROW-7291 - [Dev] Fix FORMAT_DIR
  • ARROW-7294 - [Python] converted_type_name_from_enum(): Incorrect name for INT_64
  • ARROW-7295 - [R] Fix bad test that causes failure on R < 3.5
  • ARROW-7298 - [C++] Fix thirdparty dependency downloader script
  • ARROW-7314 - [Python] Fix compiler warning in pyarrow.union
  • ARROW-7318 - [C#] TimestampArray serialization failure
  • ARROW-7320 - [C++] Specify CMAKE_INSTALL_LIBDIR for gbenchmark
  • ARROW-7327 - [CI] Failing C GLib and R buildbot builders
  • ARROW-7328 - [CI] GitHub Actions should trigger on changes to GitHub Actions configuration
  • ARROW-7341 - [CI] Unbreak nightly Conda R job
  • ARROW-7343 - [Java][FlightRPC] prevent leak in DoGet
  • ARROW-7349 - [C++] Fix the bug of parsing string hex values
  • ARROW-7353 - [C++] Ignore -Wmissing-braces when building with clang
  • ARROW-7354 - [C++] Fix crash in test-io-hdfs
  • ARROW-7355 - [CI] Environment variables are defined twice for the fuzzit builds
  • ARROW-7358 - [CI][Dev][C++] ccache disabled on conda-python-hdfs
  • ARROW-7359 - [C++][Gandiva] Don't throw error for locate function for start position exceeding string length
  • ARROW-7360 - [R] Can't use dplyr filter() with variables defined in parent scope
  • ARROW-7361 - [Rust] Build directory is not passed to ci/scripts/rust_test.sh
  • ARROW-7362 - [Python][C++] Added ListArray.Flatten() that properly flattens a ListArray
  • ARROW-7374 - [Dev][C++] Fix cuda-cpp docker build
  • ARROW-7381 - [C++] Unbreak manylinux1 wheels after Iterator refactor
  • ARROW-7386 - [C#] Array offset does not work properly
  • ARROW-7388 - [Python] Skip HDFS tests if libhdfs cannot be located
  • ARROW-7389 - [Python][Packaging] Remove pyarrow.s3fs import check from the recipe
  • ARROW-7393 - [Plasma] Fix plasma executable name in plasma_java build
  • ARROW-7395 - [C++] Do not warn or error on logical "or" with constants
  • ARROW-7397 - [C++][JSON] Fix white space length detection error
  • ARROW-7404 - [C++][Gandiva] Fix utf8 char length error on Arm64
  • ARROW-7406 - [Java] NonNullableStructVector#hashCode should pass hasher to child vectors
  • ARROW-7407 - [Python] Declare NumPy a PEP517 build dependency
  • ARROW-7408 - [C++] Fix compilation of reference benchmarks
  • ARROW-7435 - [C++] Validate all list / binary offsets in ValidateFull()
  • ARROW-7436 - [Archery] Enable more benchmark binaries in archery benchmark
  • ARROW-7437 - [Java] ReadChannel#readFully does not set writer index correctly
  • ARROW-7442 - [Ruby] Add abstract type check to Arrow::DataType.resolve
  • ARROW-7447 - [Java] ComplexCopier does incorrect copy in some cases
  • ARROW-7450 - [C++] Also link boost_filesystem when using static test linkage
  • ARROW-7458 - [GLib] Fix incorrect build dependency in Makefile
  • ARROW-7471 - [CI][Python] Run flake8 on Cython files
  • ARROW-7472 - [Java] Fix some incorrect behavior in UnionListWriter
  • ARROW-7478 - [Rust][DataFusion] Group by expression ignored unless paired with aggregate expression
  • ARROW-7492 - [CI][Crossbow] Nightly homebrew-cpp job fails on Python installation
  • ARROW-7497 - [Python] Stop relying on (deprecated) pandas.util.testing, move to pandas.testing
  • ARROW-7500 - [C++][Dataset] Remove std::regex usage
  • ARROW-7503 - [Rust][Parquet] Fix build failures
  • ARROW-7506 - [Java] JMH benchmarks should be called from main methods
  • ARROW-7508 - [C#] DateTime32 Reading is Broken
  • ARROW-7510 - [C++] Make ArrayData::null_count thread-safe
  • ARROW-7516 - [C#] Fix .NET Benchmarks
  • ARROW-7518 - [Python] Use PYARROW_WITH_HDFS when building wheels, conda packages
  • ARROW-7527 - [Python] Fix pandas/feather tests for unsupported types with pandas master
  • ARROW-7528 - [Python] Remove usage of deprecated pd.np and pd.datetime in tests
  • ARROW-7535 - [C++] Fix ASAN failures in Array::Validate()
  • ARROW-7543 - [R] Fixes R arrow::write_parquet() documentation code examples
  • ARROW-7545 - [C++][Dataset] Scanning dataset with dictionary type hangs
  • ARROW-7551 - [FlightRPC][C++] Flight test on macOS fails due to Homebrew gRPC
  • ARROW-7552 - [C++][CI] Disable timing-sensitive tests on public CI
  • ARROW-7554 - [C++] Add support for building on FreeBSD
  • ARROW-7559 - [Rust] Incorrect index check assertion in StringArray and BinaryArray
  • ARROW-7561 - [Doc][Python] Add missing conda_env_gandiva.yml in python.rst
  • ARROW-7563 - [Rust] failed to select a version for `byteorder`
  • ARROW-7582 - [Rust][Flight] Unable to compile arrow.flight.protocol.rs
  • ARROW-7583 - [FlightRPC][C++] relax auth tests due to nondeterminism
  • ARROW-7591 - [Python] Fix DictionaryArray.to_numpy() to return decoded numpy array
  • ARROW-7592 - [C++] Fix crashes on corrupt IPC input
  • ARROW-7593 - [CI][Python] Python datasets failing / not run on CI
  • ARROW-7595 - [R][CI] R appveyor job fails due to pacman compression change
  • ARROW-7596 - [Python] Only permit zero-copy DataFrame block construction when split_blocks=True
  • ARROW-7599 - [Java] Fix build break due to change in RangeEqualsVisitor
  • ARROW-7603 - [Packaging][RPM] Add workaround for LLVM on CentOS 8
  • ARROW-7611 - [Packaging][Python] Fix artifacts patterns for wheel
  • ARROW-7612 - [Packaging][Python] Fix artifacts path for Conda on Windows
  • ARROW-7614 - [Python] Limit size of data in test_parquet.py::test_set_data_page_size
  • ARROW-7618 - [C++] Fix crashes or undefined behaviour on corrupt IPC input
  • ARROW-7620 - [Rust] Remove call to flatc
  • ARROW-7621 - [Doc] Fix doc build
  • ARROW-7634 - [Python] Run pyarrow.dataset tests on Appveyor + fix failures to parse Windows file paths
  • ARROW-7638 - [C++][Dataset] Fix a segfault in DirectoryPartitioningFactory
  • ARROW-7639 - [R] Cannot convert Dictionary Array to R when values aren't strings
  • ARROW-7640 - [C++][Dataset][Parquet] Detect missing compression support
  • ARROW-7647 - [C++] Repair JSON parser's handling of ListArrays
  • ARROW-7650 - [C++][Dataset] enable dataset tests on Windows
  • ARROW-7651 - [CI][Crossbow] Nightly macOS wheel builds fail
  • ARROW-7652 - [Python][Dataset] Use implicit cast in ScannerBuilder.filter
  • ARROW-7661 - [Python] Test for optimal CSV chunking
  • ARROW-7689 - [FlightRPC][C++] Bump bundled gRPC to 1.25 to fix macOS test failure
  • ARROW-7690 - [R] Cannot write parquet to OutputStream
  • ARROW-7693 - [CI] Fix test name for Spark integration, add new tests
  • ARROW-7709 - [Python] Preserve column name in conversion from Table column to pandas for non-ns timestamps
  • ARROW-7714 - [Release] Add missing variable expansion
  • ARROW-7718 - [Release] Fix auto-retry in the binary release script
  • ARROW-7723 - [Python] Triage untested functional regression when converting tz-aware timestamp inside struct to pandas/NumPy format
  • ARROW-7727 - [Python] Unable to read a ParquetDataset when schema validation is on.
  • ARROW-8135 - [Python] Problem importing PyArrow on a cluster
  • ARROW-8638 - Arrow Cython API Usage Gives an error when calling CTable API Endpoints
  • PARQUET-1692 - [C++] Don't use the same CMake variable name for thirdparty version and found version
  • PARQUET-1692 - [C++] LogicalType::FromThrift error on CentOS 7 RPM
  • PARQUET-1693 - [C++] Fix parquet examples with compression define guards
  • PARQUET-1702 - [C++] Make BufferedRowGroupWriter compatible with parquet encryption
  • PARQUET-1706 - [C++] Wrong dictionary_page_offset when writing only data pages via BufferedPageWriter
  • PARQUET-1707 - [C++] parquet-arrow-test fails with UBSAN
  • PARQUET-1709 - [C++] Avoid unnecessary temporary std::shared_ptr copies
  • PARQUET-1715 - [C++] Add the Parquet code samples to CI + Refactor Parquet Encryption Samples
  • PARQUET-1720 - [C++] JSONPrint not showing version correctly
  • PARQUET-1747 - [C++] Access to ColumnChunkMetaData fails when encryption is on
  • PARQUET-1766 - [C++] Handle parquet::Statistics NaNs and -0.0f as per upstream parquet-mr
  • PARQUET-1772 - [C++] ParquetFileWriter: Data overwritten in append mode

New Features and Improvements

  • ARROW-412 - [Format][Documentation] Clarify that Buffer.size in Flatbuffers should reflect the actual memory size rather than the padded size
  • ARROW-501 - [C++] Implement concurrent / buffering InputStream for streaming data use cases
  • ARROW-772 - [C++] Implement take kernel functions
  • ARROW-843 - [C++][Dataset] Ensure Schemas are unified in DataSourceDiscovery
  • ARROW-976 - [C++][Python] Provide API for defining and reading Parquet datasets with more ad hoc partition schemes
  • ARROW-1036 - [C++] Define abstract API for filtering Arrow streams (e.g. predicate evaluation)
  • ARROW-1119 - [Python][C++] Implement NativeFile interfaces for Amazon S3
  • ARROW-1175 - [Java] Implement/test dictionary-encoded subfields
  • ARROW-1456 - [Python] Run s3fs unit tests in Travis CI
  • ARROW-1562 - [C++] Numeric kernel implementations for add
  • ARROW-1638 - [Java] IPC roundtrip for null type
  • ARROW-1900 - [C++] Add kernel for min / max
  • ARROW-2428 - [Python] Support pandas ExtensionArray in Table.to_pandas conversion
  • ARROW-2602 - [Packaging] Automate build of development docker containers
  • ARROW-2863 - [Python] Add context manager APIs to RecordBatch*Writer/Reader classes
  • ARROW-3085 - [Rust] Add an adapter for parquet.
  • ARROW-3408 - [C++] Add CSV option to automatically attempt dict encoding
  • ARROW-3444 - [Python] Add Array/ChunkedArray/Table nbytes attribute
  • ARROW-3706 - [Rust] Add record batch reader trait.
  • ARROW-3789 - [Python] Use common conversion path for Arrow to pandas.Series/DataFrame. Zero copy optimizations for DataFrame, add split_blocks and self_destruct options
  • ARROW-3808 - [R] Array extract, including Take method
  • ARROW-3813 - [R] lower level construction of Dictionary Arrays
  • ARROW-4059 - [Rust] Parquet/Arrow Integration
  • ARROW-4091 - [C++] Curate default list of CSV null spellings
  • ARROW-4208 - [CI][Python] Add automated tests for S3
  • ARROW-4219 - [Rust][Parquet] Initial support for arrow reader.
  • ARROW-4223 - [Python] Support scipy.sparse integration
  • ARROW-4224 - [Python] Support integration with pydata/sparse library
  • ARROW-4225 - [Format][C++] Add CSC sparse matrix support
  • ARROW-4722 - [C++] Implement Bitmap class to modularize handling of bitmaps
  • ARROW-4748 - [Rust][DataFusion] Optimize GROUP BY aggregate queries
  • ARROW-4930 - [C++] Improve find_package() support
  • ARROW-5180 - [Rust] IPC Support
  • ARROW-5181 - [Rust] Initial support for Arrow File reader
  • ARROW-5182 - [Rust] Arrow IPC file writer
  • ARROW-5227 - [Rust][DataFusion] Re-implement query execution with an extensible physical query plan
  • ARROW-5277 - [C#] MemoryAllocator.Allocate(length: 0) doesn't return null
  • ARROW-5333 - [C++] Clamp build option summary width to 90
  • ARROW-5366 - [Rust] Duration and Interval Arrays
  • ARROW-5400 - [Rust] Test/ensure that reader and writer support zero-length record batches
  • ARROW-5445 - [Website] Remove language that encourages pinning a version
  • ARROW-5454 - [C++] Implement Take on ChunkedArray for DataFrame use
  • ARROW-5502 - [R] file readers should mmap
  • ARROW-5508 - [C++] Create reusable Iterator<T> interface
  • ARROW-5523 - [Python][Packaging] Use HTTPS consistently for downloading wheel dependencies
  • ARROW-5712 - [C++][Parquet] Arrow time32/time64/timestamp ConvertedType not being restored properly
  • ARROW-5767 - [Format] Permit dictionary replacements in IPC protocol
  • ARROW-5801 - [CI] Dockerize (add to docker-compose) all Travis CI Linux tasks
  • ARROW-5802 - [CI][Archery] Dockerize lint utilities
  • ARROW-5804 - [C++] Dockerize C++ CI job with conda-forge toolchain, code coverage from Travis CI
  • ARROW-5805 - [Python] Dockerize (add to docker-compose) Python Travis CI job
  • ARROW-5806 - [CI] Dockerize (add to docker-compose) Integration tests Travis CI entry
  • ARROW-5807 - [JS] Dockerize NodeJS Travis CI entry
  • ARROW-5808 - [GLib][Ruby] Dockerize (add to docker-compose) current GLib + Ruby Travis CI entry
  • ARROW-5809 - [CI][Rust] Travis runs dockerized Rust build
  • ARROW-5810 - [Go] Dockerize Travis CI Go build
  • ARROW-5831 - [Release] Add Python program to download binary artifacts in parallel, allow abort/resume
  • ARROW-5839 - [Python] Test manylinux2010 in CI
  • ARROW-5855 - [Python] Support for Duration (timedelta) type
  • ARROW-5859 - [Python] Support ExtensionArray.to_numpy using storage array
  • ARROW-5971 - [Website] Blog post introducing Arrow Flight
  • ARROW-5994 - [CI][Rust] Create nightly releases of the Rust implementation
  • ARROW-6003 - [C++] Better input validation and error messaging in CSV reader
  • ARROW-6074 - [FlightRPC][Java] Middleware
  • ARROW-6091 - [Rust][DataFusion] Implement physical execution plan for LIMIT
  • ARROW-6109 - [Integration] Docker image for integration testing can't be built on windows
  • ARROW-6112 - [Java] Support int64 buffer lengths in Java
  • ARROW-6184 - [Java] Provide hash table based dictionary encoder
  • ARROW-6251 - [Developer] Add PR merge tool to apache/arrow-site
  • ARROW-6257 - [C++] Add fnmatch compatible globbing function
  • ARROW-6274 - [Rust][DataFusion] Add support for writing results to CSV
  • ARROW-6277 - [C++][Parquet] Support direct DictionaryArray write of all parquet types
  • ARROW-6283 - [Rust][DataFusion] Implement Context::write_csv to write partitioned CSV results
  • ARROW-6285 - [GLib] Add support for LargeBinary and LargeString types
  • ARROW-6286 - [GLib] Add support for LargeList type
  • ARROW-6299 - [C++] Simplify FileFormat classes to singletons
  • ARROW-6321 - [Python] Ability to create ExtensionBlock on conversion to pandas
  • ARROW-6340 - [R] Implements low-level bindings to Dataset classes
  • ARROW-6341 - [Python] Implement low-level bindings for Dataset
  • ARROW-6352 - [Java] Add implementation of DenseUnionVector
  • ARROW-6367 - [C++][Gandiva] Implement string reverse
  • ARROW-6378 - [C++][Dataset] Implement recursive TreeDataSource
  • ARROW-6386 - [C++][Documentation] Explicit documentation of null slot interpretation
  • ARROW-6394 - [Java] Support conversions between delta vector and partial sum vector
  • ARROW-6396 - [C++] Add overloads of Boolean kernels implementing Kleene logic
  • ARROW-6398 - [C++] Consolidate ScanOptions and ScanContext
  • ARROW-6405 - [Python] Add std::move wrapper for use in Cython
  • ARROW-6452 - [Java] Override ValueVector toString() method
  • ARROW-6463 - [C++][Python] Rename arrow::fs::Selector to FileSelector
  • ARROW-6466 - [Integration][CI] Move integration test code to archery integration command. Dockerize integration tests
  • ARROW-6468 - [C++] Remove unused hashing routines
  • ARROW-6473 - Dictionary encoding format clarifications/future proofing
  • ARROW-6503 - [C++] Add an argument of memory pool object to SparseTensorConverter
  • ARROW-6508 - [C++] Add Tensor and SparseTensor factory function with validations
  • ARROW-6515 - [C++] Clean type_traits.h definitions
  • ARROW-6578 - [C++] Allow casting number to string
  • ARROW-6592 - [Java] Add support for skipping decoding of columns/field in Avro converter
  • ARROW-6594 - [Java] Support logical type encodings from Avro
  • ARROW-6598 - [Java] Sort the code for ApproxEqualsVisitor and provide an interface for custom vector equality
  • ARROW-6608 - [C++] Make ARROW_HDFS default to OFF
  • ARROW-6610 - [C++] Add cmake option to disable filesystem layer
  • ARROW-6611 - [C++] Make ARROW_JSON=OFF the default
  • ARROW-6612 - [C++] Add ARROW_CSV CMake build flag
  • ARROW-6619 - [Ruby] Add support for building Gandiva::Expression by Arrow::Schema#build_expression
  • ARROW-6624 - [C++][Python] Add SparseTensor.ToTensor() method
  • ARROW-6625 - [C++][Python] Allow concat_tables to null fill missing columns
  • ARROW-6631 - [C++] Do not build any compression libraries by default in C++ build
  • ARROW-6632 - [C++] Do not build with ARROW_COMPUTE=on and ARROW_DATASET=on by default
  • ARROW-6633 - [C++] Vendor double-conversion library
  • ARROW-6634 - [C++][FOLLOWUP] Remove Flatbuffers EP remnants from C++ Dockerfiles
  • ARROW-6634 - [C++] Vendor Flatbuffers and check in compiled sources
  • ARROW-6635 - [C++] Disable glog integration by default
  • ARROW-6636 - [C++] Do not build command line tools by default
  • ARROW-6637 - [Packaging][FOLLOWUP] Enable necessary components in Autobrew build for R
  • ARROW-6637 - [C++] Further streamline default build, add ARROW_CSV CMake option
  • ARROW-6646 - [Go] Write no IPC buffer metadata for NullType
  • ARROW-6650 - [Rust][Integration] Compare integration JSON with schema & batch
  • ARROW-6656 - [Rust][DataFusion] Add MAX, MIN expressions
  • ARROW-6657 - [Rust][DataFusion] Add Count Aggregate Expression
  • ARROW-6658 - [Rust][DataFusion] Implement AVG expression
  • ARROW-6659 - [Rust][DataFusion] Refactor of HashAggregateExec to support custom merge
  • ARROW-6662 - [Java] Implement equals/approxEquals API for VectorSchemaRoot
  • ARROW-6671 - [C++][Python] Use more consistent names for sparse tensor items
  • ARROW-6672 - [Java] Extract a common interface for dictionary builders
  • ARROW-6685 - [C++] Ignore trailing slashes in S3FS
  • ARROW-6686 - [CI] Pull and push docker images to speed up the nightly builds
  • ARROW-6688 - [Packaging] Include s3 support in the conda packages
  • ARROW-6690 - [Rust][DataFusion] Optimize aggregates without GROUP BY to use SIMD
  • ARROW-6692 - [Rust][DataFusion] Update examples to use physical query plan
  • ARROW-6693 - [Rust][DataFusion] Update unit tests to use physical query plan
  • ARROW-6694 - [Rust][DataFusion] Integration tests now use physical query plan
  • ARROW-6695 - [Rust][DataFusion] Remove legacy code for executing logical plan
  • ARROW-6696 - [Rust][DataFusion] Implement simple math operations in physical query plan
  • ARROW-6700 - [Rust][DataFusion] Use new Arrow Parquet reader
  • ARROW-6707 - [Java] Improve the performance of JDBC adapters by using nullable information
  • ARROW-6710 - [Java] Add JDBC adapter test to cover cases which contains some null values
  • ARROW-6711 - [C++] Consolidate Filter and Expression
  • ARROW-6721 - [Java] Avro adapter benchmark only runs once in JMH
  • ARROW-6722 - [Java] Provide a uniform way to get vector name
  • ARROW-6729 - [C++] Prevent data copying in StlStringBuffer
  • ARROW-6730 - [CI] Use GitHub Actions for "C++ with clang 7" docker image
  • ARROW-6731 - [CI][Rust] Set up GitHub Action to run Rust tests
  • ARROW-6732 - [Java] Implement quick sort in a non-recursive way to avoid stack overflow
  • ARROW-6741 - [Release] Update changelog.py to use APACHE_ prefixed JIRA_USERNAME and JIRA_PASSWORD environment variables
  • ARROW-6742 - [C++] Remove boost::filesystem dependency in hdfs_internal.cc
  • ARROW-6743 - [C++] Remove usage of boost::filesystem
  • ARROW-6744 - [Rust] Publicly expose JsonEqual
  • ARROW-6754 - [C++] Merge allocator.h into stl.h
  • ARROW-6758 - [Developer] Install local NodeJS via nvm when running release verification
  • ARROW-6764 - [C++] Create a readahead iterator
  • ARROW-6767 - [JS] Lazily bind batches in scan/scanReverse
  • ARROW-6768 - [C++][Dataset] Add method to convert from Scanner to Table
  • ARROW-6769 - [Dataset][C++] End to end test
  • ARROW-6770 - [CI][Travis] Download Minio quietly
  • ARROW-6777 - [GLib][CI] Unpin gobject-introspection gem
  • ARROW-6778 - [C++] Support cast for DurationType
  • ARROW-6782 - [C++] Do not require Boost for minimal C++ build
  • ARROW-6784 - [C++][R] Move filter and take for ChunkedArray, RecordBatch, and Table from Rcpp to C++ library
  • ARROW-6787 - [CI][C++] Decommission "C++ with clang 7 and system packages" Travis CI job
  • ARROW-6788 - [CI][Dev] Exercise merge script tests
  • ARROW-6789 - [Python] Improve ergonomics by automatically boxing Action and Result in do_action RPC
  • ARROW-6790 - [Release] Enable selected integration tests in release verification
  • ARROW-6793 - [R] Arrow C++ binary packaging for Linux
  • ARROW-6797 - [Release] Use a separately cloned arrow-site repository in the website post release script
  • ARROW-6802 - [Packaging][deb][RPM] Update qemu-user-static package URL
  • ARROW-6803 - [Rust][DataFusion] Performance optimization for single partition aggregate queries
  • ARROW-6804 - [CI][Rust] Migrate Travis job to GitHub Actions
  • ARROW-6807 - [Java][FlightRPC] Expose gRPC service & client
  • ARROW-6810 - [Website] Add docs for R package 0.15 release
  • ARROW-6811 - [R] Assorted post-0.15 release cleanups
  • ARROW-6814 - [C++] Resolve compiler warnings occurred on release build
  • ARROW-6822 - [Website] merge_pr.py is published
  • ARROW-6824 - [Plasma] Allow creation of multiple objects through a single IPC in Plasma Store
  • ARROW-6825 - [C++] Rework CSV reader IO around readahead iterator
  • ARROW-6831 - [R] Update R macOS/Windows builds for change in cmake compression defaults
  • ARROW-6832 - [R] Implement Codec::IsAvailable
  • ARROW-6833 - [R][CI] Add crossbow job for full R autobrew macOS build
  • ARROW-6836 - [Format] Add a KeyValue field to the Footer table in File.fbs
  • ARROW-6843 - [Website] Disable deploy on pull request
  • ARROW-6847 - [C++] Add range_expression adapter to Iterator
  • ARROW-6850 - [Java] Support Null type in JDBC converter
  • ARROW-6852 - [C++] Fix build issue on memory-benchmark
  • ARROW-6853 - [Java] Support using different hashers in vectors and dictionary encoders when calculating hashCode
  • ARROW-6855 - [FlightRPC][C++][Python] Flight middleware for C++/Python
  • ARROW-6862 - [Developer] Check pull request title
  • ARROW-6863 - [Java] Provide parallel searcher
  • ARROW-6865 - [Java] Improve the performance of comparing an ArrowBuf against a byte array
  • ARROW-6866 - [Java] Improve the performance of calculating hash code for struct vector
  • ARROW-6879 - [Rust] Add explicit SIMD for sum kernel
  • ARROW-6880 - [Rust] Add explicit SIMD for min/max kernel
  • ARROW-6881 - [Rust] Remove "array_ops" in favor of the "compute" sub-module
  • ARROW-6884 - [Python] Format friendlier message in Python when a server-side RPC handler fails
  • ARROW-6887 - [Java] Create prose documentation for using ValueVectors
  • ARROW-6888 - [Java] Support copy operation for vector value comparators
  • ARROW-6889 - [Java] Enable FixedSizeList type in ComplexCopier and fix RangeEqualsVisitor stack overflow
  • ARROW-6891 - [Rust][Parquet] utf8 support for arrow reader.
  • ARROW-6902 - [C++][Compute] Add String/Binary support to Compare kernel
  • ARROW-6904 - [Python] Add support for MapArray
  • ARROW-6907 - [Plasma] Allow Plasma to send batched notifications.
  • ARROW-6911 - [Java] Provide composite comparator
  • ARROW-6912 - [Java] Extract a common base class for avro converter consumers
  • ARROW-6916 - [Developer] Sort tasks by name in Crossbow e-mail report
  • ARROW-6918 - [R] Make docker-compose setup faster
  • ARROW-6919 - [Python] Expose more builders in Cython
  • ARROW-6920 - [Packaging] Build python 3.8 wheels
  • ARROW-6926 - [Python] Support sizeof protocol for Python objects
  • ARROW-6927 - [C++] Add gRPC version check
  • ARROW-6928 - [Rust] Add support for FixedSizeListArray
  • ARROW-6930 - [Java] Create utility class for populating vector values used for test purpose only
  • ARROW-6932 - [Java] Incorrect log on known extension type
  • ARROW-6933 - [Java] Support linear dictionary encoder
  • ARROW-6936 - [Python] Improve error message when unwrapping object fails
  • ARROW-6942 - [Developer] Add support for Parquet in pull request check by GitHub Actions
  • ARROW-6943 - [Website] Translate Apache Arrow Flight introduction to Japanese
  • ARROW-6944 - [Rust] Add String, FixedSizeBinary types
  • ARROW-6949 - [Java] Fix promotable writer to handle nullvectors
  • ARROW-6951 - [C++][Dataset] Column projection in ParquetFragment
  • ARROW-6952 - [C++][Dataset] Implement predicate pushdown with ParquetFileFragment
  • ARROW-6954 - [Python][CI] Add Python 3.8 to CI matrix
  • ARROW-6960 - [R] Add lz4 and zstd to R PKGBUILD
  • ARROW-6961 - [C++][Gandiva] Add string lower function in Gandiva
  • ARROW-6963 - [Packaging][Wheel][OSX] Use crossbow's command to deploy artifacts from travis builds
  • ARROW-6964 - [C++][Dataset] Add multithread support to Scanner::ToTable
  • ARROW-6965 - [C++][Dataset] Optionally expose partition keys as columns
  • ARROW-6967 - [C++][Dataset] IN, IS_VALID filter expressions
  • ARROW-6969 - [C++][Dataset] ParquetScanTask defer memory usage
  • ARROW-6970 - [Packaging][RPM] Add support for CentOS 8
  • ARROW-6973 - [C++][ThreadPool] Use perfect forwarding in Submit
  • ARROW-6975 - [C++] Put make_unique in its own header
  • ARROW-6980 - [R] dplyr backend for RecordBatch/Table
  • ARROW-6984 - [C++] Update LZ4 to 1.9.2 for CVE-2019-17543
  • ARROW-6986 - [R] Add basic Expression class
  • ARROW-6987 - [CI] Travis OSX failing to install sdk headers
  • ARROW-6991 - [Packaging][deb] Add support for Ubuntu 19.10
  • ARROW-6994 - [C++] Fix aggressive RSS inflation on macOS when jemalloc background_thread is not enabled
  • ARROW-6997 - [Packaging][RPM] Add apache-arrow-release
  • ARROW-7000 - [C++][Gandiva] Handle empty inputs in string upper, lower functions
  • ARROW-7003 - [Rust] Generate flatbuffers files in docker build image
  • ARROW-7004 - [Plasma] Make it possible to bump up object in LRU cache
  • ARROW-7006 - [Rust] Bump flatbuffers version to avoid vulnerability
  • ARROW-7007 - [C++] Add use_mmap option to LocalFS
  • ARROW-7014 - [Developer][Release] Add "wheels" verification option to verify-release-candidate.sh for Linux and macOS
  • ARROW-7015 - [Developer] Write script to verify macOS wheels given local environment with conda or virtualenv
  • ARROW-7016 - [Developer][Python] Add Windows batch script to test Python wheels for release candidate
  • ARROW-7019 - [Java] Improve the performance of loading validity buffers
  • ARROW-7026 - [Java] Remove assertions in MessageSerializer/vector/writer/reader
  • ARROW-7031 - [Python] Correct LargeListArray.offsets attribute
  • ARROW-7031 - [Python] Expose the offsets of a ListArray in python
  • ARROW-7032 - [Release] Run the python unit tests in the release verification script
  • ARROW-7034 - [CI][Crossbow] Skip known nightly failures
  • ARROW-7035 - [R] Default arguments are unclear in write_parquet docs
  • ARROW-7036 - [C++] Version up ORC to avoid compile errors
  • ARROW-7037 - [C++] Compile error on the combination of protobuf >= 3.9 and clang
  • ARROW-7039 - [Python] Fix pa.table/record_batch typecheck to work without pandas
  • ARROW-7047 - [C++] Insert implicit casts in ScannerBuilder::Finish
  • ARROW-7052 - [C++] Fix linking of datasets example when ARROW_BUILD_SHARED=OFF
  • ARROW-7054 - [Docs] Enable overriding project version with environment variable when building Sphinx docs
  • ARROW-7057 - [C++] Add API to parse URI query strings
  • ARROW-7058 - [C++] FileSystemDataSourceDiscovery should apply partition schemes relative to its base dir
  • ARROW-7060 - [R] Post-0.15.1 cleanup
  • ARROW-7061 - [C++][Dataset] Add ignore file options to FileSystemDataSourceDiscovery
  • ARROW-7062 - [C++][Dataset] Ensure ParquetFileFormat::Open catch parqu…
  • ARROW-7064 - [R] Support null type using vctrs::unspecified()
  • ARROW-7066 - [Python] Allow returning ChunkedArray in arrow_array
  • ARROW-7067 - [CI] Disable code coverage on Travis-CI
  • ARROW-7069 - [C++][Dataset] Replace ConstantPartitionScheme with PrefixDictionaryPartitionScheme
  • ARROW-7070 - [Packaging][deb] Update package names for 1.0.0
  • ARROW-7072 - [Java] Support concatenating validity bits efficiently
  • ARROW-7082 - [Packaging][deb] Add apache-arrow-archive-keyring package
  • ARROW-7086 - [C++] Provide a wrapper for invoking factories to produce a Result
  • ARROW-7092 - [R] Add vignette for dplyr and datasets
  • ARROW-7093 - [R] Support creating ScalarExpressions for more data types
  • ARROW-7094 - [C++] FileSystemDataSource should use an owning pointer for fs::Filesystem
  • ARROW-7095 - [R] Require an explicit call to pull Datasets into memory
  • ARROW-7096 - [C++] Unified ConcatenateTables APIs
  • ARROW-7098 - [Java] Improve the performance of comparing two memory blocks
  • ARROW-7099 - [C++] Disambiguate function calls in csv parser test
  • ARROW-7101 - [CI] Refactor docker-compose setup and use it with GitHub Actions
  • ARROW-7103 - [R] Various minor cleanups
  • ARROW-7107 - [C++][MinGW] Enable Flight on AppVeyor
  • ARROW-7110 - [GLib] Add filter support for GArrowTable, GArrowChunkedArray, and GArrowRecordBatch
  • ARROW-7111 - [GLib] Add take support for GArrowTable, GArrowChunkedArray, and GArrowRecordBatch
  • ARROW-7113 - [Rust] Add unowned buffer.
  • ARROW-7116 - [CI] Use the docker repository provided by apache organization
  • ARROW-7120 - [C++][CI] Add .ccache to the docker-compose volume mounts
  • ARROW-7146 - [R][CI] Various fixes and speedups for the R docker-compose setup
  • ARROW-7147 - [C++][Dataset] Refactor dataset's API to use Result<T>
  • ARROW-7148 - [C++][Dataset] Major API cleanup
  • ARROW-7149 - [C++] Remove experimental status on filesystem APIs
  • ARROW-7155 - [Java][CI] Add Maven wrapper to simplify the setup process
  • ARROW-7159 - [CI] Run HDFS tests as cron task
  • ARROW-7160 - [C++] Update string_view backport
  • ARROW-7161 - [C++] Migrate filesystem APIs from Status to Result
  • ARROW-7162 - [C++] Cleanup warnings in cmake_modules/SetupCxxFlags.cmake
  • ARROW-7166 - [Java] Remove redundant code for Jdbc adapters
  • ARROW-7169 - [C++] Vendor uriparser library
  • ARROW-7171 - [Ruby] Pass Array<Boolean> for Arrow::Table#filter
  • ARROW-7172 - [C++][Dataset] Improve format of Expression::ToString
  • ARROW-7176 - [C++] Fix arrow::ipc compiler warning
  • ARROW-7178 - [C++] Vendor forward compatible std::optional
  • ARROW-7185 - [R][Dataset] Add bindings for IN, IS_VALID expressions
  • ARROW-7186 - [R] Add inline comments to document the dplyr code
  • ARROW-7192 - [Rust] Implement Flight crate
  • ARROW-7193 - [Rust] Arrow stream reader
  • ARROW-7195 - [Ruby] Improve #filter, #take, and #is_in
  • ARROW-7196 - [Ruby] Remove needless BinaryArrayBuilder#append_values
  • ARROW-7197 - [Ruby] Suppress keyword argument related warnings with Ruby 2.7
  • ARROW-7204 - [C++][Dataset] Implicit cast support for InExpression
  • ARROW-7206 - [Java] Avoid string concatenation when calling Preconditions#checkArgument
  • ARROW-7207 - [Rust] Update generated fbs files
  • ARROW-7210 - [C++][R] Allow Numeric <-> Temporal Scalar casts
  • ARROW-7211 - [Rust] Support byte buffers as a parquet sink
  • ARROW-7216 - [Java] Improve the performance of setting/clearing individual bits
  • ARROW-7219 - [Python][CI] Test with pickle5 installed
  • ARROW-7227 - [Python] Added a python wrapper for ConcatenateTablesWithPromotions
  • ARROW-7228 - [Python] Added a python wrapper for RecordBatch.FromStructArray()
  • ARROW-7235 - [C++] Add Result<T> APIs to IO layer
  • ARROW-7236 - [C++] Add Result<T> APIs to arrow/csv
  • ARROW-7240 - [C++] Add Result<T> APIs to arrow/util
  • ARROW-7246 - [CI][Python] Use Python 3 for docker-compose
  • ARROW-7247 - [CI][Python] Fix wheel build error on macOS
  • ARROW-7248 - [Rust] Automatically Generate IPC Messages
  • ARROW-7255 - [CI] Re-enable source release test on pull request
  • ARROW-7257 - [CI] Fix Homebrew formula audit error by openssl
  • ARROW-7258 - [CI] Fix fuzzit build directory
  • ARROW-7259 - [Java] Support using a different hasher in the subfield encoder
  • ARROW-7260 - [CI] Remove Ubuntu 14.04 test job
  • ARROW-7261 - [Python] Add Python support for Fixed Size List type
  • ARROW-7262 - [C++][Gandiva] Added replace function
  • ARROW-7263 - [C++][Gandiva] Implemented locate function
  • ARROW-7268 - [Rust] Add custom_metadata field from IPC message to Schema.
  • ARROW-7269 - [Python] Add ORC to api documentation
  • ARROW-7270 - [Go] preserve CSV reading behaviour, improve memory usage
  • ARROW-7274 - [C++] Add Result<T> APIs to Decimal class
  • ARROW-7275 - [Ruby] Add support for Arrow::ListDataType.new(data_type)
  • ARROW-7276 - [Ruby][...]
  • ARROW-7277 - [Java][Doc] Add discussion about vector lifecycle
  • ARROW-7279 - [C++] Rename UnionArray::type_ids to type_codes
  • ARROW-7284 - [Java] ensure java implementation meets clarified dictionary spec
  • ARROW-7289 - [C#] ListType constructor argument is redundant
  • ARROW-7290 - [C#] Implement ListArray Builder
  • ARROW-7292 - [CI][C++] Add ASAN / UBSAN run
  • ARROW-7293 - [Dev][C++] Persist ccache in docker-compose build volumes
  • ARROW-7296 - [Python] Add ORC api documentation
  • ARROW-7299 - [GLib] Use Result instead of Status
  • ARROW-7303 - [C++] Refactor CSV benchmarks to use Result APIs
  • ARROW-7306 - [C++] Add Result-returning version of FileSystemFromUri
  • ARROW-7307 - [CI][GLib] Ensure generating documentation
  • ARROW-7309 - [Python] Support HDFS federation viewfs
  • ARROW-7310 - [Python] Expose HDFS implementation for pyarrow.fs
  • ARROW-7311 - [Python] Return filesystem and path from URI
  • ARROW-7312 - [Rust] Implement std::error::Error for ArrowError.
  • ARROW-7317 - [C++] Migrate Iterator to a Result API
  • ARROW-7319 - [C++] Refactor Iterator<T> to yield Result<T>
  • ARROW-7321 - [CI][GLib] Disable development mode
  • ARROW-7322 - [CI][Python] Fall back to arrowdev dockerhub organization for manylinux images
  • ARROW-7323 - [CI][Rust] Use the same toolchain
  • ARROW-7324 - [Rust] Add timezone to timestamp
  • ARROW-7325 - [Rust][Parquet] Update to parquet-format 2.6 and thrift 0.12
  • ARROW-7329 - [Java] AllocationManager: Allow managing different types …
  • ARROW-7333 - [CI][Rust] Remove duplicated nightly job
  • ARROW-7334 - [CI][Python] Use Python 3 on macOS
  • ARROW-7339 - [CMake] Thrift version not respected in CMake configuration version.txt
  • ARROW-7340 - [CI] Prune defunct appveyor build setup
  • ARROW-7344 - [Packaging][Python] Build manylinux2014 wheels
  • ARROW-7346 - [CI] Explicit usage of ccache across the builds
  • ARROW-7347 - [C++] Update bundled Boost to 1.71.0
  • ARROW-7348 - [Rust] Add api to return null bitmap buffer.
  • ARROW-7351 - [Developer] Only suggest cpp-* versions by default for PARQUET issues in merge tool
  • ARROW-7357 - [Go] migrate to x/xerrors
  • ARROW-7366 - [C++][Dataset] Use PartitionSchemeDiscovery in DataSourceDiscovery
  • ARROW-7367 - [Python] Use np.full instead of np.array.repeat in ParquetDatasetPiece
  • ARROW-7368 - [Ruby] Use :arrow_file and :arrow_streaming for format name
  • ARROW-7369 - [GLib] Add garrow_table_combine_chunks
  • ARROW-7370 - [C++] Fix old Protobuf with AUTO detection failure
  • ARROW-7377 - [C++][Dataset] Add ScanOptions::MaterializedFields
  • ARROW-7378 - [C++][Gandiva] Fix loop vectorization in gandiva
  • ARROW-7379 - [C++] Introduce SchemaBuilder companion class and Field::IsCompatibleWith
  • ARROW-7380 - [C++][Dataset] Implement DatasetFactory
  • ARROW-7382 - [C++][Dataset] Insert missing directories in FileSystemDataSourceDiscovery::Make
  • ARROW-7387 - [C#] Support ListType Serialization
  • ARROW-7392 - [Packaging] Add conda packaging tasks for python 3.8
  • ARROW-7398 - [Packaging][Python] Conda builds are failing on macOS
  • ARROW-7399 - [C++][Gandiva] set Mcpu based on host cpu
  • ARROW-7402 - [C++] Add more information on CUDA error
  • ARROW-7403 - [C++][JSON] Enable Rapidjson on Arm64 Neon
  • ARROW-7410 - [Doc][Python] Document filesystem API
  • ARROW-7411 - [C++][Flight] Improve the output of Arrow Flight benchmark
  • ARROW-7413 - [Python] Expose and test the partitioning discovery
  • ARROW-7414 - [R][Dataset] Implement *PartitionSchemeDiscovery in R
  • ARROW-7415 - [C++][Dataset] implement IpcFormat
  • ARROW-7416 - [R][Nightly] Fix macos-r-autobrew build on R 3.6.2
  • ARROW-7417 - [C++] Add a docker-compose entry for CUDA 10.1
  • ARROW-7418 - [C++] Fix build error on Ubuntu 16.04
  • ARROW-7420 - [C++] Migrate tensor related APIs to Result-returning version
  • ARROW-7429 - [Java] Enhance code style checking for Java code (remove consecutive spaces)
  • ARROW-7430 - [Python] Add more docstrings to dataset bindings
  • ARROW-7431 - [Python] Add dataset API to reference docs
  • ARROW-7432 - [Python] Add higher level open_dataset function
  • ARROW-7439 - [C++][Dataset] Remove pointer aliases
  • ARROW-7449 - [GLib] Make GObject Introspection optional
  • ARROW-7452 - [GLib] Make GArrowTimeDataType abstract
  • ARROW-7453 - [Ruby]
  • ARROW-7454 - [Ruby] Add support for saving/loading TSV
  • ARROW-7455 - [Ruby] Use Arrow::DataType.resolve for all GArrowDataType input
  • ARROW-7456 - [C++] Add support for YYYY-MM-DDThh and YYYY-MM-DDThh:mm timestamp formats
  • ARROW-7457 - [Doc] fix typos
  • ARROW-7459 - [Python] Fix document lint error
  • ARROW-7460 - [Rust] Improve some kernel performance
  • ARROW-7461 - [Java] fix typos
  • ARROW-7463 - [Doc] fix a broken link and typo
  • ARROW-7464 - [C++] Refine CpuInfo singleton with std::call_once
  • ARROW-7465 - [C++] Add Arrow memory benchmark for Arm64
  • ARROW-7468 - [Python] fix typos
  • ARROW-7469 - [C++] Improve division related bit operations
  • ARROW-7470 - [JS] fix typos
  • ARROW-7474 - [Ruby] Improve CSV save performance
  • ARROW-7475 - [Rust] Arrow IPC Stream writer
  • ARROW-7477 - [Java][FlightRPC] set up gRPC reflection metadata
  • ARROW-7479 - [Rust][Ruby][R] Fix typos
  • ARROW-7481 - [C#] fix typo
  • ARROW-7482 - [C++] Fix typos
  • ARROW-7484 - [C++][Gandiva] Fix typos
  • ARROW-7485 - [C++][Plasma] Fix typos
  • ARROW-7487 - [Developer] Fix typos
  • ARROW-7488 - [GLib] Fix typos and broken links
  • ARROW-7489 - [CI] Fix typos
  • ARROW-7490 - [Java] Avro converter should convert attributes and props to FieldType metadata
  • ARROW-7493 - [Python] Expose sum kernel in pyarrow.compute and support ChunkedArray inputs
  • ARROW-7498 - [Dataset] Rename core classes before stable API
  • ARROW-7502 - [Integration] Remove unneeded Spark patch
  • ARROW-7513 - [JS][tutorial] - Rich cols part 1
  • ARROW-7514 - [C#] Make GetValueOffset Obsolete
  • ARROW-7519 - [Python] Build wheels, conda packages with dataset support
  • ARROW-7521 - [Rust] Remove tuple on FixedSizeList
  • ARROW-7523 - [Developer] Relax clang-tidy check
  • ARROW-7526 - [C++][Compute] Optimize small integer sorting
  • ARROW-7532 - [CI] Unskip brew test after Homebrew fixes it upstream
  • ARROW-7537 - [CI][R] Nightly macOS autobrew job should be more verbose if it fails
  • ARROW-7538 - [Java] Clarify actual and desired size in AllocationManager
  • ARROW-7540 - [C++] Install license files and README
  • ARROW-7541 - [GLib] Install license files
  • ARROW-7542 - [CI][C++] Use $(sysctl -n hw.ncpu) instead of $(nproc) on macOS
  • ARROW-7549 - [Java] Reorganize Flight modules to keep top level clean/organized
  • ARROW-7550 - [R][CI] Run donttest examples in CI
  • ARROW-7557 - [C++][Compute] Validate sorting stability
  • ARROW-7558 - [Packaging][deb][RPM] Use the host owner and group for artifacts
  • ARROW-7560 - [Rust] Reduce Rc/Refcell usage
  • ARROW-7565 - [Website] Add support for download URL redirect
  • ARROW-7566 - [CI] Use more recent Miniconda on AppVeyor
  • ARROW-7567 - [Java] Fix races in checkstyle upgrade
  • ARROW-7567 - [Java] Bump Checkstyle from 6.19 to 8.19
  • ARROW-7568 - [Java] Bump Apache Avro from 1.9.0 to 1.9.1
  • ARROW-7569 - [Python] Add API to map Arrow types to pandas ExtensionDtypes in to_pandas conversions
  • ARROW-7570 - [Java] Fix high severity issues
  • ARROW-7571 - [Java] Correct minimal Java version on README
  • ARROW-7572 - [Java] Enforce Maven 3.3+ as mentioned in README
  • ARROW-7573 - [Rust] Reduce boxing and cleanup
  • ARROW-7575 - [R] Linux binary packaging followup
  • ARROW-7576 - [C++][Dev] Improve fuzzing setup
  • ARROW-7577 - [CI][C++] Check OSS-Fuzz build in Github Actions
  • ARROW-7578 - [R] Add support for datasets with IPC files and with multiple sources
  • ARROW-7580 - [Website] 0.16 release post
  • ARROW-7581 - [R] Documentation/polishing for 0.16 release
  • ARROW-7590 - [C++] Don't ignore managed files in thirdparty
  • ARROW-7597 - [C++] More compact CMake configuration summary
  • ARROW-7600 - [C++][Parquet] failing disabled unittest for nested parquet.
  • ARROW-7601 - [Doc][C++] Update fuzzing doc
  • ARROW-7602 - [Archery] Add more archery build options
  • ARROW-7613 - [Rust] Remove redundant :: prefixes
  • ARROW-7622 - [Format] Mark Tensor and SparseTensor fields required
  • ARROW-7623 - [C++] Update generated flatbuffers code
  • ARROW-7626 - [Parquet][GLib] Add support for version macros
  • ARROW-7627 - [C++][Gandiva] Optimize string truncate function
  • ARROW-7629 - [C++][CI] Add fuzz regression files to arrow-testing
  • ARROW-7630 - [C++][CI] Check fuzz crash regressions in CI
  • ARROW-7632 - [C++][CI] Add extension type data to IPC fuzz seed corpus
  • ARROW-7635 - [C++] Add pkg-config support for each component
  • ARROW-7636 - [Python] Clean-up the pyarrow.dataset.partitioning() API
  • ARROW-7644 - Add vcpkg installation instructions
  • ARROW-7645 - [Packaging][deb][RPM] Fix arm64 packaging build
  • ARROW-7648 - [C++] Sanitize local paths on Windows
  • ARROW-7658 - [R] Support dplyr filtering on date/time
  • ARROW-7659 - [Rust] Reduce Rc usage
  • ARROW-7660 - [C++][Gandiva] Optimise castVarchar(string, int) function for single byte characters
  • ARROW-7665 - [R] Build in parallel in linuxLibs.R
  • ARROW-7666 - [Packaging][deb] Always use Ninja to reduce build time
  • ARROW-7667 - [Packaging][deb] Add ubuntu-eoan to nightly jobs
  • ARROW-7668 - [Packaging][RPM] Use Ninja if possible to reduce build time
  • ARROW-7670 - [Python][Dataset] More ergonomic API
  • ARROW-7671 - [Python][Dataset] Add bindings for the DatasetFactory
  • ARROW-7674 - [Dev] Add helpful message for captcha challenge in merge_arrow_pr.py
  • ARROW-7682 - [Packaging] Add support for arm64 APT/Yum repositories
  • ARROW-7683 - [Packaging] Set 0.16.0 as the next version
  • ARROW-7686 - [Packaging][deb][RPM] Include more arrow-*.pc
  • ARROW-7687 - [C++] Fix dead links in README
  • ARROW-7692 - [Rust] Simplify some Option / Result pattern matches
  • ARROW-7694 - [Packaging][deb][RPM] Add support for RC to repository packages
  • ARROW-7695 - [Release] Update java versions to 0.16-SNAPSHOT
  • ARROW-7696 - [Release] Add support for running unit test on release branch
  • ARROW-7697 - [Release] Add a test for updating Linux packages by 00-prepare.sh
  • ARROW-7710 - [Release][C#] Add support for redirecting .NET download URL
  • ARROW-7711 - [C#] Make Date32 test independent of system timezone
  • ARROW-7715 - [Release][APT] Ignore some arm64 verifications
  • ARROW-7716 - [Packaging][APT] Use the "main" component for Ubuntu 19.10
  • ARROW-7719 - [Python][Dataset] Table equality check occasionally fails
  • ARROW-7724 - [Release][Yum] Ignore some arm64 verifications
  • ARROW-7743 - [Rust] [Parquet] Support reading timestamp micros
  • ARROW-7768 - [Rust] Implement Length and TryClone traits for Cursor<Vec<u8>> in reader.rs
  • ARROW-8015 - [Python] Build 0.16.0 wheel install for Windows + Python 3.5 and publish to PyPI
  • PARQUET-517 - [C++] Use arrow::MemoryPool for all heap allocations
  • PARQUET-1300 - [C++] Implement encrypted Parquet read and write support
  • PARQUET-1664 - [C++] Provide API to return metadata string from FileMetadata.
  • PARQUET-1678 - [C++] Provide classes for reading/writing using input/output operators
  • PARQUET-1688 - [C++] StreamWriter/StreamReader can't be built with g++ 4.8.5 on CentOS 7
  • PARQUET-1689 - [C++] Stream API: Allow for columns/rows to be skipped when reading
  • PARQUET-1701 - [C++] Stream API: Add support for optional fields
  • PARQUET-1704 - [C++] Add re-usable encryption buffer to SerializedPageWriter
  • PARQUET-1705 - [C++] Disable shrink-to-fit on the re-usable decryption buffer
  • PARQUET-1712 - [C++] Stop using deprecated APIs in examples
  • PARQUET-1721 - [C++][Parquet] Add missing arrow dependency to parquet.pc
  • PARQUET-1734 - [C++] Fix typo
  • PARQUET-1769 - [C++] Update parquet.thrift to parquet-format 2.8.0