Apache Arrow 6.0.0 (2021-10-26)
Bug Fixes
- ARROW-6946 - [Go] Run tests with assert build tag enabled to ensure safety
- ARROW-8452 - [Go] support proper nested nullable flags
- ARROW-8453 - [Go][Integration] Support and enable recursive nested type integration tests
- ARROW-8999 - [Python][C++] Non-deterministic segfault in "AMD64 MacOS 10.15 Python 3.7" build
- ARROW-9948 - [C++] Fix scale handling in Decimal{128, 256}::FromString
- ARROW-10213 - [C++] Temporal cast from timestamp to date rounds instead of extracting date component
- ARROW-10373 - [C++] Validate null_count in Array::ValidateFull()
- ARROW-10773 - [R] parallel as.data.frame.Table hangs indefinitely on Windows
- ARROW-11518 - [C++][Parquet] Fix buffer allocation when reading/skipping boolean columns
- ARROW-11579 - [R] read_feather hanging on Windows
- ARROW-11634 - [C++][Parquet] Parquet statistics (min/max) for dictionary columns are incorrect
- ARROW-11729 - [R] Add examples to datasets documentation
- ARROW-12011 - [C++] Fix crashes and incorrect results when printing extreme date values
- ARROW-12072 - [Go] Fix panics in ipc writer for sliced records
- ARROW-12087 - [C++] Allow sorting durations, timestamps with timezones
- ARROW-12321 - [R][C++] Arrow opens too many files at once when writing a dataset
- ARROW-12513 - [C++][Parquet] Parquet Writer always puts null_count=0 in Parquet statistics for dictionary-encoded array with nulls
- ARROW-12540 - [C++] Implementing casting support from date32/date64 to utf8/large_utf8
- ARROW-12636 - [JS] ESM Tree-Shaking produces broken code
- ARROW-12700 - [R] Read/Write_feather stuck forever after bad write, R, Win32
- ARROW-12837 - [C++] Do not crash when printing invalid arrays
- ARROW-13134 - [C++][CI] Unpin conda package for aws-sdk-cpp
- ARROW-13151 - [C++][Parquet] Propagate schema changes from selection all the way up the stack
- ARROW-13198 - [C++][Dataset] Async scanner occasionally segfaulting in CI
- ARROW-13293 - [R] open_dataset followed by collect hangs (while compute works)
- ARROW-13304 - [C++] Unable to install nightly on Ubuntu 21.04 due to day of week options
- ARROW-13336 - [Doc] Make clean in docs should clean generated docs
- ARROW-13422 - [R] Clarify README about S3 support on Windows
- ARROW-13424 - [C++] Remove needless workaround for conda and benchmark
- ARROW-13425 - [Archery] Avoid importing PyArrow indirectly
- ARROW-13429 - [C++][Gandiva] Fix Gandiva codegen for if-else expression with binary type
- ARROW-13430 - [Go] fix handling of zero value for FromBigInt
- ARROW-13436 - [Python][Doc] Clarify what should be expected if read_table is passed an empty list of columns
- ARROW-13437 - [C++] Relax FixedSizeList validation to allow excess child values
- ARROW-13441 - [C++][CSV] Skip empty batches in column decoder
- ARROW-13443 - [C++] Fix the incorrect mapping from flatbuf::MetadataVersion to arrow::ipc::MetadataVersion
- ARROW-13445 - [Java][Packaging] Fix artifact patterns for the Java jars
- ARROW-13446 - [Release] Fix verification on amazon linux
- ARROW-13447 - [Release] Verification script for arm64 and universal2 macOS wheels
- ARROW-13450 - [Python][Packaging] Set deployment target to 10.13 for universal2 wheels
- ARROW-13469 - [C++] Suppress -Wmissing-field-initializers in DayMilliseconds arrow/type.h
- ARROW-13474 - [Python] Fix crash in take/filter of empty ExtensionArray
- ARROW-13477 - [Release] Pass ARTIFACTORY_API_KEY to the upload script
- ARROW-13484 - [Release] Add support for uploading Amazon Linux 2 packages
- ARROW-13490 - [R][CI] Need to gate duckdb examples on duckdb version
- ARROW-13492 - [R][CI] Move r tools 35 build back to per-commit/pre-PR
- ARROW-13493 - [C++] Anonymous structs in an anonymous union are a GNU extension
- ARROW-13495 - [C++][Compute] Fixing unaligned memory access in GrouperFastImpl
- ARROW-13496 - [CI][R] Repair r-sanitizer job
- ARROW-13497 - [C++][R] FunctionOptions not used by aggregation nodes
- ARROW-13499 - [R] Aggregation on expression doesn't NSE correctly
- ARROW-13500 - [C++] Fix using '-Wno-unknown-warning-option' with GCC
- ARROW-13504 - [Python] Move marks from fixtures to individual tests/params
- ARROW-13507 - [R] LTO job on CRAN fails
- ARROW-13509 - [C++] Take kernel with empty inputs
- ARROW-13522 - [C++] Fix regression in UTF8 trim functions
- ARROW-13523 - [C++] Normalize test executable name
- ARROW-13524 - [C++] Fix description for ApplicationVersion::VersionEq
- ARROW-13529 - [Go] Fixing too many releases in IPC writer
- ARROW-13538 - [R][CI] Don't test DuckDB in the minimal build
- ARROW-13543 - [R] Handle summarize() with 0 arguments or no aggregate functions
- ARROW-13556 - [C++] Add protobuf to linking for flight
- ARROW-13559 - [CI][C++] Move the test-conda-cpp-valgrind nightly build to azure
- ARROW-13560 - [R] Allow Scanner$create() to accept filter / project even with arrow_dplyr_querys
- ARROW-13580 - [C++] quoted_strings_can_be_null only applied to string columns
- ARROW-13597 - [C++][Compute] Remove AddOnLoad helper
- ARROW-13600 - [C++] Fix maybe uninitialized warnings
- ARROW-13602 - [C++] Fix strict aliasing warning in bit util test
- ARROW-13603 - [GLib] Fix typos in GARROW_VERSION_CHECK()
- ARROW-13605 - [C++] Capture node with shared_ptr to avoid TSan warning
- ARROW-13608 - [R] vendor cpp11 to fix segfault under LTO
- ARROW-13611 - [C++] Scanning datasets does not enforce back pressure
- ARROW-13624 - [R] readr short type mapping has T and t backwards
- ARROW-13628 - [Format][C++][Java] Add MONTH_DAY_NANO interval type
- ARROW-13630 - [CI][C++][s390x] Reduce parallelism to build Arrow library
- ARROW-13632 - [C++] Fix filtering of sliced FixedSizeList array
- ARROW-13638 - [C++] Hold owned copy of function options in GroupByNode
- ARROW-13639 - [C++] Fix out-of-bounds access in Concatenate with null slots and empty dictionary
- ARROW-13654 - [C++][Parquet] Avoid infinite loop when appending a FileMetaData to itself
- ARROW-13655 - [C++][Parquet] Disable Thrift message size protections
- ARROW-13662 - [CI] Fix failing strftime test with older pandas
- ARROW-13662 - [CI] Failing test test_extract_datetime_components with pandas 0.24
- ARROW-13669 - [C++] Fix variant emplace methods (add brackets)
- ARROW-13671 - [Dev] Fix conda recipe on Arm 64k page system
- ARROW-13676 - [C++][Parquet] Avoid potential invalid access
- ARROW-13681 - [C++] Fix list_parent_indices behaviour on chunked array
- ARROW-13685 - [C++] Cannot write dataset to S3FileSystem if bucket already exists
- ARROW-13689 - [C#][Integration] Initial commit of C# Integration tests
- ARROW-13694 - [R] Arrow filter crashes (R aborted session)
- ARROW-13743 - [CI] OSX job fails due to incompatible git and libcurl
- ARROW-13744 - [CI] c++14 and 17 nightly job fails
- ARROW-13747 - [Python][CI] Requiring s3fs >= 2021.8
- ARROW-13755 - [Python] Allow writing datasets using a partitioning that only specifies field_names
- ARROW-13761 - [R] arrow::filter() crashes (aborts R session)
- ARROW-13784 - [Python] Table.from_arrays should raise an error when array is empty but names is not
- ARROW-13786 - [R][CI] Don't fail the RCHK build if arrow doesn't build
- ARROW-13788 - [C++] Temporal component extraction functions don't support date32/64
- ARROW-13792 - [Java] The toString representation is incorrect for unsigned integer vectors
- ARROW-13799 - [R] case_when error handling is capturing strings
- ARROW-13800 - [R] Use divide instead of divide_checked
- ARROW-13812 - [C++] Fix Valgrind error in Grouper.BooleanKey test
- ARROW-13814 - [CI] Fix Spark master integration tests
- ARROW-13819 - [C++] Initialize subseconds in value_parsing.h
- ARROW-13846 - [C++] Fix crashes on invalid IPC file
- ARROW-13850 - [C++] Fix crashes on invalid Parquet data
- ARROW-13860 - [R] arrow 5.0.0 write_parquet throws error writing grouped data.frame
- ARROW-13865 - [C++][R] Writing moderate-size parquet files of nested dataframes from R slows down/process hangs
- ARROW-13872 - [Java] ExtensionTypeVector does not work with RangeEqualsVisitor
- ARROW-13876 - [C++] Add trivial null kernels to arithmetic, sort functions
- ARROW-13877 - [C++] Support FixedSizeList in generic list kernels
- ARROW-13878 - [C++] Implement fixed-size-binary support for several kernels
- ARROW-13880 - [C++] Compute function sort_indices does not support timestamps with time zones
- ARROW-13881 - [C++][FlightRPC][Packaging] Ensure Flight is packaged with advanced TLS options on Windows
- ARROW-13882 - [C++] Improve min_max/hash_min_max type support
- ARROW-13884 - [JS] Move source files into a separate directory
- ARROW-13912 - [R] TrimOptions implementation breaks test-r-minimal-build due to dependencies
- ARROW-13913 - [C++] Don't segfault if IndexOptions omitted
- ARROW-13915 - [R][CI] R UCRT C++ bundles are incomplete
- ARROW-13916 - [C++] Implement strftime on date32/64 types
- ARROW-13921 - [Python][Packaging] Pin minimum setuptools version for the macos wheels
- ARROW-13940 - [R] Turn on multithreading with Arrow engine queries
- ARROW-13961 - [C++] Fix use of non-const references, declaration without initialization
- ARROW-13976 - [C++] Add path to libjvm.so in ARM CPU
- ARROW-13978 - [C++] Bump gtest to 1.11 to unbreak builds with recent clang
- ARROW-13981 - [Java] VectorSchemaRootAppender doesn't work for BitVector
- ARROW-13982 - [C++] Don't stall in async scanner if a fragment generates no batches
- ARROW-13983 - [C++] Avoid raising error if fadvise() isn't supported
- ARROW-13996 - [Go][Parquet] Fix file offsets in go impl
- ARROW-13997 - [C++] restore exec node based query performance
- ARROW-14001 - [Go] Fixing AppendBoolean function in BitmapWriter
- ARROW-14004 - [Python][Doc] Document nullable dtypes handling and usage of types_mapper in to_pandas conversion
- ARROW-14014 - [Java] Fix Flight parseTrailers for :status keys
- ARROW-14017 - [C++] NULLPTR is not included in type_fwd.h
- ARROW-14020 - [R] Writing dataframes with list columns is slow and scales poorly with nesting level
- ARROW-14024 - [C++] Test that batch size is respected for IPC/CSV
- ARROW-14026 - [C++] Enable batch parallelism in Parquet scanner
- ARROW-14027 - [C++] Handle scalars in Grouper
- ARROW-14040 - [C++] Fix result order dependence in scanner test
- ARROW-14053 - [C++][CSV] Use atomic counter for async tests
- ARROW-14057 - [C++] Bump aws-c-common version
- ARROW-14063 - [R] open_dataset() does not work on CSVs without header rows
- ARROW-14076 - Unable to use `red-arrow` gem on Heroku/Ubuntu 20.04 (focal)
- ARROW-14090 - [C++][Parquet] rows_written_ should be int64_t instead of int
- ARROW-14103 - [R][C++] Allow min/max in grouped aggregation
- ARROW-14109 - [C++] Fix segfault when parsing JSON with duplicate keys
- ARROW-14124 - [R] Timezone support in R <= 3.4
- ARROW-14129 - [C++][Python] Fix unique/value_counts on empty dictionary arrays
- ARROW-14139 - [IR][C++] Table flatbuffer object fails to compile on older GCCs
- ARROW-14141 - [IR][C++] Join missing from RelationImpl
- ARROW-14156 - [C++] Properly synthesize validity buffer in StructArray::Flatten
- ARROW-14162 - [R] Simple arrange %>% head does not respect ordering
- ARROW-14173 - [IR] Allow typed null literals to be represented
- ARROW-14179 - [C++][C] Do not export/import null bitmap for union and null types
- ARROW-14184 - [C++] allow joins where the keys include new columns on the left
- ARROW-14192 - [C++][Dataset] Backpressure broken on ordered scans
- ARROW-14195 - [R] Fix ExecPlan binding annotations
- ARROW-14197 - [C++][Compute] Fixing wrong buffer size in GrouperFastImpl
- ARROW-14200 - [R] strftime on a date should not use or be confused by timezones
- ARROW-14203 - [C++] Fix description of ExecBatch.length for Scalars in aggregate kernels
- ARROW-14204 - [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike
- ARROW-14206 - [Go][Parquet] Clean up s390x and arm build code
- ARROW-14206 - [Go][CI] Fix build on s390x and ARM
- ARROW-14208 - [C++] Fix compilation on Windows
- ARROW-14210 - [C++] Add AR and RANLIB flags to bzip2
- ARROW-14211 - [C++][Compute] Fixing thread sanitizer problems in hash join node
- ARROW-14214 - [Python][CI] Fix tests using OrcFileFormat for Python 3.6 + orc not built
- ARROW-14216 - [R] Disable auto-cleaning of duckdb tables
- ARROW-14219 - [R][CI] DuckDB valgrind failure
- ARROW-14220 - [C++] Missing ending quote in thirdpartyversions
- ARROW-14221 - [R][CI] DuckDB tests fail on R < 4.0
- ARROW-14223 - [C++] add missing third-party dependency
- ARROW-14224 - [C++] Try to reduce build time/memory usage
- ARROW-14226 - [R] Handle n_distinct() (and others) with args != 1
- ARROW-14237 - [R][CI] Disable altrep in R <= 3.5
- ARROW-14240 - [C++] Fix wrong nlohmann-json header path
- ARROW-14246 - [C++] Fix wrong find_package() usage in build_google_cloud_cpp_storage()
- ARROW-14247 - [C++] Fix Valgrind errors in parquet-arrow-test
- ARROW-14249 - [R] Slow down in dataframe-to-table benchmark
- ARROW-14252 - [R] Partial matching of arguments warning
- ARROW-14255 - [Python] Fix FlightClient.do_action
- ARROW-14257 - [Python][Docs] Fix usage of sync scanner in dataset writing docs
- ARROW-14260 - [C++] GTest linker error with vcpkg and Visual Studio 2019
- ARROW-14283 - [CI][C++] Use LLVM 12 on macOS GHA builds
- ARROW-14285 - [C++] Fix crashes when pretty-printing data from valid IPC file
- ARROW-14299 - [Dev][CI] Avoid downloading MinIO multiple times
- ARROW-14300 - [C++][R][CI] Work around missing include in xsimd
- ARROW-14301 - [C++] use consistent CMAKE_CXX_STANDARD definition
- ARROW-14302 - [C++] Valgrind errors
- ARROW-14305 - [C++][Compute] Fixing Valgrind errors in hash join node tests
- ARROW-14307 - [R] crashes when reading empty feather with POSIXct column
- ARROW-14313 - [Doc] Make Archery installation docs more accurate
- ARROW-14321 - [R] segfault converting dictionary ChunkedArray with 0 chunks
- ARROW-14340 - [C++] Bump xsimd to fix build error on Apple M1
- ARROW-14370 - [C++] Fix memory leak in SeqMergedGeneratorTestFixture.ErrorItem
- ARROW-14373 - [Packaging][Java] Missing LLVM dependency in the macOS java-jars build
- ARROW-14377 - [Packaging][Python] Python 3.9 installation fails in macOS wheel build
- ARROW-14381 - [CI][Python] Fix Spark integration failures
- ARROW-14382 - [C++][Compute] Remove duplicated ThreadIndexer definition
- ARROW-14392 - [C++] Bundled gRPC misses bundled Abseil include path
- ARROW-14393 - [C++] GTest linking errors during the source release verification
- ARROW-14397 - [C++] Fix valgrind error in test utility
- ARROW-14406 - [CI] Skip failing test on dask-master nightly build
- ARROW-14411 - [Release][Integration] Go integration tests fail for 6.0.0-RC1
- ARROW-14417 - [R] Joins ignore projection on left dataset
- ARROW-14423 - [Python] Fix version constraints in pyproject.toml
- ARROW-14424 - [Packaging][Python] Disable windows wheel testing for python 3.6
- ARROW-14434 - R crashes when making an empty selection for Datasets with DateTime
- ARROW-14439 - [Python][C++] Segfault with read_json when a field is missing
- PARQUET-2067 - [C++][Parquet] Fix Parquet null count stats for enclosing null lists
- PARQUET-2089 - [C++] Align RowGroup file_offset with specification
New Features and Improvements
- ARROW-1565 - [C++] Implement TopK/BottomK
- ARROW-1568 - [C++] Implement Drop Null Kernel for Arrays
- ARROW-4333 - [C++] Sketch out design for kernels and "query" execution in compute layer
- ARROW-4700 - [C++] Added support for decimal128 and decimal256 to the JSON converter
- ARROW-5002 - [C++] Implement Hash Aggregation query execution node
- ARROW-5244 - [C++] Remove experimental marker from some APIs
- ARROW-6072 - [C++] Implement casting List <-> LargeList
- ARROW-6607 - [Python] Support for set/list columns when converting from Pandas
- ARROW-6626 - [Python] Support converting nested sets when converting to arrow
- ARROW-6870 - [C#] Add Support for Dictionary Arrays and Dictionary Encoding
- ARROW-7102 - [Python] Make filesystems compatible with fsspec
- ARROW-7179 - [C++][Python][R] Consolidate coalesce/fill_null
- ARROW-7901 - [Go][Integration] enable integration tests for null case
- ARROW-8022 - [C++] Add static and small vector implementations
- ARROW-8147 - [C++] add GCS library to ThirdpartyToolchain
- ARROW-8379 - [R] Investigate/fix thread safety issues (esp. Windows)
- ARROW-8621 - [Release] Add post release step to add tags for Go versioning
- ARROW-8780 - [Python][Doc] Document the fsspec wrapper for pyarrow.fs filesystems
- ARROW-8928 - [C++] Add microbenchmarks to help measure ExecBatchIterator overhead
- ARROW-9226 - [Python] Support core-site.xml default filesystem.
- ARROW-9434 - [C++] Store type code in UnionScalar
- ARROW-9719 - [Python] Improve HadoopFileSystem docstring
- ARROW-10094 - [Python][Doc] Document missing pandas to arrow conversions
- ARROW-10415 - [R] Support for dplyr::distinct()
- ARROW-10898 - [C++] Improve table sort performance
- ARROW-11238 - [Python] Make SubTreeFileSystem print method more informative
- ARROW-11243 - [C++] Recognize time types in CSV files
- ARROW-11460 - [R] Use system libraries if present on Linux
- ARROW-11691 - [Developer][CI] Provide a consolidated .env file for benchmark-relevant environment variables
- ARROW-11748 - [C++] Ensure Decimal fields are in native endian order
- ARROW-11828 - [C++] Expose CSVWriter object in api
- ARROW-11885 - [R] Turn off some capabilities when LIBARROW_MINIMAL=true
- ARROW-11981 - [C++] Implement Union ExecNode
- ARROW-12063 - [C++] Add null placement option to sort functions
- ARROW-12181 - [C++][R] The "CSV dataset" in test-dataset.R is failing on RTools 3.5
- ARROW-12216 - [R] Proactively disable multithreading on RTools3.5 (32bit?)
- ARROW-12359 - [C++] Deprecate FileSystem::OpenAppendStream
- ARROW-12388 - [C++][Gandiva] Implement cast numbers from varbinary functions in gandiva
- ARROW-12410 - [C++][Gandiva] Implement regexp_replace function on Gandiva
- ARROW-12479 - [C++][Gandiva] Implement castBigInt, castInt, castIntervalDay and castIntervalYear extra functions
- ARROW-12563 - [C++][Gandiva] Add add_months and datediff functions for string
- ARROW-12615 - [C++] Add options for handling NAs to stddev and variance
- ARROW-12650 - [Doc][Python] Improve documentation regarding dealing with memory mapped files
- ARROW-12657 - [C++] Adding String hex to numeric conversion
- ARROW-12669 - [C++][Python] Implement a new scalar function: list_element
- ARROW-12673 - [C++] Add callback to handle incorrect column counts
- ARROW-12688 - [R] Use DuckDB to query an Arrow Dataset
- ARROW-12714 - [C++] String title case kernel
- ARROW-12725 - [C++][Compute] Column at a time hash and comparison in group by
- ARROW-12728 - [C++] Implement count_distinct/distinct hash aggregate kernels
- ARROW-12744 - [C++][Compute] Add rounding kernel
- ARROW-12759 - [C++][Compute] Add ExecNode for group by
- ARROW-12763 - [R] Optimize dplyr queries that use head/tail after arrange
- ARROW-12846 - [Release] Reduce download/upload bandwidth for APT/Yum repositories
- ARROW-12866 - [C++][Gandiva] Implement STRPOS function on Gandiva
- ARROW-12871 - [R] upgrade to testthat 3e
- ARROW-12876 - [R] Fix build flags on Raspberry Pi
- ARROW-12944 - [C++] String capitalize kernel
- ARROW-12946 - [C++] String swap case kernel
- ARROW-12953 - [C++][Compute] Refactor CheckScalar* to take Datum arguments
- ARROW-12959 - [C++][R] Option for is_null(NaN) to evaluate to true
- ARROW-12965 - [Java] C Data Interface implementation
- ARROW-12980 - [C++] Kernels to extract datetime components should be timezone aware
- ARROW-12981 - [R] Install source package from CRAN alone
- ARROW-13033 - [C++] Kernel to localize naive timestamps to a timezone (preserving clock-time)
- ARROW-13056 - [MATLAB] Add a matlab label for dev Pull Requests
- ARROW-13067 - [C++][Compute] Implement integer to decimal cast
- ARROW-13089 - [Python] Allow creating RecordBatch from Python dict
- ARROW-13112 - [R] altrep vectors for strings and other types
- ARROW-13132 - [C++] Add Scalar validation
- ARROW-13138 - [C++][R] Implement extract temporal components (year, month, day, etc) from date32/64 types
- ARROW-13141 - [Python] Update HadoopFileSystem docs to clarify setting CLASSPATH env variable is required
- ARROW-13163 - [C++][Gandiva] Implement REPEAT function on Gandiva
- ARROW-13164 - [R] altrep vectors from Array with nulls
- ARROW-13172 - [Java] Make TYPE_WIDTH publicly accessible
- ARROW-13174 - [C++][Compute] Add strftime kernel
- ARROW-13202 - [MATLAB] Enable GitHub Actions CI for MATLAB Interface on Linux
- ARROW-13218 - [Format] Clarify interpretation of timestamp values
- ARROW-13220 - [C++] Implement 'choose' function
- ARROW-13222 - [C++] Improve type support for case_when
- ARROW-13227 - [Documentation][Compute] Document ExecNode
- ARROW-13257 - [Java][Dataset] Allow passing empty columns for projection
- ARROW-13268 - [C++][Compute] Add ExecNode for semi and anti-semi join
- ARROW-13279 - [R] Use C++ DayOfWeekOptions in wday implementation instead of manually calculating via Expression
- ARROW-13287 - [C++][Dataset] FileSystemDataset::Write should use an async scan
- ARROW-13295 - [C++] add hash_mean, hash_variance, hash_stddev kernels
- ARROW-13298 - [C++] Implement any/all hash aggregate kernels
- ARROW-13307 - [C++] Remove reflection-based enums
- ARROW-13311 - [C++][Documentation] Document hash aggregate kernels
- ARROW-13317 - [Python] Improve documentation on what 'use_threads' does in 'read_feather'
- ARROW-13326 - [R][Archery] Add linting to dev CI
- ARROW-13327 - [C++][Python] Improve consistency of explicit C++ types in PyArrow files
- ARROW-13330 - [Go][Parquet] Add the rest of the Encoding package
- ARROW-13344 - [R] Initial bindings for ExecPlan/ExecNode
- ARROW-13345 - [C++] Add basic implementation for log to base b
- ARROW-13358 - [C++] Improve type support in if_else
- ARROW-13379 - [Dev][Docs] Improvements to archery docs
- ARROW-13390 - [C++] Implement coalesce for remaining types
- ARROW-13397 - [R] Update arrow.Rmd vignette
- ARROW-13399 - [R] Update dataset.Rmd vignette
- ARROW-13402 - [R] Update flight.Rmd vignette
- ARROW-13403 - [R] Update developing.Rmd vignette
- ARROW-13404 - [Doc][Python] Improve PyArrow documentation for new users
- ARROW-13405 - [Doc] Guide users to the documentation for their own platform
- ARROW-13416 - [C++] Implement mod compute function
- ARROW-13420 - [JS] Update dependencies
- ARROW-13421 - [C++][Python] Add CSV convert option to change decimal point
- ARROW-13433 - [R] Remove CLI hack from Valgrind test
- ARROW-13434 - [R] group_by() with an unnamed expression
- ARROW-13435 - [R] Add function arrow_table() as alias for Table$create()
- ARROW-13444 - [C++] Remove usage of deprecated std::result_of
- ARROW-13448 - [R] Bindings for strftime
- ARROW-13453 - [R] DuckDB has not yet released 0.2.8
- ARROW-13455 - [C++][Docs] Typo in RecordBatch::SetColumn
- ARROW-13458 - [C++][Docs] Typo in RecordBatch::schema
- ARROW-13459 - [C++][Docs] Missing param docs for RecordBatch::SetColumn
- ARROW-13461 - [Python][Packaging] Build M1 wheels for python 3.8
- ARROW-13463 - [Release][Python] Verify python 3.8 macOS arm64 wheel
- ARROW-13465 - [R] to_arrow() from duckdb
- ARROW-13466 - [R] make installation fail if Arrow C++ dependencies cannot be installed
- ARROW-13468 - [Release] Fix binary download/upload failures
- ARROW-13472 - [R] Remove .engine = "duckdb" argument
- ARROW-13475 - [Release] Don't consider rust tarballs when cleaning up old releases
- ARROW-13476 - [Doc][Python] Switch ipc/io doc to use context managers
- ARROW-13478 - [Release] Unnecessary rc-number argument for the version bumping post-release script
- ARROW-13480 - [C++] Fix possible deadlock when dataset produces an error
- ARROW-13482 - [C++][Compute] Refactoring away from hard coded ExecNode factories to a registry
- ARROW-13485 - [Release] Replace ${PREVIOUS_RELEASE}.9000 in r/NEWS.md by post-12-bump-versions.sh
- ARROW-13488 - [Website] Update Linux packages install information for 5.0.0
- ARROW-13489 - [R] Bump CI jobs after 5.0.0
- ARROW-13501 - [R] Bindings for count aggregation
- ARROW-13502 - [R] Bindings for min/max aggregation
- ARROW-13503 - [GLib][Ruby][Flight] Add support for DoGet
- ARROW-13506 - [C++][Java] Upgrade ORC to 1.6.9
- ARROW-13508 - [C++] Support custom retry strategies in S3Options
- ARROW-13510 - [CI][R][C++] Add -Wall to fedora-clang-devel as-cran checks
- ARROW-13511 - [CI][R] Fail in the docker build step if R deps don't install
- ARROW-13516 - [C++] Detect --version-script flag availability
- ARROW-13519 - [R] Make doc examples less noisy
- ARROW-13520 - [C++] Implement hash_aggregate tdigest kernel
- ARROW-13521 - [C++][Docs] Add note about tdigest in compute functions docs
- ARROW-13525 - [Python] Mention alternative deprecation message for ParquetDataset.partitions
- ARROW-13528 - [R] Bindings for mean, var, sd aggregation
- ARROW-13532 - [C++][Compute] Adding set membership type filtering to hash table interface
- ARROW-13534 - [C++] Improve csv chunker
- ARROW-13540 - [C++] Add order by sink node
- ARROW-13541 - [C++][Python] Implement ExtensionScalar
- ARROW-13542 - [C++][Compute][Dataset] Add dataset::WriteNode for writing rows from an ExecPlan to disk
- ARROW-13544 - [Java] Remove APIs that have been deprecated for long (Changes to ArrowBuf)
- ARROW-13544 - [Java] Remove APIs that have been deprecated for long (Changes to JDBC)
- ARROW-13544 - [Java] Remove APIs that have been deprecated for long (Changes to Vectors)
- ARROW-13548 - [C++] Implement temporal difference kernels
- ARROW-13549 - [C++] Add casts from timestamp to date/time
- ARROW-13550 - [R] Support .groups argument to dplyr::summarize()
- ARROW-13552 - [C++] Remove deprecated APIs
- ARROW-13557 - [Packaging][Python] Skip test_cancellation test case on M1
- ARROW-13561 - [C++] Implement week kernel that accepts WeekOptions
- ARROW-13562 - [R] Styler followups
- ARROW-13565 - [Packaging][Ubuntu] Drop support for 20.10
- ARROW-13572 - [C++][Datasets] Add ORC support to Datasets API
- ARROW-13573 - [C++] Support dictionaries natively in case_when
- ARROW-13574 - [C++] Add 'count all' option to count kernels
- ARROW-13575 - [C++] Add hash_product kernel
- ARROW-13576 - [C++] Replace ExecNode::InputReceived with ::MakeTask
- ARROW-13577 - [Python][FlightRPC] pyarrow client do_put close method after write_table did not throw flight error
- ARROW-13585 - [GLib] Add support for C ABI interface
- ARROW-13587 - [R] Handle --use-LTO override
- ARROW-13595 - [C++] Add debug mode check for compute kernel output type
- ARROW-13604 - [Java] Remove deprecation annotations for APIs representing unsupported operations
- ARROW-13606 - [R] Actually disable LTO
- ARROW-13613 - [C++] Add decimal support to (hash) sum/mean/product
- ARROW-13614 - [C++] Add decimal support to min_max/hash_min_max
- ARROW-13618 - [R] Use Arrow engine for summarize() by default
- ARROW-13620 - [R] Binding for n_distinct()
- ARROW-13626 - [R] Bindings for log base b
- ARROW-13627 - [C++] Fully support ScalarAggregateOptions in (hash) any/all/sum/product/mean
- ARROW-13629 - [Ruby] Add support for building/converting map
- ARROW-13633 - [Packaging][Debian] Add support for bookworm
- ARROW-13634 - [R] Update distro() in nixlibs.R to map from "bookworm" to 12
- ARROW-13635 - [Packaging][Python] Define --with-lg-page for jemalloc in the arm manylinux builds
- ARROW-13637 - [Python] Fix docstrings
- ARROW-13642 - [C++][Compute] Hash join node supporting all semi, anti, inner, outer join types
- ARROW-13645 - [Java] Allow NullVectors to have distinct field names
- ARROW-13646 - [Go][Parquet] adding the parquet metadata package
- ARROW-13648 - [Dev] Use #!/usr/bin/env instead of #!/bin where possible
- ARROW-13650 - [C++] Create dataset writer to encapsulate dataset writer logic
- ARROW-13651 - [Ruby][Symbol] to Arrow array
- ARROW-13652 - [Python] Expose copy_files in pyarrow.fs
- ARROW-13660 - [C++] Remove seq_num from ExecNode::InputReceived
- ARROW-13670 - [C++] add virtual destructors
- ARROW-13674 - [CI] PR checks should check for JIRA components
- ARROW-13675 - [Doc][Python] Add a recipe on how to save partitioned datasets to the Cookbook
- ARROW-13679 - [GLib][Ruby] Add support for group aggregation
- ARROW-13680 - [C++] Create an asynchronous nursery to simplify capture logic
- ARROW-13682 - [C++] Add TDigest API to merge one TDigest
- ARROW-13684 - [C++][Compute] Strftime kernel follow-up
- ARROW-13686 - [Python] Update deprecated pytest yield_fixture functions
- ARROW-13687 - [Ruby] Add support for loading table by Arrow Dataset
- ARROW-13691 - [C++] Support skip_nulls/min_count in VarianceOptions
- ARROW-13693 - [Website] arrow-site should pin down a specific Ruby version and leverage toolings like rbenv
- ARROW-13696 - [Python] Support for MapType with Fields
- ARROW-13699 - [Python][Docs] Improve filesystem documentation
- ARROW-13700 - [Docs][C++] Clarify DayOfWeekOptions args
- ARROW-13702 - [Python] Add dataset mark to test_parquet_dataset_deprecated_properties
- ARROW-13704 - [C#] Add support for reading streaming format delta dictionaries
- ARROW-13705 - [Website] Pin node version
- ARROW-13721 - [Doc][Cookbook] Specifying Schemas - Python
- ARROW-13733 - [Java] Allow JDBC adapters to reuse vector schema roots
- ARROW-13734 - [Format] Clarify allowed values for time types
- ARROW-13736 - [C++] Reconcile PrettyPrint and StringFormatter
- ARROW-13737 - [C++] Support for grouped aggregation over scalar columns
- ARROW-13739 - [R] Support dplyr::count() and tally()
- ARROW-13740 - [R] summarize() should not eagerly evaluate
- ARROW-13757 - [R] Fix download of C++ source for CRAN patch releases
- ARROW-13759 - [C++] Update linting and formatting scripts to specify python3 in shebang line
- ARROW-13760 - [C++] Bump required Protobuf when using Flight
- ARROW-13764 - [C++] Support CountOptions in grouped count distinct
- ARROW-13768 - [R] Allow JSON to be an optional component
- ARROW-13772 - [R] Binding for median aggregation
- ARROW-13776 - [C++] Offline thirdparty versions.txt is missing extensions for some files
- ARROW-13777 - [R] mutate after group_by should be ok as long as there are only scalar functions
- ARROW-13778 - [R] Handle complex summarize expressions
- ARROW-13782 - [C++] Add skip_nulls/min_count to tdigest/mode/quantile
- ARROW-13783 - [Python] Preview data when printing tables
- ARROW-13785 - [C++] Add methods to print exec nodes/plans
- ARROW-13787 - [C++] Verify third-party downloads
- ARROW-13789 - [Go] Implement Scalar Values for Go
- ARROW-13793 - [C++] Migrate ORCFileReader to Result<T>
- ARROW-13794 - [C++] Deprecate PARQUET_VERSION_2_0
- ARROW-13797 - [C++][Python] Column projection pushdown for ORC dataset reading + use liborc for column selection
- ARROW-13803 - [C++] Don't read past end of buffer in BitUtil::SetBitmap
- ARROW-13804 - [Go] Add Interval type Month, Day, Nano
- ARROW-13806 - [C++][Python] Add support for new MonthDayNano Interval Type
- ARROW-13809 - [C++][ABI] Add support for MonthDayNanoInterval to C ABI
- ARROW-13810 - [C++][Compute] Predicate IsAsciiCharacter allows invalid types and values
- ARROW-13815 - [R] Adapt to new callstack changes in rlang
- ARROW-13816 - [Go][C] Implement Consumer APIs for C Data Interface in Go
- ARROW-13820 - [R] Rename na.min_count to min_count and na.rm to skip_nulls
- ARROW-13821 - [R] Handle na.rm in sd, var bindings
- ARROW-13823 - [Java] Exclude .factorypath
- ARROW-13824 - [C++][Compute] Make constexpr BooleanToNumber kernel
- ARROW-13831 - [GLib][Ruby] Add support for writing by Arrow Dataset
- ARROW-13835 - [Doc][Python] Add documentation for unify_schemas
- ARROW-13842 - [C++] Bump vendored date library
- ARROW-13843 - [C++][CI] Exercise ToString / PrettyPrint in fuzzing setup
- ARROW-13845 - [C++] Reconcile RandomArrayGenerator::ArrayOf implementations
- ARROW-13847 - [Java] Avoid unnecessary collection copies
- ARROW-13849 - [C++] Wrap min_max with min/max functions
- ARROW-13852 - [R] Handle Dataset schema metadata in ExecPlan
- ARROW-13853 - [R] String to_title, to_lower, to_upper kernels
- ARROW-13855 - [C++][Python] Implement C data interface support for extension types
- ARROW-13857 - [R][CI] Remove checkbashisms download
- ARROW-13859 - [Java] Add code coverage support
- ARROW-13866 - [R] Implement Options for all compute kernels available via list_compute_functions
- ARROW-13869 - [R] Implement options for non-bound MatchSubstringOptions kernels
- ARROW-13871 - [C++] JSON reader can fail if a list array key is present in one chunk but not in a later chunk
- ARROW-13874 - [R] Implement TrimOptions
- ARROW-13883 - [Python] Allow more than numpy.array as masks when creating arrays
- ARROW-13890 - [R] Split up test-dataset.R and test-dplyr.R
- ARROW-13893 - [R] Make head/tail lazy on datasets and queries
- ARROW-13897 - [Python] Correct TimestampScalar.as_py() and DurationScalar.as_py() docstrings
- ARROW-13898 - [C++][Compute] Add support for string binary transforms
- ARROW-13899 - [Ruby] Implement slicer by compute kernels
- ARROW-13901 - [R] Implement IndexOptions
- ARROW-13904 - [R] Implement ModeOptions
- ARROW-13905 - [R] Implement ReplaceSliceOptions
- ARROW-13906 - [R] Implement PartitionNthOptions
- ARROW-13908 - [R] Implement ExtractRegexOptions
- ARROW-13909 - [GLib] Add tests for GArrowVarianceOptions
- ARROW-13909 - [GLib] Add GArrowVarianceOptions
- ARROW-13910 - [Ruby] accepts Range and selectors
- ARROW-13919 - [GLib] Add GArrowFunctionDoc
- ARROW-13924 - [R] Bindings for stringr::str_starts, stringr::str_ends, base::startsWith and base::endsWith
- ARROW-13925 - [R] Remove system installation devdocs jobs
- ARROW-13927 - [R] Add Karl to the contributors list for the package
- ARROW-13928 - [R] Rename the version(s) tasks so that it's clearer which is which
- ARROW-13937 - [C++][Compute] Add explicit output values to sign function and fix unary type checks
- ARROW-13942 - [Dev] Update cmake_format usage in autotune comment bot
- ARROW-13944 - [C++] Bump xsimd to latest version
- ARROW-13958 - [Python] Migrate Python ORC bindings to use new Result-based APIs
- ARROW-13959 - [R] Update tests for extracting components from date32 objects
- ARROW-13962 - [R] Catch up on the NEWS
- ARROW-13963 - [Go] Minor: Add bitmap reader/writer impl from go Parquet module to Arrow Bitutil package
- ARROW-13964 - [Go][Parquet] Remove base bitmap reader/writer from parquet module, use arrow bitutil ones
- ARROW-13965 - [C++] dynamic_casts in parquet TypedColumnWriterImpl impacting performance
- ARROW-13966 - [C++] Support decimals in comparisons
- ARROW-13967 - [Go] Implement Concatenate function for array.Interface
- ARROW-13973 - [C++] Add a SelectKSinkNode
- ARROW-13974 - [C++] Resolve follow-up reviews for TopK/BottomK
- ARROW-13975 - [C++] Implement decimal round
- ARROW-13977 - [Format] clarify leap seconds for interval type
- ARROW-13979 - [Go] Enable -race for go tests
- ARROW-13990 - [R] Bindings for round kernels
- ARROW-13994 - [Doc][C++] Build document misses git submodule update
- ARROW-13995 - [R] Bindings for join node
- ARROW-13999 - [C++] Fix bundled LZ4 build on MinGW
- ARROW-14002 - [Python] Support tuples in unify_schemas
- ARROW-14003 - [C++][Python] Not providing a sort_key in the "select_k_unstable" kernel crashes
- ARROW-14005 - [R] Fix tests for PartitionNthOptions so that they can run on various platforms
- ARROW-14006 - [C++][Python] Support cast of naive timestamps to strings
- ARROW-14007 - [C++] Fix compiler warnings in decimal promotion helper
- ARROW-14008 - [R][Compute] Running an ExecPlan should yield Reader instead of Table
- ARROW-14009 - [C++] Seed parallelism in SourceNode
- ARROW-14012 - [Python] Update kernel categories in compute doc to match C++
- ARROW-14013 - [C++][Docs] Add instructions for Fedora
- ARROW-14016 - [C++] Wrong type_name used for directory partitioning
- ARROW-14019 - [R] expect_dplyr_equal() test helper function ignores grouping
- ARROW-14023 - [Ruby] Arrow::Table#slice accepts Hash
- ARROW-14025 - [R][C++] PreBuffer is not enabled when scanning parquet via exec nodes
- ARROW-14030 - [GLib] Use arrow::Result based ORC API
- ARROW-14031 - [Ruby] Use min and max separately
- ARROW-14033 - [Ruby] Append OpenSSL's .pc path automatically on macOS with Homebrew
- ARROW-14033 - [Ruby][Doc] Add macOS development guide for Red Arrow
- ARROW-14035 - [C++][Python][R] Implement count distinct kernel
- ARROW-14036 - [R] Binding for n_distinct() with no grouping
- ARROW-14043 - [Python] Allow unsigned integer index type in dictionary() type factory function
- ARROW-14044 - [R] Handle group_by .drop parameter in summarize
- ARROW-14049 - [C++][Java] Upgrade ORC to 1.7.0
- ARROW-14050 - [C++] Make TDigest/Quantile kernels return nulls instead
- ARROW-14052 - [C++] Add approximate_median aggregation
- ARROW-14054 - [C++][Docs] Simplify C++ row conversion example
- ARROW-14055 - [Docs] Add canonical url to the sphinx docs
- ARROW-14056 - [Doc][C++] Document ArrayData
- ARROW-14061 - [Go][C++] Add Cgo Arrow Memory Pool Allocator
- ARROW-14062 - [Format] Initial arrow-internal specification of compute IR
- ARROW-14064 - [CI] Use Debian 11
- ARROW-14069 - [R] By default, filter out hash functions in list_compute_functions()
- ARROW-14070 - [C++][CI] Remove support for VS2015
- ARROW-14072 - [GLib][Parquet] Add gparquet_arrow_file_reader_get_n_rows()
- ARROW-14073 - [C++] Deduplicate sort keys
- ARROW-14084 - [GLib][Ruby][Dataset] Add support for scanning from directory
- ARROW-14088 - [GLib][Ruby][Dataset] Add support for filter
- ARROW-14106 - [Go][C] Implement Exporting to the C Data Interface
- ARROW-14107 - [R][CI] Parallelize Windows CI jobs
- ARROW-14111 - [C++] Add extraction function support for time32/time64
- ARROW-14116 - [C++][Docs] Consistent variable names in WriteCSV example
- ARROW-14127 - [C++][Docs] Example of using compute function and output
- ARROW-14128 - [Go] Implement MakeArrayFromScalar for nested types
- ARROW-14132 - [C++] Improve CSV chunker tests
- ARROW-14135 - [Python] Missing Python tests for compute kernels
- ARROW-14140 - [R] skip arrow_binary/arrow_large_binary class from R metadata
- ARROW-14143 - [IR][C++] Add explicit cast node to IR
- ARROW-14146 - [Dev] Update merge script to specify python3 in shebang line
- ARROW-14150 - [C++] Don't check delimiter in CSV chunker if no quoting
- ARROW-14155 - [Go] add fingerprint and hash functions for types and scalars
- ARROW-14157 - [C++] Refactor Abseil to its own macro
- ARROW-14165 - [C++] Improve table sort performance
- ARROW-14178 - [C++] Boost download location has moved
- ARROW-14180 - [Packaging] Add support for AlmaLinux 8
- ARROW-14191 - [C++][Dataset] Dataset writes should respect backpressure
- ARROW-14194 - [Docs] Improve vertical spacing in the sphinx C++ API docs
- ARROW-14198 - [Java] Upgrade netty, grpc, and boringssl dependencies
- ARROW-14207 - [C++] Add missing dependencies for bundled Boost targets
- ARROW-14212 - [GLib][Ruby] Add GArrowTableConcatenateOptions
- ARROW-14217 - [Python][CI] Add support for python 3.10
- ARROW-14222 - [C++] implement GCSFileSystem skeleton
- ARROW-14228 - [R] Allow for creation of nullable fields
- ARROW-14230 - [C++] Deprecate ArrayBuilder::Advance
- ARROW-14232 - [C++] update crc32c to version 1.1.2
- ARROW-14235 - [C++][Compute] Use a node counter as the label if no label is supplied
- ARROW-14236 - [C++] Add GCS testbench for testing
- ARROW-14239 - [R] Don't use rlang::as_label
- ARROW-14241 - [C++][Java][CI] Fix java-jars build
- ARROW-14243 - [C++] Split vector_sort.cc
- ARROW-14244 - [C++] Reduce scalar_temporal.cc compilation time
- ARROW-14258 - [R] Warn if an SF column is made into a table
- ARROW-14259 - [R] converting from R vector to Array when the R vector is altrep
- ARROW-14261 - [C++] Includes should be in alphabetical order
- ARROW-14269 - [C++] Consolidate utf8 benchmark
- ARROW-14274 - [C++] Refine base64 api
- ARROW-14284 - [C++][Python] Improve error message when trying to use SyncScanner when requiring async
- ARROW-14291 - [CI][C++] Add cpp/examples/ files to lint targets
- ARROW-14295 - [Doc] Indicate location of archery
- ARROW-14296 - [Go] Update generated flatbuf
- ARROW-14304 - [R] Update news for 6.0.0
- ARROW-14309 - [Python] Extend CompressedInputStream to work with paths, strings and files
- ARROW-14317 - [Doc] Update C data interface implementation status
- ARROW-14326 - [Docs] Add C/GLib and Ruby to C Data/Stream interface supported libraries
- ARROW-14327 - [Release] Remove conda-* from packaging group
- ARROW-14335 - [GLib][Ruby] Add support for expression
- ARROW-14337 - [C++] Arrow doesn't build on M1 when SIMD acceleration is enabled
- ARROW-14341 - [C++] Improve decimal benchmark
- ARROW-14343 - [Packaging][Python] Enable NEON SIMD optimization for M1 wheels
- ARROW-14345 - [C++] Implement streaming reads
- ARROW-14348 - [R] add group_vars.RecordBatchReader method
- ARROW-14349 - [IR] Remove RelBase
- ARROW-14358 - [Doc] Update CMake options in documentation
- ARROW-14361 - [C++] Add default simd level
- ARROW-14364 - [CI][C++] Support LLVM 13
- ARROW-14368 - [CI] Use ubuntu-latest for Azure Pipelines
- ARROW-14369 - [C++][Python] Use std::move() explicitly for g++ 4.8.5
- ARROW-14386 - [Packaging][Java] Ensure using installed devtoolset version
- ARROW-14387 - [Release][Ruby] Check Homebrew/MSYS2 package version before releasing
- ARROW-14396 - [R][Doc] Remove relic note in write_dataset that columns cannot be renamed
- ARROW-14400 - [Go] Equals and ApproxEquals for Tables and Chunked Arrays
- ARROW-14401 - [C++] Fix bundled crc32c's include path
- ARROW-14402 - [Release][Yum] Specify gpg path explicitly
- ARROW-14404 - [Release][APT] Skip arm64 Debian GNU/Linux bookworm verification
- ARROW-14408 - [Packaging][Crossbow] Option for skipping artifact pattern validation
- ARROW-14410 - [Python][Packaging] Use numpy 1.21.3 to build python 3.10 wheels for macOS and windows
- ARROW-14452 - [Release][JS] Update Javascript testing
- ARROW-14511 - [Website][Rust] Rust 6.0.0 release blog post
- PARQUET-490 - [C++][Parquet] Basic support for reading DELTA_BINARY_PACKED data