@apache-arrow/esnext-umd

@apache-arrow/esnext-umd - npm Package Versions

0.15.0

Apache Arrow 0.15.0 (2019-10-05)

New Features and Improvements

  • ARROW-453 - [C++] Filesystem implementation for Amazon S3
  • ARROW-517 - [C++] array comparison, uses D**2 space Myers
  • ARROW-750 - [Format][C++] Add LargeBinary and LargeString types
  • ARROW-1324 - [C++] Add support for bundled Boost with MSVC
  • ARROW-1561 - [C++] Kernel implementations for IsIn
  • ARROW-1566 - [C++] Implement non-materializing sort kernels
  • ARROW-1741 - [C++] Add DictionaryArray::CanCompareIndices
  • ARROW-1786 - [Format] List expected on-wire buffer layouts for each kind of Arrow physical type in specification
  • ARROW-1789 - [Format] Consolidate specification documents and improve clarity for new implementation authors
  • ARROW-1875 - [Java] Write 64-bit ints as strings in integration test JSON files
  • ARROW-2006 - [C++] Add option to trim excess padding when writing IPC messages
  • ARROW-2431 - [Rust] Schema fidelity
  • ARROW-2769 - [Python] Deprecate and rename add_metadata methods
  • ARROW-2931 - [Crossbow] Windows builds are attempting to run linux and osx packaging tasks
  • ARROW-3032 - [C++] Clean up Numpy-related headers
  • ARROW-3204 - [R] Enable R package to be made available on CRAN
  • ARROW-3243 - [C++] Upgrade jemalloc to version 5
  • ARROW-3246 - [C++][Python][Parquet] Direct writing of DictionaryArray to Parquet columns, automatic decoding to Arrow
  • ARROW-3325 - [Python][FOLLOWUP] In Python 2.7, a class's __doc__ member is not writable (#5018)
  • ARROW-3325 - [Python][Parquet] Add "read_dictionary" argument to parquet.read_table, ParquetDataset to enable direct-to-DictionaryArray reads
  • ARROW-3531 - [Python] add Schema.field() method / deprecate field_by_name
  • ARROW-3538 - [Python] ability to override the automated assignment of uuid for filenames when writing datasets
  • ARROW-3579 - [Crossbow] Unintuitive error message when remote branch has not been pushed
  • ARROW-3643 - [Rust] optimize BooleanBufferBuilder::append_slice
  • ARROW-3710 - [Crossbow][Python] Run nightly tests against pandas master
  • ARROW-3772 - [C++][Parquet] Write Parquet dictionary indices directly to DictionaryBuilder rather than routing through dense form
  • ARROW-3777 - [C++] Add Slow input streams and slow filesystem
  • ARROW-3817 - [R] Extract methods for RecordBatch and Table
  • ARROW-3829 - [Python] add __arrow_array__ protocol to support third-party array classes in conversion to Arrow
  • ARROW-3943 - [R] Write vignette for R package
  • ARROW-4036 - [C++] Pluggable Status message, by exposing an abstract delegate class.
  • ARROW-4095 - [C++] Optimize DictionaryArray::Transpose() for trivial transpositions
  • ARROW-4111 - [Python] Create time types from Python sequences of integers
  • ARROW-4218 - [Rust][Parquet] Initial support for array reader.
  • ARROW-4220 - [Python] Add buffered IO benchmarks with simulated high latency, allow duck-typed files in input_stream/output_stream
  • ARROW-4365 - [Rust][Parquet] Implement arrow record reader.
  • ARROW-4398 - [C++][Python][Parquet] Improve BYTE_ARRAY PLAIN encoding write performance. Add BYTE_ARRAY write benchmarks
  • ARROW-4473 - [Website] Add instructions to do a test-deploy of Arrow website and fix bugs
  • ARROW-4507 - [Format] Create outline and introduction for new document.
  • ARROW-4508 - [Format] Copy content from Layout.rst to new document.
  • ARROW-4509 - [Format] Copy content from Metadata.rst to new document.
  • ARROW-4510 - [Format] copy content from IPC.rst to new document.
  • ARROW-4511 - [Format][Docs] Revamp Format documentation, consolidate columnar format docs into a more coherent single document. Add Versioning/Stability page
  • ARROW-4648 - [Doc] Add documentation about C++ file naming
  • ARROW-4648 - [C++] Use underscores in source file names
  • ARROW-4649 - [C++/CI/R] Add nightly job that tests the homebrew formula
  • ARROW-4752 - [Rust] Add explicit SIMD vectorization for the divide kernel
  • ARROW-4810 - [Format][C++] Add LargeList type
  • ARROW-4841 - [C++] Add arrowOptions.cmake with options used to build arrow
  • ARROW-4860 - [C++] Build AWS C++ SDK for Windows in conda-forge
  • ARROW-5134 - [R][CI] Run nightly tests against multiple R versions
  • ARROW-5211 - [Format] Missing documentation under `Dictionary encoding` section on MetaData page
  • ARROW-5216 - [CI] Add Appveyor badge to README
  • ARROW-5307 - [CI][GLib] Enable GTK-Doc
  • ARROW-5337 - [C++] Add RecordBatch::field method, possibly deprecate "column"
  • ARROW-5343 - [C++] Refactor dictionary unification to incremental interface, and use Buffer for transpose map allocations
  • ARROW-5344 - [C++] Use ArrayDataVisitor in dict-to-anything cast
  • ARROW-5351 - [Rust] Take kernel
  • ARROW-5358 - [Rust] Implement equality check for ArrayData and Array
  • ARROW-5380 - [C++] Fix memory alignment UBSan errors.
  • ARROW-5439 - [Java] Utilize stream EOS in File format
  • ARROW-5444 - [Release][Website] After 0.14 release, update what is an "official" release
  • ARROW-5458 - [C++] Apache Arrow parallel CRC32c computation optimization
  • ARROW-5480 - [Python] Add unit test asserting specifically that pandas.Categorical roundtrips to Parquet format without special options
  • ARROW-5483 - [Java] add ValueVector constructors that take Field object
  • ARROW-5494 - [Python] Create FileSystem bindings
  • ARROW-5505 - [R] Normalize file and class names, stop masking base R functions, add vignette, improve documentation
  • ARROW-5527 - [C++] Uses Buffer/Builder in HashTable and MemoTable
  • ARROW-5558 - [C++] Support Array::View on arrays with non-zero offset
  • ARROW-5559 - [C++] Add an IpcOptions structure
  • ARROW-5564 - [C++] Use uriparser from conda-forge
  • ARROW-5579 - [Java] Shade flatbuffers
  • ARROW-5580 - [C++][Gandiva] Correct definitions of timestamp functions in Gandiva
  • ARROW-5588 - [C++] Better support for building union arrays
  • ARROW-5594 - [C++] add UnionArrays support to Take/Filter kernels
  • ARROW-5610 - [Python] define extension types in Python
  • ARROW-5646 - [Crossbow][Documentation] Move the user guide to the Sphinx documentation
  • ARROW-5681 - [FlightRPC] Add Flight-specific error APIs
  • ARROW-5686 - [R] Review R Windows CI build
  • ARROW-5716 - [Developer] Improve merge PR script to attribute multiple authors
  • ARROW-5717 - [Python] Unify variable dictionaries when converting to pandas
  • ARROW-5719 - [Java] Support in-place vector sorting
  • ARROW-5722 - [Rust] Implement Debug for List/Struct/BinaryArray
  • ARROW-5734 - [Python] Dispatch to Table.from_arrays from pyarrow.table factory function
  • ARROW-5736 - [Format][C++] Support small bit-width indices in sparse tensor
  • ARROW-5741 - [JS] Make numeric vector from functions consistent with TypedArray.from
  • ARROW-5743 - [C++] Add cmake option and macros for enabling large memory tests
  • ARROW-5746 - [Website] Move website source out of apache/arrow
  • ARROW-5747 - [C++] Improve CSV header and column names options
  • ARROW-5758 - [C++][Gandiva][Java] Support casting decimals to varchar and vice versa
  • ARROW-5762 - [JS] Align Map type impl with the spec
  • ARROW-5777 - [C++] Add microbenchmark for some Decimal128 operations
  • ARROW-5778 - [Java] Extract the logic for vector data copying to the super classes
  • ARROW-5784 - [Release][GLib] Replace c_glib/ after running c_glib/autogen.sh in dev/release/02-source.sh
  • ARROW-5786 - [Release] Use arrow-jni profile to run "mvn release:perform"
  • ARROW-5788 - [Rust] Use both "path" and "version" for internal dependencies
  • ARROW-5789 - [C++] Minor fixes for warnings, remove unused ubsan.cc
  • ARROW-5792 - [Rust] Add TypeVisitor for parquet type.
  • ARROW-5798 - [Packaging][deb] Update doc architecture
  • ARROW-5800 - [R] Dockerize R Travis CI tests so they can be run anywhere via docker-compose
  • ARROW-5803 - [CI] Dockerize C++ with clang 7 Travis CI
  • ARROW-5812 - [Java] Refactor method name and param type in BaseIntVector
  • ARROW-5813 - [C++] Fix TensorEquals for different contiguous tensors
  • ARROW-5814 - [Java] Implement a <Object, int> HashMap for DictionaryEncoder
  • ARROW-5827 - [C++] Require c-ares CMake config
  • ARROW-5828 - [C++] Add required Protocol Buffers versions check
  • ARROW-5830 - [C++] Stop using memcmp in TensorEquals for tensors with float values
  • ARROW-5832 - [Java] Support search operations for vector data
  • ARROW-5833 - [C++] Factor out Status-enriching code
  • ARROW-5834 - [Java] Apply new hash map in DictionaryEncoder
  • ARROW-5835 - [Java] Support Dictionary Encoding for binary type
  • ARROW-5841 - [Website] Add 0.14.0 release note
  • ARROW-5842 - [Java] Revise the semantic of lastSet in ListVector
  • ARROW-5843 - [Java] Improve the readability and performance of BitVectorHelper#getNullCount
  • ARROW-5844 - [Java] Support comparison & sort for more numeric types
  • ARROW-5846 - [Java] Create Avro adapter module and add dependencies
  • ARROW-5853 - [Python] Expose boolean filter kernel on Array
  • ARROW-5861 - [Java] Initial implement to convert Avro record with primitive types
  • ARROW-5862 - [Java] Provide dictionary builder
  • ARROW-5864 - [Python] Simplify Result class cython wrapper
  • ARROW-5865 - [Release] Helper script to rebase PRs on master
  • ARROW-5866 - [C++] Remove duplicate library in cpp/Brewfile
  • ARROW-5867 - [C++][Gandiva] add support for cast int to decimal
  • ARROW-5872 - [C++][Gandiva] Support mod(double, double) function in Gandiva
  • ARROW-5876 - [C++][Python] add basic auth flight proto message to C++ and Python
  • ARROW-5877 - [FlightRPC] Fix Python<->Java auth issues
  • ARROW-5880 - [C++][Parquet] Use TypedBufferBuilder instead of ArrayBuilder in writer.cc
  • ARROW-5881 - [Java] Provide functionalities to efficiently determine if a validity buffer has completely 1 bits/0 bits
  • ARROW-5883 - [Java] Support dictionary encoding for List and Struct type
  • ARROW-5888 - [C++][Parquet][Python] Restore timezone metadata when original Arrow schema has been stored in Parquet metadata
  • ARROW-5891 - [C++][Gandiva] Remove duplicates in function registry
  • ARROW-5892 - [C++][Gandiva] Support function aliases
  • ARROW-5893 - [C++][Python][GLib][Ruby][MATLAB][R] Remove arrow::Column class
  • ARROW-5897 - [Java] Remove duplicated logic in MapVector
  • ARROW-5898 - [Java] Provide functionality to efficiently compute hash code for arbitrary memory segment
  • ARROW-5900 - [Java] Bounds check for decimal args.
  • ARROW-5901 - [Rust] Add equals to json arrays.
  • ARROW-5902 - [Java] Implement hash table and equals & hashCode API for dictionary encoding
  • ARROW-5903 - [Java] Optimise set methods in decimal vector
  • ARROW-5904 - [Java][Plasma] Fix compilation of Plasma Java client
  • ARROW-5906 - [CI] Turn off ARROW_VERBOSE_THIRDPARTY_BUILD by default in Docker builds
  • ARROW-5908 - [C#] ArrowStreamWriter doesn't align buffers to 8 bytes
  • ARROW-5909 - [Java] Optimize ByteFunctionHelpers equals & compare logic
  • ARROW-5911 - [Java] Make ListVector and MapVector create reader lazily
  • ARROW-5917 - [Java] Redesign the dictionary encoder
  • ARROW-5918 - [Java] Add get to BaseIntVector interface
  • ARROW-5919 - [R] Test R-in-conda as a nightly build
  • ARROW-5920 - [Java] Support sort & compare for all variable width vectors
  • ARROW-5924 - [Plasma] return a replica of GpuProcessHandle::ptr when create or get an object
  • ARROW-5934 - [Python] Bundle arrow's LICENSE with the wheels
  • ARROW-5937 - [Release] Stop parallel binary upload
  • ARROW-5938 - [Release] Create branch for adding release note automatically
  • ARROW-5939 - [Release] Add support for generating vote email template separately
  • ARROW-5940 - [Release] Add support for re-uploading sign/checksum for binary artifacts
  • ARROW-5941 - [Release] Avoid re-uploading already uploaded binary artifacts
  • ARROW-5943 - [GLib][Gandiva] Add support for function aliases
  • ARROW-5944 - [C++][Gandiva] Remove 'div' alias for 'divide'
  • ARROW-5945 - [Rust][DataFusion] Table trait can now be used to build real queries
  • ARROW-5947 - [Rust][DataFusion] Remove serde crate dependency
  • ARROW-5948 - [Rust] [DataFusion] create_logical_plan should not call optimizer
  • ARROW-5955 - [Plasma] Support setting memory quotas per plasma client for better isolation
  • ARROW-5957 - [C++][Gandiva] Implement div function in Gandiva
  • ARROW-5958 - [Python] Link zlib statically in the wheels
  • ARROW-5961 - [R] Be able to run R-only tests even without C++ library
  • ARROW-5962 - [CI][Python] Remove manylinux1 builds from Travis CI
  • ARROW-5967 - [Java] DateUtility#timeZoneList is not correct
  • ARROW-5970 - [Java] Provide pointer to Arrow buffer
  • ARROW-5974 - [C++] Support reading concatenated compressed streams
  • ARROW-5975 - [C++][Gandiva] support castTIMESTAMP(date)
  • ARROW-5976 - [C++] RETURN_IF_ERROR(ctx) should be namespaced
  • ARROW-5977 - [C++][Python] Allow specifying which columns to include
  • ARROW-5979 - [FlightRPC] Expose opaque (de)serialization of protocol types
  • ARROW-5985 - [Developer] Do not suggest setting Fix Version for patch releases by default
  • ARROW-5986 - [Java] Code cleanup for dictionary encoding
  • ARROW-5988 - [Java] Avro adapter implement simple Record type
  • ARROW-5997 - [Java] Support dictionary encoding for Union type
  • ARROW-5998 - [Java] Open a document to track the API changes
  • ARROW-6000 - [Python] Add support for LargeString and LargeBinary types
  • ARROW-6008 - [Release] Stop parallel binary artifacts upload
  • ARROW-6009 - [JS] Ignore NPM errors in the javascript release script
  • ARROW-6013 - [Java] Support range searcher
  • ARROW-6017 - [FlightRPC] Enable creating Flight Locations for unknown schemes
  • ARROW-6020 - [Java] Refactor ByteFunctionHelper#hash with new added ArrowBufHasher
  • ARROW-6021 - [Java] Extract copyFrom and copyFromSafe methods to ValueVector interface
  • ARROW-6022 - [Java] Support equals API in ValueVector to compare two vectors equal
  • ARROW-6023 - [C++][Gandiva] Add functions in Gandiva
  • ARROW-6024 - [Java] Provide more hash algorithms
  • ARROW-6026 - [Doc] Add CONTRIBUTING.md
  • ARROW-6030 - [Java] Efficiently compute hash code for ArrowBufPointer
  • ARROW-6031 - [Java] Support iterating a vector by ArrowBufPointer
  • ARROW-6034 - [C++][Gandiva] Add string functions in Gandiva
  • ARROW-6035 - [Java] Avro adapter support convert nullable value
  • ARROW-6036 - [GLib] Add support for skip rows and column_names CSV read option
  • ARROW-6037 - [GLib] Add a missing version macro
  • ARROW-6039 - [GLib] Add garrow_array_filter()
  • ARROW-6041 - [Website] Blog post announcing R library availability on CRAN
  • ARROW-6042 - [C++][Parquet] Add Dictionary32Builder that always returns 32-bit dictionary indices
  • ARROW-6045 - [C++] Add benchmark for double and float encoding/decoding, as well as NaN encoding
  • ARROW-6048 - [C++] Add ChunkedArray::View method that dispatches to Array::View
  • ARROW-6049 - [C++] Support view from one dictionary type to another in Array::View
  • ARROW-6053 - [Python] Fix pyarrow's RecordBatchStreamReader::Open2 type signature
  • ARROW-6063 - [FlightRPC] implement half-closed semantics for DoPut
  • ARROW-6065 - [C++][Parquet] Clean up parquet/arrow/reader.cc, reduce code duplication, improve readability
  • ARROW-6069 - [Rust][Parquet] Add converter.
  • ARROW-6070 - [Java] Avoid creating new schema before IPC sending
  • ARROW-6077 - [C++][Parquet] Build Arrow "schema tree" from Parquet schema to help with nested data implementation
  • ARROW-6078 - [Java] Implement dictionary-encoded subfields for List type
  • ARROW-6079 - [Java] Implement/test UnionFixedSizeListWriter for FixedSizeListVector
  • ARROW-6080 - [Java] Support search operation for BaseRepeatedValueVector
  • ARROW-6083 - [Java] Refactor Jdbc adapter consume logic
  • ARROW-6084 - [Python] Support LargeList
  • ARROW-6085 - [Rust][DataFusion] Add traits for physical query plan
  • ARROW-6086 - [Rust][DataFusion] Add support for partitioned Parquet data sources
  • ARROW-6087 - [Rust] [DataFusion] Implement parallel execution for CSV scan
  • ARROW-6088 - [Rust][DataFusion] Projection execution plan
  • ARROW-6089 - [Rust][DataFusion] Implement physical plan for "selection" operator
  • ARROW-6090 - [Rust][DataFusion] Physical plan for HashAggregate
  • ARROW-6093 - [Java] reduce branches in algo for first match in VectorRangeSearcher
  • ARROW-6094 - [FlightRPC] Add Flight RPC method getFlightSchema
  • ARROW-6096 - [C++] conditionally use boost regex for gcc < 4.9
  • ARROW-6097 - [Java] Avro adapter implement unions type
  • ARROW-6100 - [Rust] Pin to specific nightly rust for reproducible/stable builds
  • ARROW-6101 - [Rust][DataFusion] Parallel execution of physical query plan
  • ARROW-6102 - [Testing] Add partitioned CSV file to arrow-testing repo
  • ARROW-6104 - [Rust][DataFusion] Remove use of bare trait objects
  • ARROW-6105 - [C++][Parquet][Python] Add test case showing dictionary-encoded subfields in nested type
  • ARROW-6113 - [Java] Support vector deduplicate function
  • ARROW-6115 - [Python] Support LargeBinary and LargeString in conversion to python
  • ARROW-6118 - [Java] Replace google Preconditions with Arrow Preconditions
  • ARROW-6121 - [Tools] Improve merge tool ergonomics
  • ARROW-6125 - [Python] Remove Python APIs deprecated in 0.14.x and prior
  • ARROW-6127 - [Website] Add favicons and meta tags
  • ARROW-6128 - [C++] Suppress a class-memaccess warning
  • ARROW-6130 - [Release] Use 0.15.0 as the next release
  • ARROW-6134 - [C++][Gandiva] Add concat function in Gandiva
  • ARROW-6137 - [C++][Gandiva] Use snprintf instead of stringstream in castVARCHAR(timestamp)
  • ARROW-6137 - [C++][Gandiva] Change output format of castVARCHAR(timestamp) in Gandiva
  • ARROW-6138 - [C++] Add a basic (single RecordBatch) implementation of Dataset
  • ARROW-6139 - [Documentation][R] Build R docs (pkgdown) site and add to arrow-site
  • ARROW-6141 - [C++] Enable memory-mapping a file region
  • ARROW-6142 - [R] Install instructions on linux could be clearer
  • ARROW-6143 - [Java] Unify the copyFrom and copyFromSafe methods for all vectors
  • ARROW-6144 - [C++][Gandiva] Implement random functions in Gandiva
  • ARROW-6155 - [Java] Extract a super interface for vectors whose elements reside in continuous memory segments
  • ARROW-6156 - [Java] Support compare semantics for ArrowBufPointer
  • ARROW-6161 - [C++][Dataset] Implements ParquetFragment
  • ARROW-6162 - [C++][Gandiva] Do not truncate string in castVARCHAR_utf8 if output length is zero
  • ARROW-6164 - [Docs][Format] Document project versioning schema and forward/backward compatibility policies
  • ARROW-6172 - [Java] Provide benchmarks to set IntVector with different methods
  • ARROW-6177 - [C++] Add Array::Validate()
  • ARROW-6180 - [C++][Parquet] Add RandomAccessFile::GetStream that returns InputStream that reads a file segment independent of the file's state, fix concurrent buffered Parquet column reads
  • ARROW-6181 - [R] Only allow R package to install without libarrow on linux
  • ARROW-6183 - [R] Document that you don't have to use tidyselect if you don't want
  • ARROW-6185 - [Java] Provide hash table based dictionary builder
  • ARROW-6187 - [C++] Fallback to storage type when writing ExtensionType to Parquet
  • ARROW-6188 - [GLib] Add garrow_array_is_in()
  • ARROW-6192 - [GLib] Use the same SO version as C++
  • ARROW-6194 - [Java] Add non-static approach in DictionaryEncoder making it easy to extend and reuse
  • ARROW-6196 - [Ruby] Add support for building Arrow::TimeNNArray by .new
  • ARROW-6197 - [GLib] Add garrow_decimal128_rescale()
  • ARROW-6199 - [Java] Avro adapter avoid potential resource leak.
  • ARROW-6203 - [GLib] Add garrow_array_sort_to_indices()
  • ARROW-6204 - [GLib] Add garrow_array_is_in_chunked_array()
  • ARROW-6206 - [Java][Docs] Document environment variables/java properties
  • ARROW-6209 - [Java] Extract set null method to the base class for fixed width vectors
  • ARROW-6212 - [Java] Support vector rank operation
  • ARROW-6216 - [C++][Parquet] Expose codec compression level to user, add to Parquet writer properties
  • ARROW-6217 - [Website] Remove needless _site/ directory
  • ARROW-6219 - [Java] Add API for JDBC adapter that can convert less than the full result set at a time
  • ARROW-6220 - [Java] Add API to avro adapter to limit number of rows returned at a time.
  • ARROW-6225 - [Website] Update arrow-site/README and any other places to point website contributors in right direction
  • ARROW-6229 - [C++][Dataset] implement FileSystemBasedDataSource
  • ARROW-6230 - [R] Reading in Parquet files are 20x slower than reading fst files in R
  • ARROW-6231 - [C++] Allow generating CSV column names
  • ARROW-6232 - [C++] Rename Argsort kernel to SortToIndices
  • ARROW-6237 - [R] Allow compilation flags to be passed for R package with ARROW_R_CXXFLAGS
  • ARROW-6238 - [C++][Dataset] Implement SimpleDataSource, SimpleDataFragment and SimpleScanTask
  • ARROW-6240 - [Ruby] Arrow::Decimal128Array#get_value returns BigDecimal
  • ARROW-6242 - [C++][Dataset] Implement Dataset, Scanner and ScannerBuilder
  • ARROW-6243 - [C++][Dataset] Filter expressions
  • ARROW-6244 - [C++][Dataset] Add partition key to DataSource interface
  • ARROW-6246 - [Website] Add link to R documentation site
  • ARROW-6247 - [Java] Provide a common interface for float4 and float8 vectors
  • ARROW-6249 - [Java] Remove useless class ByteArrayWrapper
  • ARROW-6250 - [Java] Implement ApproxEqualsVisitor comparing approx for floating point
  • ARROW-6252 - [C++][Python] Add Array::Diff in C++ and Array.diff in Python to return diff as string
  • ARROW-6253 - [Python] Expose "enable_buffered_stream" option from parquet::ReaderProperties in pyarrow.parquet.read_table
  • ARROW-6258 - [R] Add macOS build scripts
  • ARROW-6260 - [Website] Use deploy key on Travis to build and push to asf-site
  • ARROW-6262 - [Developer] Show JIRA issue before merging
  • ARROW-6264 - [Java] There is no need to consider byte order in ArrowBufHasher
  • ARROW-6265 - [Java] Avro adapter implement Array/Map/Fixed type
  • ARROW-6267 - [Ruby] Add Arrow::Time for Arrow::Time{32,64}DataType value
  • ARROW-6271 - [Rust][DataFusion] Add example for running SQL against Parquet
  • ARROW-6272 - [Rust][DataFusion] Add register_parquet convenience method to ExecutionContext
  • ARROW-6278 - [R] Read parquet files from raw vector
  • ARROW-6279 - [Python] Add Table.slice, __getitem__ support to match RecordBatch, Array, others
  • ARROW-6284 - [C++] Allow references in std::tuple when converting tuple to arrow array
  • ARROW-6287 - [Rust][DataFusion] TableProvider.scan() returns thread-safe BatchIterator
  • ARROW-6288 - [Java] Implement TypeEqualsVisitor comparing vector type equals considering names and metadata
  • ARROW-6289 - [Java] Add empty() in UnionVector to create instance
  • ARROW-6292 - [C++] Add option to use the mimalloc allocator
  • ARROW-6294 - [C++] Use hyphen for plasma-store-server executable
  • ARROW-6295 - [Rust][DataFusion] ExecutionError Cannot compare Float32 with Float64
  • ARROW-6296 - [Java] Cleanup JDBC interfaces and eliminate one memcopy for binary/varchar fields
  • ARROW-6297 - [Java] Compare ArrowBufPointers by unsigned integers
  • ARROW-6300 - [C++] Add Abort() method to streams
  • ARROW-6303 - [Rust] Add a feature to disable SIMD
  • ARROW-6304 - [Java][Doc] Add a description to each module
  • ARROW-6306 - [Java] Support stable sort by stable comparators
  • ARROW-6310 - [C++] Write 64-bit integers as strings in JSON integration test files
  • ARROW-6311 - [Java] Make ApproxEqualsVisitor accept DiffFunction to make it more flexible
  • ARROW-6313 - [Format] Tracking for ensuring flatbuffer serialized values are aligned in stream/files.
  • ARROW-6314 - [C#] Implement IPC message format alignment changes, provide backwards compatibility and "legacy" option to emit old message format
  • ARROW-6314 - [C++] Implement IPC message format alignment changes, provide backwards compatibility and "legacy" option to emit old message format
  • ARROW-6315 - [Java] Make change to ensure flatbuffer reads are aligned
  • ARROW-6316 - [Go] implement new ARROW format with 32b-aligned buffers
  • ARROW-6317 - [JS] Implement IPC message format alignment changes
  • ARROW-6318 - [Integration] Run tests against pregenerated files
  • ARROW-6319 - [C++] Move the core of NumericTensor<T>::Value() to Tensor::Value<T>()
  • ARROW-6326 - [C++] Nullable fields when converting std::tuple to Table
  • ARROW-6328 - [Developer][crossbow] Click.option-s should have help text
  • ARROW-6329 - [Format] Add a padding for Flatbuffer alignment, use 8-byte EOS
  • ARROW-6331 - [Java] Incorporate ErrorProne into the java build
  • ARROW-6334 - [Java] Improve the dictionary builder API to return the position of the value in the dictionary
  • ARROW-6335 - [Java] Improve the performance of DictionaryHashTable
  • ARROW-6336 - [Python] Add notes to pyarrow.serialize/deserialize to clarify that these functions do not read or write the standard IPC protocol
  • ARROW-6337 - [R] Changed as_tibble to as_dataframe in the R package
  • ARROW-6338 - [R] Type function names don't match type names
  • ARROW-6342 - [Python] Add pyarrow.record_batch factory function with same basic API / semantics as pyarrow.table
  • ARROW-6346 - [GLib] Add garrow_array_view()
  • ARROW-6347 - [GLib] Add garrow_array_diff()
  • ARROW-6350 - [Ruby] Remove Arrow::Struct and use Hash instead
  • ARROW-6351 - [Ruby] Improve Arrow#values performance
  • ARROW-6353 - [Python][C++] Expose compression_level option to parquet.write_table
  • ARROW-6355 - [Java] Make range equal visitor reusable
  • ARROW-6356 - [Java] Avro adapter implement Enum type and nested Record
  • ARROW-6357 - [C++] Issue S3 file writes in the background by default
  • ARROW-6358 - [C++] Add FileSystem::DeleteDirContents
  • ARROW-6360 - [R] Update support for compression
  • ARROW-6362 - [C++] Allow customizing S3 credentials provider
  • ARROW-6365 - [R] Should be able to coerce numeric to integer with schema
  • ARROW-6366 - [Java] Make field vectors final explicitly
  • ARROW-6368 - [C++][Dataset] Add interface for "projecting" RecordBatch from one schema to another, inserting null values where needed
  • ARROW-6373 - [C++] Make FixedWidthBinaryBuilder consistent with other fixed width builders in zeroing memory when appending null batches
  • ARROW-6375 - [C++] Extend ConversionTraits to allow efficiently appending list values in STL API
  • ARROW-6379 - [C++] Write no IPC buffer metadata for NullType
  • ARROW-6381 - [C++] BufferOutputStream::Write does extra work that slows down small writes
  • ARROW-6383 - [Java] Report outstanding child allocators on close
  • ARROW-6384 - [C++] Bump dependency versions
  • ARROW-6385 - [C++] Use xxh3 instead of custom hashing code for non-tiny strings
  • ARROW-6391 - [Python][Flight] Add built-in methods on FlightServerBase to start server and wait for it to be available
  • ARROW-6397 - [C++][CI] Generate minio server connect string
  • ARROW-6401 - [Java] Implement dictionary-encoded subfields for Struct type
  • ARROW-6402 - [C++] Suppress sign-compare warning with g++ 9.2.1
  • ARROW-6403 - [Python] Expose FileReader::ReadRowGroups() to Python
  • ARROW-6408 - [Rust] use "if cfg!" pattern
  • ARROW-6413 - [R] Support autogenerating column names
  • ARROW-6415 - [R] Remove usage of R CMD config CXXCPP
  • ARROW-6416 - [Python] Improve API & documentation regarding chunksizes
  • ARROW-6417 - [C++][Parquet] Miscellaneous optimizations yielding slightly better Parquet binary read performance
  • ARROW-6419 - [Website] Blog post about Parquet dictionary performance work coming in 0.15.x release
  • ARROW-6422 - [Gandiva] Fix double-conversion linker issue
  • ARROW-6426 - [FlightRPC][C++][Java] Expose gRPC configuration knobs
  • ARROW-6427 - [GLib] Add support for column names autogeneration CSV read option
  • ARROW-6438 - [R] Add bindings for filesystem API
  • ARROW-6447 - [C++] Allow rest of arrow_objlib to build in parallel while memory_pool.cc is waiting on jemalloc_ep
  • ARROW-6450 - [C++] Use 2x reallocation strategy in BufferBuilder instead of 1.5x
  • ARROW-6451 - [Format] Add clarifications to Columnar.rst about the contents of "null" slots in Varbinary or List arrays
  • ARROW-6453 - [C++] More informative error messages with S3
  • ARROW-6454 - [LICENSE] Add LLVM's license due to static linkage
  • ARROW-6458 - [Java] Remove value boxing/unboxing for ApproxEqualsVisitor
  • ARROW-6460 - [Java] Add benchmark and large fake data UT for avro adapter
  • ARROW-6462 - [C++] Fix build error on CentOS 6 x86_64 with bundled double-conversion
  • ARROW-6465 - [Python] Improvement to Windows build instructions
  • ARROW-6474 - [Python] Add option to use legacy / pre-0.15 IPC message format and to set the default using PYARROW_LEGACY_IPC_FORMAT environment variable
  • ARROW-6475 - [C++] Don't try to dictionary encode dictionary arrays
  • ARROW-6477 - [Packaging][Crossbow] Use Azure Pipelines to build linux packages
  • ARROW-6480 - [Crossbow] Summary report e-mailer with polling logic
  • ARROW-6484 - [Java] Enable create indexType for DictionaryEncoding according to dictionary value count
  • ARROW-6487 - [Rust][DataFusion] Introduce common test module
  • ARROW-6489 - [Developer][Documentation] Fix merge script and readme
  • ARROW-6490 - [Java][Memory] Log error for leak in allocator close
  • ARROW-6491 - [Java][Hotfix] fix master fail caused by ErrorProne
  • ARROW-6494 - [C++][Dataset] Implement basic PartitionScheme
  • ARROW-6504 - [Python][Packaging] Add mimalloc to conda packages for better performance
  • ARROW-6505 - [Website] Add new committers
  • ARROW-6518 - [Packaging][Python] Flight failing in OSX Python wheel builds
  • ARROW-6519 - [Java] Use IPC continuation prefix as part of 8-byte EOS
  • ARROW-6524 - [Developer][Packaging] Nightly build report's subject should contain Arrow
  • ARROW-6525 - [C++] Avoid aborting in CloseFromDestructor()
  • ARROW-6526 - [C++] Poison data in debug mode
  • ARROW-6527 - [C++] Add OutputStream::Write(Buffer)
  • ARROW-6531 - [Python] Add detach() method to buffered streams
  • ARROW-6532 - [R] write_parquet() uses writer properties (general and arrow specific)
  • ARROW-6533 - [R] Compression codec should take a "level"
  • ARROW-6534 - [Java] Fix typos and spelling
  • ARROW-6539 - [R] Provide mechanism to write out old format
  • ARROW-6540 - [R] Add Validate() methods
  • ARROW-6541 - [Format][C++] Update Columnar.rst for two-part EOS, update C++ implementation
  • ARROW-6542 - [R] Add View() method to array types
  • ARROW-6544 - [R] Documentation/polishing for 0.15 release
  • ARROW-6545 - [Go] update IPC writer to use two-part EOS
  • ARROW-6546 - [C++] Add missing FlatBuffers source dependency
  • ARROW-6549 - [C++] Switch to jemalloc 5.2.x
  • ARROW-6556 - [Python] Fix warning for pandas SparseDataFrame removal
  • ARROW-6556 - [Python] Handle future removal of pandas SparseDataFrame
  • ARROW-6557 - [Python] Always return pandas.Series from Array/ChunkedArray.to_pandas. Add mechanism to preserve "column names" from RecordBatch, Table as Series.name
  • ARROW-6558 - [C++] Refactor Iterator to type erased handle
  • ARROW-6559 - [Developer][C++] Add option to pass ARROW_PACKAGE_PREFIX when using 'archery benchmark'
  • ARROW-6563 - [Rust][DataFusion] MergeExec
  • ARROW-6569 - [Website] Add support for auto deployment by GitHub Actions
  • ARROW-6570 - [Python] Use Arrow's allocators for creating NumPy array instead of leaving it to NumPy
  • ARROW-6580 - [Java] Support comparison for unsigned integers
  • ARROW-6584 - [Python][Wheel] Bundle zlib again with the windows wheels
  • ARROW-6588 - [C++] Suppress class-memaccess warning with g++ 9.2.1
  • ARROW-6589 - [C++] Error propagation, tests for /MakeArray(OfNulls|FromScalar)/
  • ARROW-6590 - [C++] Do not require ARROW_JSON to build ARROW_IPC when unit tests are off
  • ARROW-6591 - [R] Ignore .Rhistory files in source control
  • ARROW-6599 - [Rust][DataFusion] Add aggregate traits and SUM implementation to physical query plan
  • ARROW-6601 - [Java] Improve JDBC adapter performance & add benchmark
  • ARROW-6605 - [C++][Filesystem] Add recursion depth control to fs::Selector
  • ARROW-6606 - [C++] Add PathTree tree structure
  • ARROW-6609 - [C++] Add Dockerfile for minimal C++ build
  • ARROW-6613 - [C++] Remove dependency on boost::filesystem
  • ARROW-6614 - [C++][Dataset] Implement FileSystemDataSourceDiscovery
  • ARROW-6616 - [Website] Release announcement blog post for 0.15
  • ARROW-6621 - [Rust][DataFusion] Run DataFusion examples in CI
  • ARROW-6629 - [Doc][C++] Add filesystem docs
  • ARROW-6630 - [Doc] Document C++ file formats
  • ARROW-6644 - [JS] Amend NullType IPC protocol to append no buffers
  • ARROW-6647 - [C++] Stop using member initializer for shared_ptr
  • ARROW-6648 - [Go] Expose the bitutil package
  • ARROW-6649 - [R] print methods for Array, ChunkedArray, Table, RecordBatch
  • ARROW-6653 - [Developer] Add support for auto JIRA link on pull request
  • ARROW-6655 - [Python] Filesystem bindings for S3
  • ARROW-6664 - [C++] Add CMake option to build without SSE4.2 instructions
  • ARROW-6665 - [Rust][DataFusion] Implement physical expression for numeric literal types
  • ARROW-6667 - [Python] remove cyclical object references in pyarrow.parquet
  • ARROW-6668 - [Rust][DataFusion] Implement CAST expression
  • ARROW-6669 - [Rust][DataFusion] Implement binary expression for physical plan
  • ARROW-6675 - [JS] Add scanReverse function to dataFrame and filteredDataframe
  • ARROW-6683 - [Python] Test for fastparquet <-> pyarrow cross-compatibility
  • ARROW-6725 - [CI] Disable 3rdparty fuzzit nightly builds
  • ARROW-6735 - [C++] Suppress sign-compare warning with g++ 9.2.1
  • ARROW-6752 - [Go] implement Stringer for Null array
  • ARROW-6755 - [Release] Improvements to Windows release verification script
  • ARROW-6771 - [Packaging][Python] Missing pytest dependency from conda and wheel builds
  • PARQUET-1468 - [C++] Clean up ColumnReader/internal::RecordReader code duplication
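
ARROW-6519, ARROW-6541 and ARROW-6545 above all concern the two-part end-of-stream (EOS) marker in the IPC stream format. As a rough illustration, assuming the layout given in the Arrow columnar format documentation (a 4-byte continuation prefix 0xFFFFFFFF followed by a 4-byte zero length), the 8-byte marker could be built and recognized as sketched below. The helper names are hypothetical and are not part of any Arrow library.

```python
import struct

# Continuation prefix that precedes each IPC message length (assumed from the format docs).
CONTINUATION = 0xFFFFFFFF


def eos_marker() -> bytes:
    """Build the 8-byte EOS: continuation prefix followed by a zero int32 length."""
    return struct.pack("<Ii", CONTINUATION, 0)


def is_eos(buf: bytes) -> bool:
    """Return True if buf starts with the two-part EOS marker."""
    if len(buf) < 8:
        return False
    prefix, length = struct.unpack_from("<Ii", buf)
    return prefix == CONTINUATION and length == 0
```

A reader that only understands the older 4-byte EOS would not recognize this marker, which may be why the R entry ARROW-6539 provides a way to write the old format.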

Bug Fixes

  • ARROW-1184 - [Java] Dictionary.equals is not working correctly
  • ARROW-2041 - [Python] pyarrow.serialize has high overhead for list of NumPy arrays
  • ARROW-2248 - [Python] Nightly or on-demand HDFS test builds
  • ARROW-2317 - [Python] Fix C linkage warning with Cython
  • ARROW-2490 - [C++] Normalize input stream concurrency
  • ARROW-3176 - [Python] Overflow in Date32 column conversion to pandas
  • ARROW-3203 - [C++] Build error on Debian Buster
  • ARROW-3651 - [Python] Handle 'datetime' logical type when reconstructing pandas columns from custom metadata
  • ARROW-3652 - [Python][Parquet] Add unit test exhibiting that pandas.CategoricalIndex survives roundtrip to Parquet format
  • ARROW-3762 - [Python] Add large_memory unit test exercising BYTE_ARRAY overflow edge cases from ARROW-3762
  • ARROW-3933 - [C++][Parquet] Handle non-nullable struct children when reading Parquet file, better error messages
  • ARROW-4187 - [C++] Enable file-benchmark on Windows
  • ARROW-4746 - [C++/Python] PyDateTime_Date wrongly cast to PyDateTime_DateTime
  • ARROW-4836 - [C++] Support Tell() on compressed streams
  • ARROW-4848 - [C++] Static libparquet not compiled with -DARROW_STATIC on Windows
  • ARROW-4880 - [Python] Rehabilitate ASV benchmark build scripts
  • ARROW-4883 - [Python] read_csv() returns garbage if given file object in text mode
  • ARROW-5028 - [Python] Avoid malformed ListArray types caused by reaching StringBuilder capacity when converting from Python sequence
  • ARROW-5072 - [Python] write_table fails silently on S3 errors
  • ARROW-5085 - [C++][Parquet][Python] Do not allow reading to dictionary type unless we have implemented support for it
  • ARROW-5086 - [Python][Parquet] Opt in to file memory-mapping when reading Parquet files rather than opting out
  • ARROW-5089 - [C++/Python] Writing dictionary encoded columns to parquet is extremely slow when using chunk size
  • ARROW-5103 - [Python] Segfault when using chunked_array.to_pandas on arrays of different types (edge case)
  • ARROW-5125 - [Python] Round-trip extreme dates on windows
  • ARROW-5161 - [Python] Cannot convert struct type from Pandas object column
  • ARROW-5220 - [Python] Follow-up to improve error messages and docs for from_pandas schema argument
  • ARROW-5220 - [Python] Specified schema in from_pandas also includes the index
  • ARROW-5292 - [C++] Work around symbol visibility issues so building static libraries is not necessary when building unit tests on WIN32 platform
  • ARROW-5300 - [C++] Remove the ARROW_NO_DEFAULT_MEMORY_POOL macro
  • ARROW-5374 - [Python][C++] Improve ipc.read_record_batch docstring, fix IPC message type error messages generated in C++
  • ARROW-5414 - [C++] default to release build on windows
  • ARROW-5450 - [Python] Always return datetime.datetime in TimestampValue.as_py for units other than nanoseconds
  • ARROW-5471 - [C++][Gandiva] Array offset is ignored in Gandiva projector
  • ARROW-5522 - [Packaging][Documentation] Comments out of date in python/manylinux1/build_arrow.sh
  • ARROW-5525 - [C++] Add Continuous Fuzzing Integration setup with Fuzzit
  • ARROW-5560 - [C++][Plasma] Cannot create Plasma object after OutOfMemory error
  • ARROW-5562 - [C++][Parquet] Write negative zero or small epsilons as positive zero when computing Parquet statistics
  • ARROW-5630 - [C++][Parquet] Fix RecordReader accounting for repeated fields with non-nullable leaf
  • ARROW-5638 - [C++][CMake] Fixes for xcode project builds
  • ARROW-5651 - [Python] Fix Incorrect conversion from strided Numpy array
  • ARROW-5682 - [Python] Raise error when trying to convert non-string dtype to string
  • ARROW-5731 - [CI] Switch turbodbc branch for integration testing
  • ARROW-5753 - [Rust] Fix test failure in CI code coverage
  • ARROW-5772 - [GLib][Plasma][CUDA] Fix a bug that data can't be got
  • ARROW-5775 - [C++] Fix thread-unsafe cached data
  • ARROW-5776 - [Gandiva][Crossbow] Use commit id instead of fetch head.
  • ARROW-5790 - [Python] Raise error when trying to convert 0-dim array in pa.array
  • ARROW-5817 - [Python] Use pytest mark for flight tests
  • ARROW-5823 - [Rust] CI scripts miss --all-targets cargo argument
  • ARROW-5824 - [Gandiva][C++] Fix decimal null literals.
  • ARROW-5836 - [Java][FlightRPC] Skip Flight domain socket test when path too long
  • ARROW-5838 - [C++] Delegate OPENSSL_ROOT_DIR to bundled gRPC
  • ARROW-5848 - [C++] SO versioning schema after release 1.0.0
  • ARROW-5849 - [C++] Fix compiler warnings on mingw32
  • ARROW-5850 - [CI][R] R appveyor job is broken after release
  • ARROW-5851 - [C++] Fix compilation of reference benchmarks
  • ARROW-5856 - [Python][Packaging] Fix use of C++ / Cython API from wheels
  • ARROW-5860 - [Java][Vector] Fix decimal utils to handle negative values.
  • ARROW-5863 - [Python] Use atexit module for extension type finalization to avoid segfault
  • ARROW-5868 - [Python] Correctly remove liblz4 shared libraries from manylinux2010 image so lz4 is statically linked
  • ARROW-5870 - [C++][Docs] Refine source build instructions, do not tell people to install flex/bison if they don't need them
  • ARROW-5873 - [Python] Guard for passed None in Schema.equals
  • ARROW-5874 - [Python] Fix macOS wheels to depend on system or Homebrew OpenSSL
  • ARROW-5878 - [C++][Parquet] Restore pre-0.14.0 Parquet forward compatibility by adding option to unconditionally set TIMESTAMP_MICROS/TIMESTAMP_MILLIS ConvertedType
  • ARROW-5884 - [Java] Fix the get method of StructVector
  • ARROW-5886 - [Python][Packaging] Manylinux1/2010 compliance issue with libz
  • ARROW-5887 - [C#] ArrowStreamWriter writes FieldNodes in wrong order
  • ARROW-5889 - [C++][Parquet] Add property to indicate origin from converted type to TimestampLogicalType
  • ARROW-5894 - [Gandiva][C++] Added a linker script for libgandiva.so to restrict libstdc++ symbols.
  • ARROW-5899 - [Python][Packaging] Build and link uriparser statically in Windows wheel builds
  • ARROW-5910 - [Python] Support non-seekable streams in ipc.read_tensor, ipc.read_message, add Message.serialize_to method
  • ARROW-5921 - [C++] Fix multiple nullptr related crashes in IPC
  • ARROW-5923 - [C++][Parquet] Reword comment about UBSan and Int96 in writer.cc
  • ARROW-5925 - [Gandiva][C++] fix rounding in decimal to int cast
  • ARROW-5930 - [Python] Make Flight server init phase explicit
  • ARROW-5930 - [FlightRPC][Python] Disable Flight test causing segfault in Travis
  • ARROW-5935 - [C++] ArrayBuilder::type() should be kept accurate
  • ARROW-5946 - [Rust][DataFusion] Fix bug in projection push down logic
  • ARROW-5952 - [Python] fix conversion of chunked dictionary array with 0 chunks
  • ARROW-5959 - [CI] report branch+commit to fuzzit
  • ARROW-5960 - [C++] Fix Boost dependencies link order
  • ARROW-5963 - [R] R Appveyor job does not test changes in the C++ library
  • ARROW-5964 - [C++][Gandiva] Remove overflow check after rounding in BasicDecimal128::FromDouble
  • ARROW-5965 - [Python] Regression: segfault when reading hive table with v0.14
  • ARROW-5966 - [Python] Also use ChunkedStringBuilder when converting NumPy string types to Arrow StringType
  • ARROW-5968 - [Java] Remove duplicate Preconditions check in JDBC adapter
  • ARROW-5969 - [R] Fix R lint Failures
  • ARROW-5973 - [Java] Variable width vectors' get methods should return null when the underlying data is null
  • ARROW-5978 - [FlightRPC][Java] Properly release buffers in Flight integration client
  • ARROW-5989 - [C++] Accommodate openjdk-8 path search prefix
  • ARROW-5990 - [Python] add bounds check to RowGroupMetaData.column
  • ARROW-5992 - [C++][Python] Support String->Binary in Array::View. Add Python bindings for Array::View
  • ARROW-5993 - [Python] Reading a dictionary column from Parquet results in disproportionate memory usage
  • ARROW-5996 - [Java] Avoid potential resource leak in flight service
  • ARROW-5999 - [C++] decouple Iterator from ARROW_DATASETS
  • ARROW-6002 - [C++][Gandiva] test casting int64 to decimal
  • ARROW-6004 - [C++] Turn non-ignored empty CSV lines into null/empty values
  • ARROW-6005 - [C++] extend GetRecordBatchReader test to cover reading a single row group
  • ARROW-6006 - [C++] Do not fail to read empty IPC stream with schema having dictionary types
  • ARROW-6012 - [C++] Fall back on known Apache mirror for Thrift downloads
  • ARROW-6015 - [Python] Add note to python/README.md about installing Visual C++ Redistributable on Windows when using pip
  • ARROW-6016 - [Python] Fix get_library_dirs() when Arrow installed as a system package
  • ARROW-6029 - [R] Improve R docs on how to fix library version mismatch
  • ARROW-6032 - [C++] Ensure 64-bit pointer alignment in CountSetBits()
  • ARROW-6038 - [C++] Faster type equality
  • ARROW-6040 - [Java] Dictionary entries are required in IPC streams even when empty
  • ARROW-6046 - [C++] Do not write excess varbinary offsets in IPC messages from sliced BinaryArray
  • ARROW-6047 - [Rust] Rust nightly 1.38.0 builds failing
  • ARROW-6050 - [Java] Update out-of-date java/flight/README.md
  • ARROW-6054 - [Python] Fix the type erasure bug when serializing structured type ndarray.
  • ARROW-6058 - [C++][Parquet] Validate whole ColumnChunk raw data reads so that underlying filesystem issues are caught earlier
  • ARROW-6059 - [Python] Regression memory issue when calling pandas.read_parquet
  • ARROW-6060 - [C++] ChunkedBinaryBuilder should only grow when necessary, address runaway memory use in Parquet binary column read
  • ARROW-6061 - [C++] Add ARROW_JSON feature flag for configuring arrow builds without RapidJSON
  • ARROW-6066 - [Website] Fix blog post author header
  • ARROW-6067 - [Python] Fix failing large memory Python tests
  • ARROW-6068 - [C++] Allow passing Field instances to StructArray::Make
  • ARROW-6073 - [C++] Reset Decimal128Builder in Finish().
  • ARROW-6082 - [Python] check type of the index_type passed to pa.dictionary()
  • ARROW-6092 - [Python] Fix C++ arrow-python-test on Python 2.7
  • ARROW-6095 - [C++] Fix unit test build when only building static libraries, add cpp-static-only to tests.yml
  • ARROW-6108 - [C++] Workaround Windows CRT crash on invalid locale
  • ARROW-6116 - [C++][Gandiva] Fix bug in TimedTestFilterAdd2
  • ARROW-6117 - [Java] Fix the set method of FixedSizeBinaryVector
  • ARROW-6119 - [Python] PyArrow wheel import fails on Windows Python 3.7
  • ARROW-6120 - [C++] Forbid use of <iostream> in public header files
  • ARROW-6126 - [C++] Return error when an IPC stream terminates in the middle of receiving dictionaries
  • ARROW-6132 - [Python] validate result in ListArray.from_arrays
  • ARROW-6135 - [C++] Make KeyValueMetadata::Equals() order-insensitive
  • ARROW-6136 - [FlightRPC][Java] don't double-close response stream
  • ARROW-6145 - [Java] UnionVector created by MinorType#getNewVector could not keep field type info properly
  • ARROW-6148 - [Packaging] Improve aarch64 support
  • ARROW-6152 - [C++][Parquet] Add parquet::ColumnWriter::WriteArrow method, refactor
  • ARROW-6153 - [R] Address parquet deprecation warning
  • ARROW-6158 - [C++/Python] Validate child array types with type fields of StructArray
  • ARROW-6159 - [C++] Properly indent first line of PrettyPrint with Schema
  • ARROW-6160 - [Java] AbstractStructVector#getPrimitiveVectors fails to work with complex child vectors
  • ARROW-6166 - [Go] Fix index out of bounds panic when slicing a slice
  • ARROW-6167 - [R] macOS binary R packages on CRAN don't have arrow_available
  • ARROW-6168 - [C++] IWYU docker-compose job is broken
  • ARROW-6170 - [R] Faster docker-compose build
  • ARROW-6171 - [R][CI] Fix R library search path
  • ARROW-6174 - [C++] Validate chunks in ChunkedArray::Validate. Fix validation of sliced ListArray, values null checks
  • ARROW-6175 - [Java] Fix MapVector#getMinorType and extend AbstractContainerVector addOrGet complex vector API
  • ARROW-6178 - [Developer] Keep prompting for authors in merge script for multi-author PRs if given bad input
  • ARROW-6182 - [R] Add note to README about r-arrow conda installation
  • ARROW-6186 - [Packaging][deb] Add missing headers to libplasma-dev for Ubuntu 16.04
  • ARROW-6190 - [C++] Define and declare functions regardless of NDEBUG
  • ARROW-6193 - [GLib] Add missing require in test
  • ARROW-6200 - [Java] Method getBufferSizeFor in BaseRepeatedValueVector/ListVector not correct
  • ARROW-6202 - [Java] Add unit test for large resultsets
  • ARROW-6205 - [C++] ARROW_DEPRECATED warning when including io/interfaces.h
  • ARROW-6208 - [Java] Correct byte order before comparing in ByteFunctionHelpers
  • ARROW-6210 - [Java] remove equals API from ValueVector
  • ARROW-6211 - [Java] Remove dependency on RangeEqualsVisitor from ValueVector interface
  • ARROW-6214 - [R] Add R sanitizer docker image
  • ARROW-6215 - [Java] Fix case when ZeroVector is compared against other vector types
  • ARROW-6218 - [Java] Add UINT type test in integration to avoid potential overflow
  • ARROW-6223 - [C++] Configuration error with Anaconda Python 3.7.4
  • ARROW-6224 - [Python] fix deprecated usage of .data (previously Column.data)
  • ARROW-6227 - [Python] Apply from_pandas option in pyarrow.array consistently across types
  • ARROW-6234 - [Java] ListVector hashCode() is not correct
  • ARROW-6241 - [Java] Failures on master
  • ARROW-6255 - [Rust][Parquet] Cannot use any published parquet crate due to parquet-format breaking change
  • ARROW-6259 - [C++] Add -Wno-extra-semi-stmt when compiling with clang 8 to work around Flatbuffers bug, suppress other new LLVM 8 warnings
  • ARROW-6263 - [Python] Use RecordBatch::Validate in RecordBatch.from_arrays. Normalize API vs. Table.from_arrays. Add record_batch factory function
  • ARROW-6266 - [Java] Resolve the ambiguous method overload in RangeEqualsVisitor
  • ARROW-6268 - [Java] Empty buffers to have a valid address.
  • ARROW-6269 - [C++] check decimal precision in IPC code
  • ARROW-6270 - [C++] check buffer_index bounds in IpcComponentSource.GetBuffer
  • ARROW-6290 - [Rust][DataFusion] Fix bug in type coercion rule
  • ARROW-6291 - [C++] Do not override ARROW_PARQUET if other PARQUET options are enabled
  • ARROW-6293 - [Rust] datafusion 0.15.0-SNAPSHOT error
  • ARROW-6301 - [C++][Python] Prevent ExtensionType-related race condition in Python process teardown by exposing shared_ptr to global "ExtensionTypeRegistry"
  • ARROW-6302 - [C++][Parquet][Python] Restore ordered type property when reading dictionary type with serialized Arrow schema
  • ARROW-6309 - [C++][Parquet] Stop needless static linking
  • ARROW-6323 - [R] Expand file paths when passing to readers
  • ARROW-6325 - [Python] fix conversion of strided boolean arrays
  • ARROW-6330 - [C++] Include missing API headers
  • ARROW-6332 - [Java][C++][Gandiva] Misc fixes for varwidth vector allocation.
  • ARROW-6339 - [Python] Raise ValueError when accessing unset statistics
  • ARROW-6343 - [Java][Vector] Fix allocation helper.
  • ARROW-6344 - [C++][Gandiva] Handle multibyte characters in substring function
  • ARROW-6345 - [C++][Python] "ordered" flag seemingly not taken into account when comparing DictionaryType values for equality
  • ARROW-6348 - [R] arrow::read_csv_arrow namespace error when package not loaded
  • ARROW-6354 - [C++] Fix failing build when ARROW_PARQUET=OFF
  • ARROW-6363 - [R] segfault in Table__from_dots with unexpected schema
  • ARROW-6364 - [R] Handling unexpected input to time64() et al.
  • ARROW-6369 - [C++] Handle Array.to_pandas case for type=list<bool>
  • ARROW-6371 - [Doc] Row to columnar conversion example mentions arrow::Column in comments
  • ARROW-6372 - [Rust][DataFusion] Casting from unsigned to signed integers not supported
  • ARROW-6376 - [Developer] Use target ref of PR when merging instead of hard-coding "master"
  • ARROW-6387 - [Archery] Errors with make
  • ARROW-6392 - [FlightRPC][Python] check type of list_flights result
  • ARROW-6395 - [Python] Bug when using bool arrays with stride greater than 1
  • ARROW-6406 - [C++] Fix jemalloc URL for offline build in thirdparty/versions.txt
  • ARROW-6411 - [Python][Parquet] Improve performance of DictEncoder::PutIndices
  • ARROW-6412 - [C++] Improve TCP port allocation in tests
  • ARROW-6418 - [C++][Plasma] Remove cmake project directive for plasma
  • ARROW-6423 - [C++] Fix crash when trying to instantiate Snappy CompressedOutputStream
  • ARROW-6424 - [C++] Fix IPC fuzzing test name
  • ARROW-6425 - [C++] ValidateArray fail for slice of list array
  • ARROW-6428 - [CI][Crossbow] Nightly turbodbc job fails
  • ARROW-6430 - [CI][Crossbow] Nightly R docker job fails
  • ARROW-6431 - [Python] Test suite fails without pandas installed
  • ARROW-6432 - [CI][Crossbow] Remove alpine nightly crossbow jobs
  • ARROW-6433 - [Java][CI] Fix java docker image
  • ARROW-6434 - [CI][Crossbow] Nightly HDFS integration job fails
  • ARROW-6435 - [Python] Use pandas null coding consistently on List and Struct types
  • ARROW-6440 - [Packaging][deb] Follow plasma-store-server name change
  • ARROW-6441 - [Packaging][RPM] Follow plasma-store-server name change
  • ARROW-6442 - [CI][Crossbow] Nightly gandiva jar osx build fails
  • ARROW-6443 - [CI][Crossbow] Nightly conda osx builds fail
  • ARROW-6444 - [CI][Crossbow] Nightly conda Windows builds fail (time out)
  • ARROW-6446 - [OSX][Python][Wheel] Turn off ORC feature in the wheel building scripts
  • ARROW-6449 - [R] io "tell()" methods are inconsistently named and untested
  • ARROW-6457 - [C++] Always set CMAKE_BUILD_TYPE if it is not defined
  • ARROW-6461 - [Java] Prevent EchoServer from closing the client socket after writing
  • ARROW-6472 - [Java] ValueVector#accept may throw a cast exception
  • ARROW-6476 - [Java][CI] Fix java docker build script
  • ARROW-6478 - [C++] Revert to jemalloc stable-4 until we understand 5.2.x performance issues
  • ARROW-6481 - [C++] Avoid copying large ConvertOptions
  • ARROW-6488 - [Python] fix equality with pyarrow.NULL to return NULL
  • ARROW-6492 - [Python] Handle pandas_metadata created by fastparquet with missing field_name
  • ARROW-6502 - [GLib][CI] Pin gobject-introspection gem to 3.3.7
  • ARROW-6506 - [C++] Fix validation of ExtensionArray with struct storage type
  • ARROW-6509 - [C++][Gandiva] Re-enable Gandiva JNI tests and fix Travis CI failure
  • ARROW-6509 - [Java][CI] Upgrade maven-surefire-plugin to version 3.0.0-M3, disable Gandiva JNI unit tests temporarily
  • ARROW-6520 - [Python] More consistent handling of specified schema when creating Table
  • ARROW-6522 - [Python] Fix failing pandas tests on older pandas / older python
  • ARROW-6530 - [CI][Crossbow][R] Nightly R job doesn't install all dependencies
  • ARROW-6550 - [C++] Filter expressions PR failing manylinux package builds
  • ARROW-6551 - [Python] Dask Parquet integration test failure
  • ARROW-6552 - [C++] boost::optional in STL test fails compiling in gcc 4.8.2
  • ARROW-6560 - [Python] Fix nopandas integration tests
  • ARROW-6561 - [Python] Fix python tests to pass on pandas master
  • ARROW-6562 - [GLib] Fix returning wrong sliced data of GArrowBuffer
  • ARROW-6564 - [Python] Do not require pandas for invoking Array.array
  • ARROW-6565 - [Rust][DataFusion] Fix intermittent test failure
  • ARROW-6568 - [C++] ChunkedArray constructor needs type when chunks is empty
  • ARROW-6572 - [C++] Fix Parquet decoding returning uninitialized data
  • ARROW-6573 - [Python] Add test case to probe additional behavior in schema-data mismatch in Table.from_pydict
  • ARROW-6576 - [R] Fix sparklyr integration tests
  • ARROW-6586 - [Python][Packaging] Windows wheel builds failing with "DLL load failure"
  • ARROW-6597 - [Python] Sanitize Python datetime handling
  • ARROW-6618 - [Python] Fix read_message() segfault on end of stream
  • ARROW-6620 - [Python][CI] pandas-master build failing due to removal of "to_sparse" method
  • ARROW-6622 - [R] Normalize paths for filesystem API on Windows
  • ARROW-6623 - [CI][Python] Dask docker integration test broken perhaps by statistics-related change
  • ARROW-6639 - [Packaging][RPM] Add support for CentOS 7 on aarch64
  • ARROW-6640 - [C++] Do not reset buffer_pos_ in BufferedInputStream/OutputStream when enlarging buffer
  • ARROW-6641 - [C++] Remove Deprecated WriteableFile warning
  • ARROW-6642 - [Python] Link parent objects in Parquet's metadata and statistics objects
  • ARROW-6651 - Fix conda R job
  • ARROW-6652 - [Python] Fix ChunkedArray.to_pandas to retain timezone
  • ARROW-6652 - [Python] Fix Array.to_pandas to retain timezone
  • ARROW-6660 - [Rust][DataFusion] Minor docs update for 0.15.0 release
  • ARROW-6670 - [CI][R] Fix fixes for R nightly jobs
  • ARROW-6674 - [Python] Fix or ignore the test warnings
  • ARROW-6677 - [FlightRPC][C++] Document Flight in C++
  • ARROW-6678 - [C++][Parquet] Binary data stored in Parquet metadata must be base64-encoded to be UTF-8 compliant
  • ARROW-6679 - [RELEASE] Add license info for the autobrew scripts
  • ARROW-6682 - [C#] Ensure file footer block lengths are always 8 byte aligned.
  • ARROW-6687 - [Rust][DataFusion] Add regression tests for np.nan parquet file
  • ARROW-6687 - [Rust][DataFusion] Bug fix in DataFusion Parquet reader
  • ARROW-6701 - [C++][R] Lint failing on R cpp code
  • ARROW-6703 - [Packaging][Linux] Restore ARROW_VERSION environment variable
  • ARROW-6705 - [Rust][DataFusion] README has invalid github URL
  • ARROW-6709 - [Java] Jdbc adapter currentIndex should increment when va…
  • ARROW-6714 - [R] Fix untested RecordBatchWriter case
  • ARROW-6716 - [Rust] Bump nightly to nightly-2019-09-25 to fix CI
  • ARROW-6748 - [Ruby] gem compilation error
  • ARROW-6751 - [CI] ccache doesn't cache on Travis-CI
  • ARROW-6760 - [C++] JSON: improve error message when column changed type
  • ARROW-6773 - [C++] Filter kernel returns invalid data when filtering with an Array slice
  • ARROW-6796 - Certain moderately-sized (~100MB) default-Snappy-compressed Parquet files take enormous memory and long time to load by pyarrow.parquet.read_table
  • ARROW-7112 - Wrong contents when initializing a pyarrow.Table from boolean DataFrame
  • PARQUET-1623 - [C++] Fix invalid memory access encountered when reading some parquet files
  • PARQUET-1631 - [C++] ParquetInputWrapper::GetSize returns Tell
  • PARQUET-1640 - [C++] Fix crash in parquet-encoding-benchmark
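
ARROW-6678 above notes that binary data stored in Parquet metadata must be base64-encoded to be UTF-8 compliant. A minimal sketch of why, using only the Python standard library: arbitrary bytes are often not valid UTF-8, whereas their base64 form is plain ASCII and round-trips losslessly. The function names and sample payload are hypothetical and do not reflect Arrow's internal code.

```python
import base64


def encode_metadata_value(raw: bytes) -> str:
    """Return a UTF-8-safe (ASCII) string for an arbitrary binary payload."""
    return base64.b64encode(raw).decode("ascii")


def decode_metadata_value(text: str) -> bytes:
    """Recover the original bytes from the base64 text."""
    return base64.b64decode(text)


def is_valid_utf8(raw: bytes) -> bool:
    """Check whether raw can be decoded as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical binary payload; the 0xff byte can never appear in valid UTF-8.
payload = b"\xff\xfe\x00\x10binary"
encoded = encode_metadata_value(payload)
```

Here `payload` itself fails the UTF-8 check, while `encoded` is an ordinary ASCII string that decodes back to the same bytes.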
ptaylor published 0.14.1

Apache Arrow 0.14.1 (2019-07-22)

Bug Fixes

  • ARROW-5775 - [C++] Fix thread-unsafe cached data
  • ARROW-5790 - [Python] Raise error when trying to convert 0-dim array in pa.array
  • ARROW-5791 - [C++] Fix infinite loop with more than 32768 columns.
  • ARROW-5816 - [Release] Do not curl in background in verify-release-candidate.sh
  • ARROW-5836 - [Java][FlightRPC] Skip Flight domain socket test when path too long
  • ARROW-5838 - [C++] Delegate OPENSSL_ROOT_DIR to bundled gRPC
  • ARROW-5849 - [C++] Fix compiler warnings on mingw32
  • ARROW-5850 - [CI][R] R appveyor job is broken after release
  • ARROW-5851 - [C++] Fix compilation of reference benchmarks
  • ARROW-5856 - [Python][Packaging] Fix use of C++ / Cython API from wheels
  • ARROW-5863 - [Python] Use atexit module for extension type finalization to avoid segfault
  • ARROW-5868 - [Python] Correctly remove liblz4 shared libraries from manylinux2010 image so lz4 is statically linked
  • ARROW-5873 - [Python] Guard for passed None in Schema.equals
  • ARROW-5874 - [Python] Fix macOS wheels to depend on system or Homebrew OpenSSL
  • ARROW-5878 - [C++][Parquet] Restore pre-0.14.0 Parquet forward compatibility by adding option to unconditionally set TIMESTAMP_MICROS/TIMESTAMP_MILLIS ConvertedType
  • ARROW-5886 - [Python][Packaging] Manylinux1/2010 compliance issue with libz
  • ARROW-5887 - [C#] ArrowStreamWriter writes FieldNodes in wrong order
  • ARROW-5889 - [C++][Parquet] Add property to indicate origin from converted type to TimestampLogicalType
  • ARROW-5899 - [Python][Packaging] Build and link uriparser statically in Windows wheel builds
  • ARROW-5921 - [C++] Fix multiple nullptr related crashes in IPC
  • PARQUET-1623 - [C++] Fix invalid memory access encountered when reading some parquet files

New Features and Improvements

  • ARROW-5101 - [Packaging] Avoid bundling static libraries in Windows conda packages
  • ARROW-5380 - [C++] Fix memory alignment UBSan errors.
  • ARROW-5564 - [C++] Use uriparser from conda-forge
  • ARROW-5609 - [C++] Set CMP0068 CMake policy to avoid macOS warnings
  • ARROW-5784 - [Release][GLib] Replace c_glib/ after running c_glib/autogen.sh in dev/release/02-source.sh
  • ARROW-5785 - [Rust] Make the datafusion cli dependencies optional
  • ARROW-5787 - [Release][Rust] Use local modules to verify RC
  • ARROW-5793 - [Release] Avoid duplicated known host SSH error in dev/release/03-binary.sh
  • ARROW-5794 - [Release] Skip uploading already uploaded binaries
  • ARROW-5795 - [Release] Add missing waits on uploading binaries
  • ARROW-5796 - [Release][APT] Update expected package list
  • ARROW-5797 - [Release][APT] Update supported distributions
  • ARROW-5820 - [Release] Remove undefined variable check from verify script
  • ARROW-5827 - [C++] Require c-ares CMake config
  • ARROW-5828 - [C++] Add required Protocol Buffers versions check
  • ARROW-5866 - [C++] Remove duplicate library in cpp/Brewfile
  • ARROW-5877 - [FlightRPC] Fix Python<->Java auth issues
  • ARROW-5904 - [Java][Plasma] Fix compilation of Plasma Java client
  • ARROW-5908 - [C#] ArrowStreamWriter doesn't align buffers to 8 bytes
  • ARROW-5934 - [Python] Bundle arrow's LICENSE with the wheels
  • ARROW-5937 - [Release] Stop parallel binary upload
  • ARROW-5938 - [Release] Create branch for adding release note automatically
  • ARROW-5939 - [Release] Add support for generating vote email template separately
  • ARROW-5940 - [Release] Add support for re-uploading sign/checksum for binary artifacts
  • ARROW-5941 - [Release] Avoid re-uploading already uploaded binary artifacts
  • ARROW-5958 - [Python] Link zlib statically in the wheels
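
ARROW-5908 above concerns aligning buffers to 8 bytes. A minimal sketch of the usual padding arithmetic follows; it is not taken from the C# implementation, and the function names are hypothetical.

```python
ALIGNMENT = 8


def padded_length(nbytes: int, alignment: int = ALIGNMENT) -> int:
    """Round nbytes up to the next multiple of alignment (must be a power of two)."""
    return (nbytes + alignment - 1) & ~(alignment - 1)


def pad_buffer(data: bytes, alignment: int = ALIGNMENT) -> bytes:
    """Append zero bytes so the buffer length is a multiple of alignment."""
    return data + b"\x00" * (padded_length(len(data), alignment) - len(data))
```

The bit-mask form only works for power-of-two alignments; a general alignment would need integer division instead.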
kou published 0.14.0

Apache Arrow 0.14.0 (2019-07-04)

New Features and Improvements

  • ARROW-258 - [Format] clarify definition of Buffer in context of RPC, IPC, File
  • ARROW-653 - [Python / C++] Add debugging function to print an array's buffer contents in hexadecimal
  • ARROW-767 - [C++] Filesystem abstraction
  • ARROW-835 - [Format][C++][Java] Create a new Duration type
  • ARROW-840 - [Python] Expose extension types
  • ARROW-973 - [Website] Add FAQ page
  • ARROW-1012 - [C++] Configurable batch size for parquet RecordBatchReader
  • ARROW-1207 - [C++] Implement MapArray, MapBuilder, MapType classes, and IPC support
  • ARROW-1261 - [Java] Add MapVector with reader and writer
  • ARROW-1278 - [Integration] Adding integration tests for fixed_size_list
  • ARROW-1279 - [Integration] Enable MapType integration tests
  • ARROW-1280 - [C++] add fixed size list type
  • ARROW-1349 - [Packaging] Provide APT and Yum repositories
  • ARROW-1496 - [JS] Upload coverage data to codecov.io
  • ARROW-1558 - [C++] Implement boolean filter (selection) kernel, rename comparison kernel-related functions
  • ARROW-1587 - [Format] Add metadata for user-defined logical types
  • ARROW-1774 - [C++] Add Array::View()
  • ARROW-1833 - [Java] Add accessor methods for data buffers that skip null checking
  • ARROW-1957 - [Python] Write nanosecond timestamps using new NANO LogicalType Parquet unit
  • ARROW-1983 - [C++][Parquet] Add AppendRowGroups and WriteMetaDataFile methods
  • ARROW-2057 - [Python] Expose option to configure data page size threshold in parquet.write_table
  • ARROW-2102 - [C++] Implement Take kernel
  • ARROW-2103 - [C++] Implement take kernel functions - string/binary value type
  • ARROW-2104 - [C++] take kernel functions for nested types
  • ARROW-2105 - [C++] Implement take kernel functions - properly handle special indices
  • ARROW-2186 - [C++] Clean up architecture specific compiler flags
  • ARROW-2217 - [C++] Add option to use dynamic linking for compression library dependencies
  • ARROW-2298 - [Python] Add unit tests to assert that float64 with NaN values can be safely coerced to integer types when converting from pandas
  • ARROW-2412 - [Integration] Add nested dictionary test case, skipped for now
  • ARROW-2467 - [Rust] Add generated IPC code
  • ARROW-2517 - [Java] Add list<decimal> writer
  • ARROW-2618 - [Rust] Bitmap constructor should accept a flag for default state (0 or 1)
  • ARROW-2667 - [C++/Python] Add pandas-like take method to Array
  • ARROW-2707 - [C++] Add Table::Slice
  • ARROW-2709 - [Python] write_to_dataset poor performance when splitting
  • ARROW-2730 - [C++] Set up CMAKE_C_FLAGS more thoughtfully instead of using CMAKE_CXX_FLAGS
  • ARROW-2796 - [C++] Simplify version script used for linking
  • ARROW-2818 - [Python] Better error message when trying to convert sparse pandas data to arrow Table
  • ARROW-2835 - [C++] Make file position undefined after ReadAt()
  • ARROW-2969 - [R] Convert between StructArray and "nested" data.frame column containing data frame in each cell
  • ARROW-2981 - [C++] improve clang-tidy usability
  • ARROW-2984 - [JS] Refactor release verification script to share code with main source release verification script
  • ARROW-3040 - [Go] add support for comparing Arrays
  • ARROW-3041 - [Go] add support for TimeArray
  • ARROW-3052 - [C++] Detect Apache ORC C++ libraries in system/conda toolchain, add to conda requirements
  • ARROW-3087 - [C++] Implement Compare filter kernel
  • ARROW-3144 - [C++/Python] Move "dictionary" member from DictionaryType to ArrayData to allow for variable dictionaries
  • ARROW-3150 - [Python] Enable Flight in Python wheels for Linux and Windows
  • ARROW-3166 - [C++] Consolidate IO interfaces used in arrow/io and parquet-cpp
  • ARROW-3191 - [Java] Make ArrowBuf work with arbitrary underlying memory
  • ARROW-3200 - [C++] Support dictionaries in Flight streams
  • ARROW-3290 - [C++] Toolchain support for secure gRPC
  • ARROW-3294 - [C++][Flight] Support Flight on Windows
  • ARROW-3314 - [R] Set -rpath using pkg-config when building
  • ARROW-3330 - [C++] Spawn multiple Flight performance servers in flight-benchmark to test parallel get performance
  • ARROW-3419 - [C++] Run include-what-you-use checks as nightly build
  • ARROW-3459 - [C++][Gandiva] Add support for variable length output vectors
  • ARROW-3475 - [C++] Allow builders to finish to the corresponding array type
  • ARROW-3570 - [Packaging] Don't bundle test data files with python wheels
  • ARROW-3572 - [Crossbow] Raise more helpful exception if Crossbow queue has an SSH origin URL
  • ARROW-3671 - [Go] implement MonthInterval and DayTimeInterval
  • ARROW-3676 - [Go] implement Decimal128 array
  • ARROW-3679 - [Go] implement read/write IPC for Decimal128
  • ARROW-3680 - [Go] implement Float16 array
  • ARROW-3686 - [Python] support masked arrays in pa.array
  • ARROW-3702 - [R] POSIXct mapped to DateType not TimestampType?
  • ARROW-3714 - [CI] Run RAT checks in pre-commit hooks
  • ARROW-3729 - [C++][Parquet] Use logical annotations in Arrow Parquet reader/writer
  • ARROW-3732 - [R] Add functions to write RecordBatch or Schema to Message value, then read back
  • ARROW-3758 - [R] Build R library and dependencies on Windows in Appveyor CI
  • ARROW-3759 - [R][CI] Build and test (no libarrow) on Windows in Appveyor
  • ARROW-3767 - [C++] Add cast from null to any other type
  • ARROW-3780 - [R] Failed to fetch data: invalid data when collecting int16
  • ARROW-3791 - [C++ / Python] Add boolean type inference to the CSV parser
  • ARROW-3794 - [R] Consider mapping INT8 to integer() not raw()
  • ARROW-3804 - [R] Support older versions of R runtime
  • ARROW-3810 - [R] type= argument for Array and ChunkedArray
  • ARROW-3811 - [R] Support inferring data.frame column as StructArray in array constructors
  • ARROW-3814 - [R] RecordBatch$from_arrays()
  • ARROW-3815 - [R] Refine record batch factory
  • ARROW-3848 - [R] allow nbytes to be missing in RandomAccessFile$Read()
  • ARROW-3897 - [MATLAB] Add MATLAB support for writing numeric datatypes to a Feather file
  • ARROW-3904 - [C++/Python] Validate scale and precision of decimal128 type
  • ARROW-4013 - [Docs][C++] Add how to build on MSYS2
  • ARROW-4020 - [Release] Add a post release script to remove RC
  • ARROW-4047 - [Python] Document use of int96 timestamps and options in Parquet docs
  • ARROW-4086 - [Java] Add apis to debug memory alloc failures
  • ARROW-4121 - [C++] Refactor memory allocation from InvertKernel
  • ARROW-4159 - [C++] Build with -Wdocumentation when using clang and BUILD_WARNING_LEVEL=CHECKIN
  • ARROW-4194 - [Format][Docs] Remove duplicated / out-of-date logical type information from documentation
  • ARROW-4302 - [C++] Add OpenSSL to C++ build toolchain (#4384)
  • ARROW-4337 - [C#] Implemented Fluent API for building arrays and record batches
  • ARROW-4343 - [C++] Add docker-compose test for gcc 4.8 / Ubuntu 14.04 (Trusty), expand Xenial/16.04 Dockerfile to test Flight
  • ARROW-4356 - [CI] Add integration (docker) test for turbodbc
  • ARROW-4369 - [Packaging] Release verification script should test linux packages via docker
  • ARROW-4452 - [Python] Serialize sparse torch tensors
  • ARROW-4453 - [Python] Create Cython wrappers for SparseTensor
  • ARROW-4467 - [Rust][DataFusion] Create a REPL & Dockerfile for DataFusion
  • ARROW-4503 - [C#] Eliminate allocations in ArrowStreamReader when reading from a Stream
  • ARROW-4504 - [C++] Reduce number of C++ unit test executables from 128 to 82
  • ARROW-4505 - [C++] adding pretty print for dates, times, and timestamps
  • ARROW-4566 - [Flight] Add option to run Flight benchmark against separate server
  • ARROW-4596 - [Rust][DataFusion] Implement COUNT
  • ARROW-4622 - [C++][Python] MakeDense and MakeSparse in UnionArray should accept a vector of Field
  • ARROW-4625 - [Flight][Java] Add method to await Flight server termination in Java
  • ARROW-4626 - [Flight] Add application-defined metadata to DoGet/DoPut
  • ARROW-4627 - [Flight] Add application metadata field to DoPut
  • ARROW-4701 - [C++] Add JSON chunker benchmarks
  • ARROW-4702 - [C++] Update dependency versions
  • ARROW-4708 - [C++] add multithreaded json reader
  • ARROW-4708 - [C++] refactoring JSON parser to prepare for multithreaded impl
  • ARROW-4714 - [C++][JAVA] Providing JNI interface to Read ORC file via Arrow C++
  • ARROW-4717 - [C#] Consider exposing ValueTask instead of Task
  • ARROW-4719 - [C#] Implement ChunkedArray, Column and Table in C#
  • ARROW-4741 - [Java] Add missing type javadoc and enable checkstyle
  • ARROW-4787 - [C++] Add support for Null in MemoTable and related kernels
  • ARROW-4788 - [C++] Less verbose API for constructing StructArray
  • ARROW-4800 - [C++] Introduce a Result<T> class
  • ARROW-4805 - [Rust] Write temporal arrays to CSV
  • ARROW-4806 - [Rust] Temporal array casts
  • ARROW-4824 - [Python] Fix error checking in read_csv()
  • ARROW-4827 - [C++] Implement benchmark comparison
  • ARROW-4847 - [Python] Add pyarrow.table factory function
  • ARROW-4904 - [C++] Move implementations in arrow/ipc/test-common.h into libarrow_testing
  • ARROW-4911 - [R] Progress towards completing windows support
  • ARROW-4912 - [C++] add method for easy renaming of a Table's columns
  • ARROW-4913 - [Java][Memory] Add additional methods for observing allocations.
  • ARROW-4945 - [Flight] Enable integration tests in Travis
  • ARROW-4956 - [C#] Allow ArrowBuffers to wrap external Memory
  • ARROW-4959 - [C++][Gandiva][Crossbow] Gandiva crossbow packaging changes.
  • ARROW-4968 - [Rust] Assert that struct array field types match data in…
  • ARROW-4971 - [Go] Add type equality test function
  • ARROW-4972 - [Go] implement ArrayEquals
  • ARROW-4973 - [Go] implement ArraySliceEqual
  • ARROW-4974 - [Go] implement ArrayApproxEqual
  • ARROW-4990 - [C++] Support Array-Array comparison
  • ARROW-4993 - [C++] Add simple build configuration summary
  • ARROW-5000 - [Python] Fix 'SO' DeprecationWarning in setup.py
  • ARROW-5007 - [C++] Remove DCHECK in intrinsic headers
  • ARROW-5020 - [CI] Split Gandiva-related packages into separate .yml file
  • ARROW-5027 - [Python] Python bindings for JSON reader
  • ARROW-5037 - [Rust] [DataFusion] Refactor aggregate module
  • ARROW-5038 - [Rust][DataFusion] Implement AVG aggregate function
  • ARROW-5039 - [Rust][DataFusion] Re-implement CAST support
  • ARROW-5040 - [C++] ArrayFromJSON can't parse Timestamp from strings
  • ARROW-5045 - [Rust] Code coverage silently failing in CI
  • ARROW-5053 - [Rust][DataFusion] Use ARROW_TEST_DATA env var
  • ARROW-5054 - [Release][Flight] Test Flight in Linux/macOS release verification scripts
  • ARROW-5056 - [Packaging] Adjust conda recipes to use ORC conda-forge package on unix systems
  • ARROW-5061 - [Release] Improve 03-binary performance
  • ARROW-5062 - [Java][FlightRPC] Shade com.google.guava usage in Flight
  • ARROW-5063 - [FlightRPC][Java] Test that Flight client connections are independent
  • ARROW-5064 - [Release] Pass PKG_CONFIG_PATH to glib in the verification script
  • ARROW-5066 - [Integration] Add flags to enable/disable implementations in integration/integration_test.py
  • ARROW-5071 - [Archery] Implement running benchmark suite
  • ARROW-5076 - [Release] Improve post binary upload performance
  • ARROW-5077 - [Rust] Change Cargo.toml to use release versions
  • ARROW-5078 - [Documentation] Sphinx is failed by RemovedInSphinx30Warning
  • ARROW-5079 - [Release] Add a script that releases C# package
  • ARROW-5080 - [Release] Add a script that releases Rust packages
  • ARROW-5081 - [C++] Use PATH_SUFFIXES when searching for dependencies
  • ARROW-5083 - [Developer] PR merge script improvements: set already-released Fix Version, display warning when no components set
  • ARROW-5088 - [C++] Only add -Werror in debug builds. Add C++ documentation about compiler warning levels
  • ARROW-5091 - [Flight] Rename FlightGetInfo message to FlightInfo
  • ARROW-5093 - [Packaging] Add support for selective binary upload
  • ARROW-5094 - [Packaging] Add APT/Yum verification scripts
  • ARROW-5102 - [C++] Reduce header dependencies
  • ARROW-5108 - [Go] implement reading primitive arrays from Arrow file
  • ARROW-5109 - [Go] implement reading binary/string arrays from Arrow file
  • ARROW-5110 - [Go] implement reading struct arrays from Arrow file
  • ARROW-5111 - [Go] implement reading list arrays from Arrow file
  • ARROW-5112 - [Go] implement writing IPC Arrow stream/file
  • ARROW-5113 - [C++] Fix DoPut with dictionary arrays, add tests
  • ARROW-5115 - [JS] Add Vector Builders and high-level stream primitives
  • ARROW-5116 - [Rust] move kernel related files under compute/kernels
  • ARROW-5124 - [C++] Add support for Parquet in MinGW build
  • ARROW-5126 - [Rust][Parquet] Convert parquet column desc to arrow data type
  • ARROW-5127 - [Rust][Parquet] Add page iterator.
  • ARROW-5136 - [Flight] Call options
  • ARROW-5137 - [Flight] Implement auth API
  • ARROW-5145 - [C++] More input validation in release mode
  • ARROW-5150 - [Ruby] Add Arrow::Table#raw_records
  • ARROW-5155 - [GLib][Ruby] Add support for building union arrays from data type
  • ARROW-5157 - [Website] Add MATLAB to powered by Apache Arrow website
  • ARROW-5162 - [Rust][Parquet] Rename mod reader to arrow.
  • ARROW-5163 - [Gandiva] Cast timestamp/date are incorrectly evaluating year 0097 to 1997
  • ARROW-5164 - [Gandiva][C++] Introduce murmur32 for 32 bit types.
  • ARROW-5165 - [Python] update dev installation docs for --build-type + validate in setup.py
  • ARROW-5168 - [GLib] Add garrow_array_take()
  • ARROW-5171 - [C++] Use LESS instead of LOWER in compare enum
  • ARROW-5172 - [Go] implement reading fixed-size binary arrays from Arrow file
  • ARROW-5178 - [Python] Add Table.from_pydict()
  • ARROW-5179 - [Python] Return plain dicts, not OrderedDict, on Python 3.7+
  • ARROW-5185 - [C++] Add support for Boost with CMake configuration file
  • ARROW-5187 - [Rust] Add ability to convert StructArray to RecordBatch
  • ARROW-5188 - [Rust] Add temporal types to struct builders
  • ARROW-5189 - [Rust][Parquet] Format / display individual fields within a parquet row
  • ARROW-5190 - [R] Discussion: tibble dependency in R package
  • ARROW-5191 - [Rust] Expose CSV and JSON reader schemas
  • ARROW-5203 - [GLib] Add support for Compare filter
  • ARROW-5204 - [C++] Improve builder performance
  • ARROW-5212 - [Go] Support reserve for the data buffer in the BinaryBuilder
  • ARROW-5218 - [C++] Improve build when third-party library locations are specified
  • ARROW-5219 - [C++] Build protobuf_ep in parallel when using Ninja build
  • ARROW-5222 - [Python] Revise pyarrow installation instructions for macOS
  • ARROW-5225 - [Java] Improve performance of BaseValueVector#getValidityBufferSizeFromCount
  • ARROW-5226 - [Gandiva] Add cmp functions for decimals
  • ARROW-5238 - [Python] Convert arguments to pyarrow.dictionary
  • ARROW-5241 - [Python] expose option to disable writing statistics to parquet file
  • ARROW-5250 - [Java] Add javadoc comments to public methods, remove style check suppression.
  • ARROW-5252 - [C++] Use standard-compliant std::variant backport
  • ARROW-5256 - [C++] Add support for LLVM 7.1
  • ARROW-5257 - [Website] Update site to use "official" Apache Arrow logo, add clearly marked links to logo
  • ARROW-5258 - [C++/Python] Collect file metadata of dataset pieces
  • ARROW-5261 - [C++] Add missing scalar definitions for Intervals
  • ARROW-5262 - [Python] Fix typo
  • ARROW-5264 - [Java] Allow enabling/disabling boundary checking by environmental variable
  • ARROW-5266 - [Go] implement read/write IPC for Float16
  • ARROW-5268 - [GLib] Add GArrowJSONReader
  • ARROW-5269 - [C++][Archery] Mark relevant benchmarks as regression
  • ARROW-5275 - [C++] Generic filesystem tests
  • ARROW-5281 - [Rust] Extract DataPageBuilder to test common
  • ARROW-5284 - [Rust] Replace libc with std::alloc for memory allocation
  • ARROW-5286 - [Python] support struct type in from_pandas
  • ARROW-5288 - [Documentation] Enhance the contribution guidelines page
  • ARROW-5289 - [C++] Move arrow/util/concatenate* to arrow/array
  • ARROW-5290 - [Java] Provide a flag to enable/disable null-checking in vector's get methods
  • ARROW-5291 - [Python] Add wrapper for take kernel on Array
  • ARROW-5298 - [Rust] Add debug implementation for buffer data.
  • ARROW-5299 - [C++] ListArray comparison is incorrect
  • ARROW-5309 - [Python] clarify that Schema.append returns new object
  • ARROW-5311 - [C++] use more specific error status types in take
  • ARROW-5313 - [Format] Comments on Field table are a bit confusing
  • ARROW-5317 - [Rust][Parquet] impl IntoIterator for SerializedFileReader
  • ARROW-5319 - [C++][CI][travis skip]
  • ARROW-5321 - [Gandiva][C++] add isnull impl for string types
  • ARROW-5323 - [CI][skip travis]
  • ARROW-5328 - [R] Add shell scripts to do a full package rebuild and test locally
  • ARROW-5329 - [MATLAB] Add support for building MATLAB interface to Feather directly within MATLAB
  • ARROW-5334 - [C++] Ensure all type classes end with "Type"
  • ARROW-5335 - [Python] Raise exception on variable dictionaries in conversion to Python/pandas
  • ARROW-5339 - [C++] Add jemalloc URL to thirdparty/versions.txt so download_dependencies.sh gets it
  • ARROW-5341 - [C++][Documentation] developers/cpp.rst should mention documentation warnings
  • ARROW-5342 - [Format] Formalize "extension types" in Arrow protocol metadata
  • ARROW-5346 - [C++] Revert changes to vendored datetime library
  • ARROW-5349 - [C++][Parquet] Add method to set file path in a parquet::FileMetaData instance
  • ARROW-5361 - [R] Follow DictionaryType/DictionaryArray changes from ARROW-3144
  • ARROW-5363 - [GLib] Fix coding styles
  • ARROW-5364 - [C++] Use ASCII rather than UTF-8 in BuildUtils.cmake comment
  • ARROW-5365 - [C++][CI] Enable ASAN/UBSAN in CI
  • ARROW-5368 - [C++] Disable jemalloc by default with MinGW
  • ARROW-5369 - [C++] Add support for glog on Windows
  • ARROW-5370 - [C++] Use system uriparser if available
  • ARROW-5372 - [GLib] Add support for null/boolean values CSV read option
  • ARROW-5378 - [C++] Local filesystem implementation
  • ARROW-5384 - [Go] implement FixedSizeList array
  • ARROW-5389 - [C++] Add Temporary Directory facility
  • ARROW-5392 - [C++][CI] Disable static build with MinGW on AppVeyor
  • ARROW-5393 - [R] Add tests and example for read_parquet()
  • ARROW-5395 - [C++] Utilize stream EOS in File format
  • ARROW-5396 - [JS] Support files and streams with no record batches
  • ARROW-5401 - [CI][skip appveyor]
  • ARROW-5404 - [C++] force usage of nonstd::sv_lite::string_view instead of std::string_view
  • ARROW-5407 - [C++] Allow building only integration test targets
  • ARROW-5413 - [C++] Skip UTF8 BOM in CSV files
  • ARROW-5415 - [Release] Release script should update R version everywhere
  • ARROW-5416 - [Website] Add Homebrew to project installation page
  • ARROW-5418 - [CI][R] Run code coverage and report to codecov.io
  • ARROW-5420 - [Java] Implement or remove getCurrentSizeInBytes in Variab…
  • ARROW-5427 - [Python] pandas conversion preserve_index=True to force RangeIndex serialization
  • ARROW-5428 - [C++] Add option to set "read extent" in arrow::io::BufferedInputStream
  • ARROW-5429 - [Java] Provide alternative buffer allocation policy
  • ARROW-5432 - [Python] Add NativeFile.read_at()
  • ARROW-5433 - [C++][Parquet] Improve parquet-reader columns information, strip trailing whitespace from test case
  • ARROW-5434 - [Memory][Java] Introduce wrappers for backward compatibility.
  • ARROW-5436 - [Python] parquet.read_table add filters keyword
  • ARROW-5438 - [JS] EOS bytes for sequential readers
  • ARROW-5441 - [C++] Implement FindArrowFlight.cmake
  • ARROW-5442 - [Website] Clarify what makes a release artifact "official"
  • ARROW-5443 - [Crossbow] Turn parquet build off for Gandiva.
  • ARROW-5447 - [Ruby] Ensure flushing test gz file
  • ARROW-5449 - [C++] Test extended-length paths on Windows
  • ARROW-5451 - [C++][Gandiva] Support cast/round functions for decimal
  • ARROW-5452 - [R] Add API documentation website (pkgdown)
  • ARROW-5461 - [Java] Add micro-benchmarks for Float8Vector and allocators
  • ARROW-5463 - [Rust] Add AsRef trait for Buffer.
  • ARROW-5464 - [Archery] Fix default diff --benchmark-filter
  • ARROW-5465 - [Crossbow] Support writing submitted job definition yaml to a file
  • ARROW-5466 - [Java] Dockerize Java builds in Travis CI, run multiple JDKs in single entry
  • ARROW-5467 - [Go] implement read/write IPC for Time32/64 arrays
  • ARROW-5468 - [Go] implement read/write IPC for Timestamp arrays
  • ARROW-5469 - [Go] implement read/write IPC for Date32/64 arrays
  • ARROW-5470 - [CI] Fix Travis-CI R job that broke with the local fs patch
  • ARROW-5472 - [Development] Add warning to PR merge tool if no JIRA component is set
  • ARROW-5474 - [C++] Document Boost 1.58 as minimum supported version, add docker-compose entry for it, fix broken cpp/Dockerfile* builds
  • ARROW-5475 - [Python] Add Python binding for arrow::Concatenate
  • ARROW-5476 - [Java][Memory] Fix Netty Arrow Buf.
  • ARROW-5477 - [C++] Check required RapidJSON version
  • ARROW-5478 - [Packaging] Drop Ubuntu 14.04 support
  • ARROW-5481 - [GLib] Add "error" parameter document
  • ARROW-5485 - [C++] Install libraries from googletest_ep into build output directory on non-Windows platforms.
  • ARROW-5485 - [Crossbow] Disable unit tests in Gandiva macOS crossbow job until underlying issue resolved
  • ARROW-5486 - [GLib] Add binding of gandiva::FunctionRegistry and related things
  • ARROW-5488 - [R] Workaround when C++ lib not available
  • ARROW-5490 - [C++] Remove ARROW_BOOST_HEADER_ONLY
  • ARROW-5491 - [C++] Remove unnecessary semicolons following MACRO definitions
  • ARROW-5492 - [R] Add "col_select" argument to read_* functions to read subset of columns
  • ARROW-5495 - [C++] Update some dependency URLs from http to https
  • ARROW-5496 - [R][CI] Fix relative paths in R codecov.io reporting
  • ARROW-5498 - [C++][CI] Fix Flatbuffers related error with MinGW
  • ARROW-5499 - [R] Alternate bindings for when libarrow is not found
  • ARROW-5500 - [R] read_csv_arrow() signature should match readr::read_csv()
  • ARROW-5503 - [R] Add read_json()
  • ARROW-5504 - [R] Move use_threads argument to global option
  • ARROW-5509 - [R] Add basic write_parquet
  • ARROW-5511 - [Packaging] Enable Flight in Conda packages
  • ARROW-5512 - [C++] Rough API skeleton for C++ Datasets API / framework
  • ARROW-5513 - [Java] Refactor method name for getstartOffset to use camel case
  • ARROW-5516 - [Python][Documentation] Development page for pyarrow has a missing dependency in using pip
  • ARROW-5518 - [Java] Set VectorSchemaRoot rowCount to 0 on allocateNew and clear
  • ARROW-5524 - [C++] Turn off PARQUET_BUILD_ENCRYPTION in CMake if OpenSSL not found (#4494)
  • ARROW-5526 - [GitHub] Add more prominent notice to ISSUE_TEMPLATE.md to direct bug reports to JIRA
  • ARROW-5529 - [Flight] Allow serving with multiple TLS certificates
  • ARROW-5531 - [Python] Implement Array.from_buffers for varbinary and nested types, add DataType.num_buffers property
  • ARROW-5533 - [C++][Plasma] make plasma client thread safe
  • ARROW-5534 - [GLib] Add garrow_table_concatenate()
  • ARROW-5535 - [GLib] Add garrow_table_slice()
  • ARROW-5537 - [JS] Support delta dictionaries in RecordBatchWriter and DictionaryBuilder
  • ARROW-5538 - [C++] Restrict minimum OpenSSL version to 1.0.2
  • ARROW-5541 - [R] Casts from negative int32 to uint32 and uint64 are now safe
  • ARROW-5544 - [Archery] Don't return non-zero on regressions
  • ARROW-5545 - [C++][Docs] Clarify expectation of UTC values for timestamps with time zones
  • ARROW-5547 - [C++][FlightRPC] Support pkg-config for Arrow Flight
  • ARROW-5552 - [Go] make Schema, Field and simpleRecord implement Stringer
  • ARROW-5554 - [Python] Added a python wrapper for arrow::Concatenate()
  • ARROW-5555 - [R] Add install_arrow() function to assist the user in obtaining C++ runtime libraries
  • ARROW-5556 - [Doc][Python] Document JSON reader
  • ARROW-5557 - [C++] Add VisitBits benchmark
  • ARROW-5565 - [Python][Docs] Add instructions how to use gdb to debug C++ libraries when running Python unit tests
  • ARROW-5567 - [C++] Fix build error of memory-benchmark
  • ARROW-5571 - [R] Rework handling of ARROW_R_WITH_PARQUET
  • ARROW-5574 - [R] documentation error for read_arrow()
  • ARROW-5581 - [Java] Provide interfaces and initial implementations for vector sorting
  • ARROW-5582 - [Go] implement RecordEqual
  • ARROW-5586 - [R] convert Array of LIST type to R lists
  • ARROW-5587 - [Java] Add more style check rule for Java code
  • ARROW-5590 - [R] Run "no libarrow" R build in the same CI entry if possible
  • ARROW-5591 - [Go] implement read/write IPC for Duration & Intervals
  • ARROW-5597 - [Packaging] Add Flight deb packages
  • ARROW-5600 - [R] R package namespace cleanup
  • ARROW-5602 - [Java][Gandiva] Add tests for round/cast
  • ARROW-5604 - [Go] improve coverage of TypeTraits
  • ARROW-5609 - [C++] Set CMP0068 CMake policy to avoid macOS warnings
  • ARROW-5612 - [Python][Doc] Add prominent note that date_as_object option changed with Arrow 0.13
  • ARROW-5621 - [Go] implement read/write IPC for Decimal128 arrays
  • ARROW-5622 - [C++][Dataset] Support pkg-config for Arrow Datasets
  • ARROW-5625 - [R] convert Array of struct type to data frame columns
  • ARROW-5632 - [Doc] Basic instructions for using Xcode with Arrow
  • ARROW-5633 - [Python] Enable bz2 in Linux wheels
  • ARROW-5635 - [C++] Added a Compact() method to Table.
  • ARROW-5637 - [Java][C++][Gandiva] Complete In Expression Support
  • ARROW-5639 - [Java] Remove floating point computation from getOffsetBufferValueCapacity
  • ARROW-5641 - [GLib] Remove enums files generated by GNU Autotools from Git targets
  • ARROW-5643 - [FlightRPC] Add ability to override SSL hostname checking
  • ARROW-5650 - [Python] Update manylinux dependency versions
  • ARROW-5652 - [CI] Fix lint docker image
  • ARROW-5653 - [CI] Fix cpp docker image
  • ARROW-5656 - [Python][Packaging] Fix macOS wheel builds, add Flight support
  • ARROW-5659 - [C++] Add support for finding OpenSSL installed by Homebrew
  • ARROW-5660 - [GLib][CI] Use Xcode 10.2
  • ARROW-5661 - [Gandiva][C++] support hash functions for decimals in gandiva
  • ARROW-5662 - [C++] Add support for BOOST_SOURCE=AUTO|BUNDLED|SYSTEM
  • ARROW-5663 - [Packaging][RPM] Update CentOS packages for 0.14.0
  • ARROW-5664 - [Crossbow] Execute nightly crossbow tests on CircleCI instead of Travis
  • ARROW-5668 - [C++/Python] Include 'not null' in schema fields pretty print
  • ARROW-5669 - [Python][Packaging] Add ARROW_TEST_DATA env variable to Crossbow Linux Wheel build
  • ARROW-5670 - [Crossbow] get_apache_mirror.py fails with TLS error on macOS with Python 3.5
  • ARROW-5671 - [Crossbow] macOS Python wheels failing
  • ARROW-5672 - [Java] Refactor redundant method modifier
  • ARROW-5683 - [R] Add snappy to Rtools Windows builds
  • ARROW-5684 - [Packaging][deb] Add support for Ubuntu 19.04
  • ARROW-5685 - [Packaging][deb] Add support for Apache Arrow Datasets
  • ARROW-5687 - [C++] Remove remaining uses of ARROW_BOOST_VENDORED
  • ARROW-5690 - [Packaging][Python] Fix macOS wheel building
  • ARROW-5694 - [Python] Support list of Decimals in conversion to pandas
  • ARROW-5695 - [C#][Release] Run sourcelink test in verify-release-candidate.sh
  • ARROW-5696 - [C++][Gandiva] Introduce castVarcharVarchar
  • ARROW-5699 - [C++] Optimize decimal128 parsing
  • ARROW-5701 - [C++][Gandiva] Build expr with specific sv
  • ARROW-5702 - [C++] parquet::arrow::FileReader::GetSchema()
  • ARROW-5704 - [C++] Stop using ARROW_TEMPLATE_EXPORT for SparseTensorImpl
  • ARROW-5705 - [Java] Optimize BaseValueVector#computeCombinedBufferSize logic
  • ARROW-5706 - [Java] Remove type conversion in getValidityBufferValueCapacity
  • ARROW-5707 - [Java] Improve the performance and code structure for ArrowRecordBatch
  • ARROW-5710 - [C++] Allow compiling Gandiva with Ninja on Windows
  • ARROW-5715 - [Release] Verify Ubuntu 19.04 APT repository
  • ARROW-5718 - [R] auto splice data frames in record_batch() and table()
  • ARROW-5720 - [C++] Create benchmarks for decimal related classes.
  • ARROW-5721 - [Rust] Move array related code into a separate module
  • ARROW-5724 - [R][CI] AppVeyor build should use ccache
  • ARROW-5725 - [Crossbow] Port conda recipes to azure pipelines
  • ARROW-5726 - [Java] Implement a common interface for int vectors
  • ARROW-5727 - [Python][CI] Install pytest-faulthandler before running tests
  • ARROW-5748 - [Packaging][deb] Add support for Debian GNU/Linux buster
  • ARROW-5749 - [Python] Added python binding for Table::CombineChunks
  • ARROW-5751 - [Python][Packaging] Ensure that c-ares is linked statically in Python wheels
  • ARROW-5752 - [Java] Improve the performance of ArrowBuf#setZero
  • ARROW-5755 - [Rust][Parquet] Derive clone for Type.
  • ARROW-5768 - [Release] Remove needless empty lines at the end of CHANGELOG.md
  • ARROW-5773 - [R] Clean up documentation before release
  • ARROW-5780 - [C++] Add benchmark for Decimal operations
  • ARROW-5782 - [Release] Setup test data for Flight in dev/release/01-perform.sh
  • ARROW-5783 - [Release][C#] Exclude dummy.git from RAT check
  • ARROW-5785 - [Rust] Rust datafusion implementation should not depend on rustyline
  • ARROW-5787 - [Release][Rust] Use local modules to verify RC
  • ARROW-5793 - [Release] Avoid duplicate known host SSH error in dev/release/03-binary.sh
  • ARROW-5794 - [Release] Skip uploading already uploaded binaries
  • ARROW-5795 - [Release] Add missing waits on uploading binaries
  • ARROW-5796 - [Release][APT] Update expected package list
  • ARROW-5797 - [Release][APT] Update supported distributions
  • ARROW-5818 - [Java][Gandiva] support varlen output vectors
  • ARROW-5820 - [Release] Remove undefined variable check from verify script
  • ARROW-5826 - [Website] Blog post for 0.14.0 release announcement
  • PARQUET-1243 - [C++] Throw more informative exception when reading a length-0 Parquet file
  • PARQUET-1411 - [C++] Add parameterized logical annotations to Parquet metadata
  • PARQUET-1422 - [C++] Use common Arrow IO interfaces throughout codebase
  • PARQUET-1517 - [C++] Crypto package updates to match the final spec
  • PARQUET-1523 - [C++] Vectorize Comparator interface, remove virtual calls on inner loop. Refactor Statistics to not require PARQUET_EXTERN_TEMPLATE
  • PARQUET-1569 - [C++] Consolidate shared unit testing header files
  • PARQUET-1582 - [C++] Add ToString method to ColumnDescriptor
  • PARQUET-1583 - [C++] Remove superfluous parquet::Vector class
  • PARQUET-1586 - [C++] Add --dump options to parquet-reader tool to dump def/rep levels
  • PARQUET-1603 - [C++] rename parquet::LogicalType to parquet::ConvertedType

Bug Fixes

  • ARROW-61 - [Java] Method can return a value bigger than long MAX_VALUE
  • ARROW-352 - [Format] Interval(DAY_TIME) has no unit
  • ARROW-1837 - [Java][Integration] Fix unsigned round trip integration tests
  • ARROW-2119 - [IntegrationTest] Add test case with a stream having no record batches
  • ARROW-2136 - [Python] Check null counts for non-nullable fields when converting from pandas.DataFrame with supplied schema
  • ARROW-2256 - [C++] Fix libfuzzer builds for clang-7
  • ARROW-2461 - [Python] Build manylinux2010 wheels
  • ARROW-2590 - [Python] Pyspark python_udf serialization error on grouped map (Amazon EMR)
  • ARROW-3344 - [Python] Disable flaky Plasma test
  • ARROW-3399 - [Python] Implementing numpy matrix serialization
  • ARROW-3650 - [Python] warn on converting DataFrame with mixed type column names
  • ARROW-3801 - [Python] Pandas-Arrow roundtrip makes pd categorical index not writeable
  • ARROW-4021 - [Ruby] Error building red-arrow on msys2
  • ARROW-4076 - [Python] Validate ParquetDataset schema after filtering
  • ARROW-4139 - [Python][Parquet] Wrap new parquet::LogicalType, cast min/max statistics based on LogicalType
  • ARROW-4301 - [Java] use arrow-jni profile for both gandiva/orc
  • ARROW-4301 - [Java][Gandiva] Update version manually
  • ARROW-4324 - [Python] Triage broken type inference logic in presence of a mix of NumPy dtype-having objects and other scalar values
  • ARROW-4350 - [Python] Fix conversion from Python to Arrow with nested lists and NumPy dtype=object items
  • ARROW-4433 - [R] Segmentation fault when instantiating arrow::table from data frame
  • ARROW-4447 - [C++] Investigate dynamic linking for libthrift
  • ARROW-4516 - [Python] Error while creating a ParquetDataset on a path without `_common_dataset` but with an empty `_tempfile`
  • ARROW-4523 - [JS] Add row proxy generation benchmark
  • ARROW-4651 - [Flight] Use URIs instead of host/port pair
  • ARROW-4665 - [C++] With glog activated, DCHECK macros are redefined
  • ARROW-4675 - [Python] Fix pyarrow.deserialize failure when reading in Python 3 a payload generated in Python 2
  • ARROW-4694 - [CI] Improve detect-changes.py on Travis PRs
  • ARROW-4723 - [Python] Ignore "hidden" files that start with underscore
  • ARROW-4725 - [C++] Enable dictionary builder tests with MinGW build
  • ARROW-4823 - [C++][Python] Do not close raw file handle in ReadaheadSpooler, check that file handles passed to read_csv are not closed
  • ARROW-4832 - [Python] pandas Index metadata for RangeIndex is incorrect
  • ARROW-4845 - [R] Compiler warnings on Windows MingW64
  • ARROW-4851 - [Java] BoundsChecking.java defaulting behavior for old drill parameter seems off
  • ARROW-4877 - [Plasma] CI failure in test_plasma_list
  • ARROW-4884 - [C++] conda-forge thrift-cpp package not available via pkg-config or cmake
  • ARROW-4885 - [C++/Python] Enable Decimal parsing in CSV
  • ARROW-4886 - [Rust] Cast to list with offset
  • ARROW-4923 - [Java] Add methods to set long value at given index in DecimalVector
  • ARROW-4934 - [Python] Address deprecation notice that will be a bug in Python 3.8
  • ARROW-5019 - [C#] ArrowStreamWriter doesn't work on a non-seekable stream
  • ARROW-5049 - [Python] org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in spark
  • ARROW-5051 - [GLib][Gandiva] Don't return temporary memory
  • ARROW-5055 - [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby
  • ARROW-5058 - [Release] Fix typos in vote e-mail template
  • ARROW-5059 - [C++][Gandiva] cbrt_* floating point tests can fail due to exact comparisons
  • ARROW-5065 - [Rust] cast kernel does not support casting from Int64
  • ARROW-5068 - [Gandiva][Packaging] Fix gandiva nightly builds after the CMake refactor
  • ARROW-5090 - Parquet linking fails on MacOS due to @rpath in dylib
  • ARROW-5092 - [C#] Create a dummy .git directory to download the source files from GitHub with Source Link
  • ARROW-5095 - [Flight][C++] Expose server error message in DoGet
  • ARROW-5096 - [Packaging][deb] Add missing plasma-store-server packages
  • ARROW-5097 - [Packaging][CentOS6] Remove needless dependencies
  • ARROW-5098 - [Website] Update how to install .deb by APT
  • ARROW-5100 - [JS] Remove swap while collapsing contiguous buffers
  • ARROW-5117 - [Go] fix panic when nil or empty slices are appended to builders
  • ARROW-5119 - [Go] fix Boolean stringer implementation
  • ARROW-5122 - [Python] pyarrow.parquet.read_table raises non-file path error when given a windows path to a directory
  • ARROW-5128 - [Packaging][CentOS][Conda] Numpy not found in nightly builds
  • ARROW-5129 - [Rust] Column writer bug: check dictionary encoder when adding a new data page
  • ARROW-5130 - [C++][Python] Limit exporting of std::* symbols
  • ARROW-5132 - [Java] Errors on building gandiva_jni.dll on Windows with Visual Studio 2017
  • ARROW-5138 - [Python] Add documentation about pandas preserve_index option
  • ARROW-5140 - [Bug?][Parquet] Can write a jagged array column of strings to disk, but hit `ArrowNotImplementedError` on read
  • ARROW-5142, ARROW-5732, ARROW-5735 - [CI] Emergency fixes
  • ARROW-5144 - [Python] ParquetDataset and ParquetPiece not serializable
  • ARROW-5146 - [Dev] Fix project name inference in merge script
  • ARROW-5147 - [C++] Add missing dependencies to Brewfile
  • ARROW-5148 - [Gandiva] Allow linking with RTTI-disabled LLVM builds
  • ARROW-5149 - [Packaging][Wheel] Pin LLVM to version 7 in windows builds
  • ARROW-5152 - [Python] Fix CMake warnings
  • ARROW-5159 - [Rust] Unable to build benches in arrow crate.
  • ARROW-5160 - [C++] Don't evaluate expression twice in ABORT_NOT_OK
  • ARROW-5166 - [Python][Parquet] Statistics for uint64 columns may overflow
  • ARROW-5167 - [C++] Upgrade string-view-light to latest
  • ARROW-5169 - [Python] preserve field nullability of specified schema in Table.from_pandas
  • ARROW-5173 - [Go] handle multiple concatenated record batches
  • ARROW-5174 - [Go] implement Stringer for DataTypes
  • ARROW-5177 - [C++/Python] Check column index when reading Parquet column
  • ARROW-5183 - [CI] Fix AppVeyor failure
  • ARROW-5184 - [Rust] Broken links and other documentation warnings
  • ARROW-5186 - [Plasma] Fix crash caused by improper free on CUDA memory
  • ARROW-5194 - [C++][Plasma] TEST(PlasmaSerialization, GetReply) is failing
  • ARROW-5195 - [C++] Detect null strings in CSV string columns
  • ARROW-5201 - [Python] handle collections.abc deprecation warnings
  • ARROW-5208 - [Python] Add mask argument to pyarrow.infer_type, do not look at masked values when inferring output type in pyarrow.array
  • ARROW-5214 - [C++] Fix thirdparty download script
  • ARROW-5217 - [Rust][DataFusion] Fix failing tests
  • ARROW-5232 - [Java] Avoid runaway doubling of vector size
  • ARROW-5233 - [Go] Migrate to flatbuffers-v1.11.0
  • ARROW-5237 - [Python] populate _pandas_api.version
  • ARROW-5240 - [C++][CI] pin cmake_format
  • ARROW-5242 - [C++] Update vendored HowardHinnant/date to master
  • ARROW-5243 - [Java][Gandiva] Add decimal compare tests
  • ARROW-5245 - [CI][C++] Unpin cmake format (current version is 5.1)
  • ARROW-5246 - [Go] use Go-1.12.x in CI
  • ARROW-5249 - [Java] Add auth capability to Flight async operations (#4238)
  • ARROW-5253 - [C++] Fix snappy external build
  • ARROW-5254 - [Flight][Java] Change Flight doAction to allow multiple responses in Java
  • ARROW-5255 - [Java] Proof-of-concept of Java extension types
  • ARROW-5260 - [Python] Fix crash when deserializing from components in another process
  • ARROW-5274 - [JavaScript] Wrong array type for countBy
  • ARROW-5283 - [C++][Plasma] Erase object id in client when aborting an object
  • ARROW-5285 - [C++][Plasma] Implement to release GpuProcessHandle
  • ARROW-5293 - [C++] Take kernel on DictionaryArray does not preserve ordered flag
  • ARROW-5294 - [Python][CI] Fix manylinux1 build
  • ARROW-5296 - [Java] Ignore timeout-based Flight tests for now
  • ARROW-5301 - [Python] update parquet docs on multithreading
  • ARROW-5304 - [C++] Fix thread safety of CudaDeviceManager::GetInstance
  • ARROW-5306 - [CI][GLib] Disable GTK-Doc
  • ARROW-5308 - [Go] remove deprecated Feather format
  • ARROW-5314 - [Go] fix bug for String Arrays with offset
  • ARROW-5314 - [Go] Fix bug for FixedSizeBinary with offset
  • ARROW-5318 - [Python] pyarrow hdfs reader overrequests
  • ARROW-5325 - [Archery][Benchmark] Output properly formatted jsonlines from benchmark diff cli command
  • ARROW-5330 - [CI][skip appveyor]
  • ARROW-5332 - [R] Update R package README with richer installation instructions
  • ARROW-5348 - [Java][CI] Add missing gandiva javadoc
  • ARROW-5360 - [Rust] Update rustyline to fix build
  • ARROW-5362 - [C++] Fix compression test memory usage
  • ARROW-5371 - [Release] Add tests for dev/release/00-prepare.sh
  • ARROW-5373 - [Java] Add missing details for Gandiva Java Build
  • ARROW-5376 - [C++] Workaround for gcc 5.4.0 bug
  • ARROW-5383 - [Go] Update flatbuf for new Duration type
  • ARROW-5387 - [Go] properly handle sub-slice of List
  • ARROW-5388 - [Go] use arrow.TypeEquals in array.NewChunked
  • ARROW-5390 - [CI][skip appveyor]
  • ARROW-5397 - [FlightRPC] Add TLS certificates for testing Flight
  • ARROW-5398 - [Python] Fix Flight tests
  • ARROW-5403 - [C++] Use GTest shared libraries with BUNDLED build, always use BUNDLED with MSVC
  • ARROW-5411 - [C++][Python] Build error building on Mac OS Mojave
  • ARROW-5412 - [Integration] Add Java option for netty reflection
  • ARROW-5419 - [C++] Allow recognizing empty strings as null strings in CSV files
  • ARROW-5421 - [Packaging][Crossbow] Duplicated key in nightly test configuration
  • ARROW-5422 - [CI] [C++] Build failure with Google Benchmark
  • ARROW-5430 - [Python] Raise ArrowInvalid for pyints larger than int64
  • ARROW-5435 - [Java] Add test for IntervalYearVector#getAsStringBuilder
  • ARROW-5437 - [Python] Missing pandas pytest marker from parquet tests
  • ARROW-5446 - [C++][CMake] Install arrow/util/config.h into CMAKE_INSTALL_INCLUDEDIR
  • ARROW-5448 - [C++][CI][MinGW][skip travis]
  • ARROW-5453 - [C++] Update to cmake-format=0.5.2 and pin again
  • ARROW-5455 - [Rust] Build broken by 2019-05-30 Rust nightly
  • ARROW-5456 - [GLib][Plasma] Fix dependency order on building document
  • ARROW-5457 - [GLib][Plasma] Fix environment variable name for test
  • ARROW-5459 - [Go] implement Stringer for float16 DataType
  • ARROW-5462 - [Go] support writing zero-length List arrays
  • ARROW-5479 - [Rust][DataFusion] Use ARROW_TEST_DATA instead of relative path for testing
  • ARROW-5487 - [Docs] Fix Sphinx failure
  • ARROW-5493 - [Go][Integration] add Go support for IPC integration tests
  • ARROW-5507 - [Plasma][CUDA] Fix compile error
  • ARROW-5514 - [C++] Fix pretty-printing uint64 values
  • ARROW-5517 - [C++] Only check header basename for 'internal' when collecting public headers
  • ARROW-5520 - [Packaging][deb] Add support for building on arm64
  • ARROW-5521 - [Packaging] Use Apache RAT 0.13
  • ARROW-5528 - [C++] Fixed a bug when Concatenate() arrays with no value buffers.
  • ARROW-5532 - [JS] Field Metadata Not Read
  • ARROW-5551 - [Go] implement FixedSizeArrays with 2-buffers layout
  • ARROW-5553 - [Ruby] Use the official packages to install Apache Arrow
  • ARROW-5576 - [C++] Query ASF mirror system for URL and use when downloading Thrift
  • ARROW-5577 - [C++][Alpine] Correct googletest shared library paths on non-Windows to fix Alpine build
  • ARROW-5583 - [Java] When the isSet of a NullableValueHolder is 0, the buffer field should not be used
  • ARROW-5584 - [Java] Add import for link reference in FieldReader javadoc
  • ARROW-5589 - [C++] Add missing nullptr check during flatbuffer decoding
  • ARROW-5592 - [Go] implement Duration array
  • ARROW-5596 - [Python] Fix Python-3 syntax only in test_flight.py
  • ARROW-5601 - [C++][Gandiva] fail if the output type is not supported
  • ARROW-5603 - [Python] Register custom pytest markers to avoid warnings
  • ARROW-5605 - [C++] Verify Flatbuffer messages in more places to prevent crashes due to bad inputs
  • ARROW-5606 - [Python] deal with deprecated RangeIndex._start/_stop/_step
  • ARROW-5608 - [C++][parquet] Fix invalid memory access when using parquet::arrow::ColumnReader
  • ARROW-5615 - [C++] gcc 5.4.0 doesn't want to parse inline C++11 string R literal
  • ARROW-5616 - [C++][Python] Fix -Wwrite-strings warning when building against Python 2.7 headers
  • ARROW-5617 - [C++] thrift_ep 0.12.0 fails to build when using ARROW_BOOST_VENDORED=ON
  • ARROW-5619 - [C++] Make get_apache_mirror.py workable with Python 3.5
  • ARROW-5623 - [GLib][CI] Use system Meson on macOS
  • ARROW-5624 - [C++] Fix typo causing build failure when -Duriparser_SOURCE=BUNDLED
  • ARROW-5626 - [C++] Fix caching of expressions with decimals
  • ARROW-5629 - [C++] Fix Coverity issues
  • ARROW-5631 - [C++] Fix FindBoost targets with cmake3.2
  • ARROW-5644 - [Python] test_flight.py::test_tls_do_get appears to hang
  • ARROW-5647 - [Python] Accessing a file from Databricks using pandas read_parquet with the pyarrow engine fails with: Passed non-file path: /mnt/aa/example.parquet
  • ARROW-5648 - [C++] Avoid using codecvt
  • ARROW-5654 - [C++][Python] Add ChunkedArray::Validate method that checks chunk types for consistency, invoke in Python
  • ARROW-5657 - [C++] "docker-compose run cpp" broken in master
  • ARROW-5674 - [Python] Missing pandas pytest markers from test_parquet.py
  • ARROW-5675 - [Doc] Fix typo in Xcode workflow documentation
  • ARROW-5678 - [R][Lint] Fix hadolint docker linting error
  • ARROW-5693 - [Go] skip IPC integration tests for Decimal128
  • ARROW-5697 - [GLib] Use system pkg-config in c_glib/Dockerfile to correctly find system libraries such as libglib
  • ARROW-5698 - [R] Fix docker-compose build
  • ARROW-5709 - [C++] Fix gandiva-date_time_test failure on Windows
  • ARROW-5714 - [JS] Inconsistent behavior in Int64Builder with/without BigNum
  • ARROW-5723 - [C++][Arrow] Fix crossbow failure
  • ARROW-5728 - [Python] Pin jpype1 version to 0.6.3 due to CI breakage from 0.7.0
  • ARROW-5729 - [Python][Java] ArrowType.Int object has no attribute 'isSigned'
  • ARROW-5730 - [Python][CI] Selectively skip test cases in the dask integration test
  • ARROW-5732 - [C++] macOS builds failing idiosyncratically on master with warnings from pmmintrin.h
  • ARROW-5735 - [C++] Appveyor builds failing persistently in thrift_ep build
  • ARROW-5737 - [Crossbow] Use Python version 2.7 in the gandiva tasks
  • ARROW-5738 - [Crossbow][Conda] OSX package builds are failing with missing intrinsics
  • ARROW-5739 - [CI] Fix python docker image
  • ARROW-5750 - [Java] Fix java compilation errors
  • ARROW-5754 - [C++] Add override mark for ~GrpcStreamWriter
  • ARROW-5765 - [C++] Fix TestDictionary.Validate in release mode, add docker-compose job for testing C++ release build
  • ARROW-5769 - [Release] Ensure setting up test data in dev/release/00-prepare.sh
  • ARROW-5770 - [C++] Fix -Wpessimizing-move in result.h
  • ARROW-5771 - [Python] Add pytz to conda_env_python.yml to fix python-nopandas build
  • ARROW-5774 - [Java][Documentation] Document the need to checkout git submodules for flight
  • ARROW-5781 - [Archery] Ensure benchmark clone accepts remote in revision
  • ARROW-5791 - [Python] pyarrow.csv.read_csv hangs + eats all RAM
  • ARROW-5816 - [Release] Parallel curl does not work reliably in verify-release-candidate.sh
  • ARROW-5922 - [Python] Unable to connect to HDFS from a worker/data node on a Kerberized cluster using pyarrow's hdfs API
  • PARQUET-1402 - [C++] Parquet files with dictionary page offset as 0 are not readable
  • PARQUET-1405 - Fix writing statistics into DataPageHeader
  • PARQUET-1565 - [C++] Add default case to catch all unhandled physical types
  • PARQUET-1571 - [C++] Fix BufferedInputStream when buffer exactly exhausted
  • PARQUET-1574 - [C++] fix parquet-encoding-test
  • PARQUET-1581 - [C++] Fix undefined behavior in encoding.cc
kou
published 0.13.0 •

Apache Arrow 0.13.0 (2019-04-01)

Bug Fixes

  • ARROW-295 - [Documentation] Add DOAP file
  • ARROW-1171 - [C++] Segmentation faults on Fedora 24 with pyarrow-manylinux1 and self-compiled turbodbc
  • ARROW-2392 - [C++] Check schema compatibility when writing a RecordBatch
  • ARROW-2399 - [Rust] Builder<T> should not provide a set() method
  • ARROW-2598 - [Python] table.to_pandas segfault
  • ARROW-3086 - [GLib] GISCAN fails due to conda-shipped openblas
  • ARROW-3096 - [Python] Update Python source build instructions given Anaconda/conda-forge toolchain migration
  • ARROW-3133 - [C++] Remove allocation from Binary Boolean Kernels.
  • ARROW-3133 - [C++] Remove allocations from InvertKernel
  • ARROW-3208 - [C++] Fix Cast dictionary to numeric segfault
  • ARROW-3426 - [CI] Java integration test very verbose
  • ARROW-3564 - [C++] Fix dictionary encoding logic for Parquet 2.0
  • ARROW-3578 - [Release] Resolve all hard and symbolic links in tar.gz
  • ARROW-3593 - [R] CI builds failing due to GitHub API rate limits
  • ARROW-3606 - [Crossbow] Fix flake8 crossbow warnings
  • ARROW-3669 - [Python] Raise error on Numpy byte-swapped array
  • ARROW-3843 - [C++][Python] Allow a "degenerate" Parquet file with no columns
  • ARROW-3923 - [Java] JDBC Time Fetches Without Timezone
  • ARROW-4007 - [Java][Plasma] Plasma JNI tests failing
  • ARROW-4050 - [Python][Parquet] core dump on reading parquet file
  • ARROW-4081 - [Go] Sum methods panic when the array is empty
  • ARROW-4104 - [Java] race in AllocationManager during release
  • ARROW-4108 - [Python/Java] Spark integration tests do not work
  • ARROW-4117 - [Python] "asv dev" command fails with latest revision
  • ARROW-4140 - [C++][Gandiva] Compiled LLVM bitcode file path may result in libraries being non-relocatable
  • ARROW-4145 - [C++] Find Windows-compatible strptime implementation
  • ARROW-4181 - [Python] Fixes for Numpy struct array conversion
  • ARROW-4192 - [CI] Fix broken dev/run_docker_compose.sh script
  • ARROW-4213 - [Flight] Fix incompatibilities between C++ and Java
  • ARROW-4244 - [Format] Clarify padding/alignment rationale/recommendation.
  • ARROW-4250 - [C++] adding explicit epsilon for ApproxEquals and corresponding assert macro
  • ARROW-4252 - [C++] Fix missing Status code and newline
  • ARROW-4253 - [GLib] Cannot use non-system Boost specified with $BOOST_ROOT
  • ARROW-4254 - [C++][Gandiva] Build with Boost from Ubuntu Trusty apt
  • ARROW-4255 - [C++] Eagerly initialize name_to_index_ to avoid race
  • ARROW-4261 - [C++] Make CMake paths for IPC, Flight, Thrift, and Plasma subproject compatible
  • ARROW-4264 - [C++] Clarify use of DCHECKs in Kernels
  • ARROW-4267 - [C++/Parquet] Handle duplicate and struct columns in RowGroup reads
  • ARROW-4274 - [C++][Gandiva] split decimal into two parts
  • ARROW-4275 - [C++][Gandiva] Fix slow decimal test
  • ARROW-4280 - Update README.md to reflect parquet deps
  • ARROW-4282 - [Rust] builder benchmark is broken
  • ARROW-4284 - [C#] File / Stream serialization fails due to type mismatch / missing footer
  • ARROW-4295 - [C++][Plasma] Fix incorrect log message
  • ARROW-4296 - [Plasma] Use one mmap file by default, prevent crash with -f
  • ARROW-4308 - [Python] pyarrow has a hard dependency on pandas
  • ARROW-4311 - [Python] Regression on pq.ParquetWriter incorrectly handling source string
  • ARROW-4312 - [C++] Only run 2 * os.cpu_count() clang-format instances at once
  • ARROW-4319 - [C++][Plasma] plasma/store.h pulls in flatbuffer dependency
  • ARROW-4320 - [C++] Add tests for non-contiguous tensors
  • ARROW-4322 - [C++] Don't use _GLIBCXX_USE_CXX11_ABI=0 anymore in docker scripts
  • ARROW-4323 - [Packaging] Fix failing OSX clang conda forge builds
  • ARROW-4326 - [C++] Development instructions in python/development.rst will not work for many Linux distros with new conda-forge toolchain
  • ARROW-4327 - [Python] Add requirements-build.txt convenience file
  • ARROW-4328 - Add an ARROW_USE_OLD_CXXABI configure var to R
  • ARROW-4329 - Python should include the parquet headers
  • ARROW-4342 - [Gandiva][Java] Ignore flaky test.
  • ARROW-4347 - [CI][Python] Also run Python builds when Java affected.
  • ARROW-4349 - [C++] Add static linking option for benchmarks, fix Windows benchmark build failures
  • ARROW-4351 - [C++] Fix CMake errors when neither building shared libraries nor tests
  • ARROW-4355 - [C++] Reorder testing code into src/arrow/testing
  • ARROW-4360 - [C++] Query homebrew for Thrift
  • ARROW-4364 - [C++] Fix CHECKIN warnings
  • ARROW-4366 - [Docs] Change extension from format/README.md to format/README.rst
  • ARROW-4367 - [C++] StringDictionaryBuilder segfaults on Finish with only null entries
  • ARROW-4368 - [Docs] Fix install document for Ubuntu 16.04 or earlier
  • ARROW-4370 - [Python][Bool] to pandas
  • ARROW-4374 - [C++] DictionaryBuilder does not correctly report length and null_count
  • ARROW-4381 - [CI] Update linter container build instructions
  • ARROW-4382 - [C++] Improve new cpplint output readability
  • ARROW-4384 - [C++] Running "format" target on new Windows 10 install opens "how do you want to open this file" dialog
  • ARROW-4385 - [Packaging] Fix PyArrow version update pattern on release
  • ARROW-4389 - [R] Don't install clang-tools in test job
  • ARROW-4395 - [JS] Fix ts-node error running bin/arrow2csv
  • ARROW-4400 - [CI] Switch to https repo for llvm
  • ARROW-4403 - [Rust] Fix format errors
  • ARROW-4404 - [CI] AppVeyor toolchain build does not build anything
  • ARROW-4407 - [C++] Cache compiler for CMake external projects
  • ARROW-4410 - [C++] Fix edge cases in InvertKernel
  • ARROW-4413 - [Python] Fix pa.hdfs.connect() on Python 2
  • ARROW-4414 - [C++] Stop using cmake COMMAND_EXPAND_LISTS because it breaks package builds for older distros
  • ARROW-4417 - [C++] Fix doxygen build
  • ARROW-4420 - [INTEGRATION] Make spark integration test pass and test against spark's master branch
  • ARROW-4421 - [C++][Flight] Handle large RPC messages in Flight
  • ARROW-4434 - [Python] Allow creating trivial StructArray
  • ARROW-4440 - [C++] Revert recent changes to flatbuffers EP causing flakiness
  • ARROW-4457 - [Python] Allow creating Decimal array from Python ints
  • ARROW-4469 - [CI] Pin conda-forge binutils version to 2.31 for now
  • ARROW-4471 - [C++] Pass AR and RANLIB to all external projects
  • ARROW-4474 - Use signed integers in FlightInfo payload size fields
  • ARROW-4480 - [Python] Drive letter removed when writing parquet file
  • ARROW-4487 - [C++] Appveyor toolchain build does not actually build the project
  • ARROW-4494 - [Java] arrow-jdbc JAR is not uploaded on release
  • ARROW-4496 - [Python] Pin to gfortran<4
  • ARROW-4498 - [Plasma] Fix building Plasma with CUDA enabled
  • ARROW-4500 - [C++] Remove pthread / librt hacks causing linking issues in some Linux environments
  • ARROW-4501 - Fix out-of-bounds read in DoubleCrcHash
  • ARROW-4525 - [Rust][Parquet] Enable conversion of ArrowError to ParquetError
  • ARROW-4527 - [Packaging][Linux] Use LLVM 7
  • ARROW-4532 - [Java] fix bug causing very large varchar value buffers
  • ARROW-4533 - [Python] Document how to run hypothesis tests
  • ARROW-4535 - [C++] Fix MakeBuilder to preserve ListType's field name
  • ARROW-4536 - [GLib] Add data_type argument in garrow_list_array_new
  • ARROW-4538 - [Python] Remove index column from subschema in write_to_dataframe
  • ARROW-4549 - [C++] Can't build benchmark code on CUDA enabled build
  • ARROW-4550 - [JS] Fix AMD pattern
  • ARROW-4559 - [Python] Allow Parquet files with special characters in their names
  • ARROW-4563 - [Python] Validate decimal128() precision input
  • ARROW-4571 - [Format] Tensor.fbs file has multiple root_type declarations
  • ARROW-4573 - [Python] Add Flight unit tests
  • ARROW-4576 - [Python] Fix error during benchmarks
  • ARROW-4577 - [C++] Don't set interface link libs on arrow_shared where there are none
  • ARROW-4581 - [C++] Do not require googletest_ep or gbenchmark_ep for library targets
  • ARROW-4582 - [Python/C++] Acquire the GIL on Py_INCREF
  • ARROW-4584 - [Python] Add built wheel to manylinux1 dockerignore
  • ARROW-4585 - [C++] Add protoc dependency to flight_testing
  • ARROW-4587 - [C++] Fix segfaults around DoPut implementation
  • ARROW-4597 - [C++] Targets for system Google Mock shared library are missing
  • ARROW-4601 - [Python] Add license header to dockerignore
  • ARROW-4606 - [Rust] [DataFusion] FilterRelation created RecordBatch with empty schema
  • ARROW-4608 - [C++] cmake script assumes that double-conversion installs static libs
  • ARROW-4617 - [C++] Support double-conversion<3.1
  • ARROW-4624 - [C++] Fix building benchmarks
  • ARROW-4629 - [Python] Pandas arrow conversion slowed down by imports
  • ARROW-4635 - [Java] allocateNew to use last capacity
  • ARROW-4639 - [CI] Switch off GFLAGS_SHARED for osx
  • ARROW-4641 - [C++][Flight] Suppress strict aliasing warnings from "unsafe" casts in client.cc
  • ARROW-4642 - [R] change f to file in read_parquet_file()
  • ARROW-4653 - [C++] Fix bug in decimal multiply
  • ARROW-4654 - [C++] Explicit flight.cc source dependencies
  • ARROW-4657 - Don't build benchmarks in release verify script
  • ARROW-4658 - [C++] Shared gflags is also a run-time conda requirement
  • ARROW-4659 - [CI] ubuntu/debian nightlies fail because of missing gandiva files
  • ARROW-4660 - [C++] Use set_target_properties for defining GFLAGS_IS_A_DLL
  • ARROW-4664 - [C++] Do not execute expressions inside DCHECK macros in release builds
  • ARROW-4669 - [Java] Add validity checks to slice
  • ARROW-4672 - [CI] Fix clang-7 build entry
  • ARROW-4680 - [CI][Rust] Travis CI builds fail with latest Rust 1.34.0…
  • ARROW-4684 - [Python] CI failures in test_cython.py
  • ARROW-4687 - [Python] Stop Flight server on incoming signals
  • ARROW-4688 - [C++][Parquet] Chunk binary column reads at 2^31 - 1 byte boundaries to avoid splitting chunk inside nested string cell
  • ARROW-4696 - Better CUDA detection in release verification script
  • ARROW-4699 - [C++] remove json chunker's requirement of null terminated buffers
  • ARROW-4704 - [GLib][CI] Ensure killing plasma_store_server
  • ARROW-4710 - [C++][R] New linting script skips files with "cpp" extension
  • ARROW-4712 - [C++][CI] fix build (sum.cc) has warnings in clang
  • ARROW-4721 - [Rust][DataFusion] Propagate schema in filter
  • ARROW-4724 - [C++][CI] Enable Python build and test in MinGW build
  • ARROW-4728 - [JS] Fix Table#assign when passed zero-length RecordBatches
  • ARROW-4737 - run C# tests in CI
  • ARROW-4744 - [C++][CI] Change mingw builds back to debug. Clean up some version warnings
  • ARROW-4750 - [C++] RapidJSON triggers Wclass-memaccess on GCC 8+
  • ARROW-4760 - [C++] protobuf 3.7 defines EXPECT_OK that clashes with Arrow's macro
  • ARROW-4766 - [C++] Fix empty array cast segfault
  • ARROW-4767 - [C#] ArrowStreamReader crashes while reading the end of a stream
  • ARROW-4768 - [C++][CI] Don't run flaky tests in MinGW build
  • ARROW-4774 - [C++] Fix FileWriter::WriteTable segfault
  • ARROW-4775 - [Site] Site navbar cannot be expanded
  • ARROW-4783 - [C++][CI] Disable arrow thread-pool test on mingw to avoid appveyor timeouts
  • ARROW-4793 - [Ruby] Suppress unused variable warning
  • ARROW-4796 - [Flight/Python] Keep underlying Python object alive in FlightServerBase.do_get
  • ARROW-4802 - [Python] Follow symlinks when deriving Hadoop classpath for HDFS
  • ARROW-4807 - [Rust] Fix csv_writer benchmark
  • ARROW-4811 - [C++] Fix misbehaving CMake dependency on flight_grpc_gen
  • ARROW-4813 - [Ruby] Add tests for == and !=
  • ARROW-4820 - [Python] hadoop class path derived not correct
  • ARROW-4822 - [C++/Python] Check for None on calls to equals
  • ARROW-4828 - [Python] manylinux1 docker-compose context should be python/manylinux1
  • ARROW-4850 - [CI] Ensure integration_test.py returns non-zero on failures
  • ARROW-4853 - [Rust] Array slice doesn't work on ListArray and StructArray
  • ARROW-4857 - [C++/Python/CI] docker-compose in manylinux1 crossbow jobs too old
  • ARROW-4866 - [C++] Fix zstd_ep build for Debug, static CRT builds. Add separate CMake variable for propagating compiler toolchain to ExternalProjects
  • ARROW-4867 - [Python] Respect ordering of columns argument passed to Table.from_pandas
  • ARROW-4869 - [C++] Fix gmock usage in compute/kernels/util-internal-test.cc
  • ARROW-4870 - [Ruby] Fix msys2_mingw_dependencies
  • ARROW-4871 - [Java/Flight] Handle large Flight messages
  • ARROW-4872 - [Python] Keep backward compatibility for ParquetDatasetPiece
  • ARROW-4879 - [C++] cmake can't use conda's flatbuffers
  • ARROW-4881 - [C++] remove references to ARROW_BUILD_TOOLCHAIN
  • ARROW-4900 - [C++] polyfill __cpuidex on mingw-w64
  • ARROW-4903 - [C++] Fix static/shared-only builds
  • ARROW-4906 - [Format] Document that the SparseMatrixIndexCSR format is sorted
  • ARROW-4918 - [C++] Add cmake-format to pre-commit
  • ARROW-4928 - [Python] Fix Hypothesis test failures
  • ARROW-4931 - [C++] CMake fails on gRPC ExternalProject
  • ARROW-4938 - [GLib] Undefined symbols error occurred when GIR file is being generated.
  • ARROW-4942 - [Ruby] Remove needless omits in tests
  • ARROW-4948 - [JS] Nightly test failure
  • ARROW-4950 - [C++] Fix CMake 3.2 build
  • ARROW-4952 - [C++] Floating-point comparisons should consider NaNs unequal
  • ARROW-4953 - [Ruby] Not loading libarrow-glib
  • ARROW-4954 - [Python] Fix test failure with Flight enabled
  • ARROW-4958 - [C++] Parquet benchmarks depend on its static test libs
  • ARROW-4961 - [C++] Add documentation note that GTest_SOURCE=BUNDLED is currently required on Windows
  • ARROW-4962 - [C++] Warning level to CHECKIN can't compile on modern GCC
  • ARROW-4976 - [JS] Invalidate RecordBatchReader node/dom streams on reset()
  • ARROW-4982 - [GLib][CI] Run tests on AppVeyor
  • ARROW-4984 - Check if Flight gRPC server starts properly
  • ARROW-4986 - [CI] Travis fails to install llvm@7
  • ARROW-4989 - [C++] Find re2 on Ubuntu if asked to
  • ARROW-4991 - [CI] Bump travis node version to 11.12
  • ARROW-4997 - [C#] ArrowStreamReader doesn't consume whole stream and doesn't implement sync read.
  • ARROW-5009 - [C++] Remove using std::.* where I could find them
  • ARROW-5010 - [Release] Fix source release docker
  • ARROW-5012 - [C++] Install testing headers
  • ARROW-5023 - [Release] Fix default value syntax in 02-source.sh
  • ARROW-5024 - [Release] Fix missing variable with --arrow-version
  • ARROW-5025 - [Python][Packaging] Fix gandiva.dll detection
  • ARROW-5026 - [Python][Packaging] Fix gandiva.dll detection on non Windows
  • ARROW-5029 - [C++] Fix compilation warnings in release mode
  • ARROW-5031 - [Dev] Run CUDA Python tests in release verification script
  • ARROW-5042 - [Release] Use the correct dependency source in verification script
  • ARROW-5043 - [Release][Ruby] Fix dependency error in verification script
  • ARROW-5044 - [Release][Rust] Use stable toolchain for format check in verification script
  • ARROW-5046 - [Release][C++] Exclude fragile Plasma test from verification script
  • ARROW-5047 - [Release] Always set up parquet-testing in verification script
  • ARROW-5048 - [Release][Rust] Set up arrow-testing in verification script
  • ARROW-5050 - [C++] cares_ep should build before grpc_ep
  • ARROW-5087 - [Debian] APT repository no longer contains libarrow-dev
  • ARROW-5658 - [JAVA] Provide ability to resync VectorSchemaRoot if types change
  • PARQUET-1482 - [C++] Add branch to TypedRecordReader::ReadNewPage for …
  • PARQUET-1494 - [C++] Recognize statistics built with UNSIGNED sort order by parquet-mr 1.10.0 onwards
  • PARQUET-1532 - [C++] Fix build error with MinGW

New Features and Improvements

  • ARROW-47 - [C++] Preliminary arrow::Scalar object model
  • ARROW-331 - [Doc] Add statement about Python 2.7 compatibility
  • ARROW-549 - [C++] Add arrow::Concatenate function to combine multiple arrays into a single Array
  • ARROW-572 - [C++] Apply visitor pattern in IPC metadata
  • ARROW-585 - [C++] Experimental public API for user-defined extension types and arrays
  • ARROW-694 - [C++] Initial parser interface for reading JSON into RecordBatches
  • ARROW-1425 - [Python][Documentation] Examples of converting Timestamps to/from pandas via arrow
  • ARROW-1572 - [C++] Implement "value counts" kernels for tabulating value frequencies
  • ARROW-1639 - [Python] Serialize RangeIndex as metadata via Table.from_pandas instead of converting to a column of integers
  • ARROW-1642 - [GLib] Build GLib using Meson in Appveyor
  • ARROW-1807 - [JAVA] Reduce Heap Usage (Phase 3): consolidate buffers
  • ARROW-1896 - [C++] Do not allocate memory inside CastKernel. Clean up template instantiation to not generate dead identity cast code
  • ARROW-2015 - [Java] Replace Joda time with Java 8 time
  • ARROW-2022 - [Format] Add metadata to message
  • ARROW-2112 - [C++] Enable cpplint to be run on Windows
  • ARROW-2243 - [C++] Enable IPO/LTO
  • ARROW-2409 - [Rust] Deny warnings in CI.
  • ARROW-2460 - [Rust] Schema and DataType::Struct should use Vec<Rc<Field>>
  • ARROW-2487 - [C++] Provide a variant of AppendValues that takes bytemaps for the nullability
  • ARROW-2523 - [Rust] Implement CAST operations for arrays
  • ARROW-2620 - [Rust] Integrate memory pool abstraction with rest of codebase
  • ARROW-2627 - [Python] Add option to pass memory_map argument to ParquetDataset
  • ARROW-2904 - [C++] Use FirstTimeBitmapWriter instead of SetBit functions in builder.h/cc
  • ARROW-3066 - [Wiki] Add "How to contribute" to developer wiki
  • ARROW-3084 - [Python] Do we need to build both unicode variants of pyarrow wheels?
  • ARROW-3107 - [C++] arrow::PrettyPrint for Column instances
  • ARROW-3121 - [C++] Mean aggregate kernel
  • ARROW-3123 - [C++] Implement Count aggregate kernel
  • ARROW-3135 - [C++] Add helper functions for validity bitmap propagation in kernel context
  • ARROW-3149 - [C++] Use gRPC (when it exists) from conda-forge for CI builds
  • ARROW-3162 - [Python][Flight] Enable implementing Flight servers in Python
  • ARROW-3162 - Flight Python bindings
  • ARROW-3239 - [C++] Implement simple random array generation
  • ARROW-3255 - [C++/Python] Migrate Travis CI jobs off Xcode 6.4
  • ARROW-3289 - [C++] Implement Flight DoPut
  • ARROW-3292 - [C++] Test Flight RPC in Travis CI
  • ARROW-3295 - [Packaging] Package gRPC libraries in conda-forge for use in builds, packaging
  • ARROW-3297 - [Python] Python bindings for Flight C++ client
  • ARROW-3311 - [R] Functions for deserializing IPC components from arrow::Buffer or from IO interface
  • ARROW-3328 - [Flight] Allow for optional unique flight identifier to be sent with FlightGetInfo
  • ARROW-3361 - [R] Also run cpplint on Rcpp source files
  • ARROW-3364 - [Docs] Add docker-compose integration documentation
  • ARROW-3367 - [INTEGRATION] Port Spark integration test to the docker-compose setup
  • ARROW-3422 - [C++] Uniformly add ExternalProject builds to the "toolchain" target. Fix gRPC EP build on Linux
  • ARROW-3434 - [Packaging] Add Apache ORC C++ library to conda-forge
  • ARROW-3435 - [C++] Add option to use dynamic linking with re2
  • ARROW-3511 - [Gandiva] Link filter and project operations
  • ARROW-3532 - [Python] Emit warning when looking up duplicate struct or schema fields
  • ARROW-3550 - [C++] use kUnknownNullCount for the default null_count argument
  • ARROW-3554 - [C++] Reverse traits for C++
  • ARROW-3594 - [Packaging] Build "cares" library in conda-forge
  • ARROW-3595 - [Packaging] Build boringssl in conda-forge
  • ARROW-3596 - [Packaging] Build gRPC in conda-forge
  • ARROW-3619 - [R] Expose global thread pool options
  • ARROW-3631 - [C#] Add Appveyor configuration
  • ARROW-3653 - [C++][Python] Support data copying between different GPU devices
  • ARROW-3735 - [Python] Add test for calling cast() with None
  • ARROW-3761 - [R] Bindings for CompressedInputStream, CompressedOutputStream
  • ARROW-3763 - [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly into arrow::BinaryBuilder
  • ARROW-3769 - [C++] Add support for reading non-dictionary encoded binary Parquet columns directly as DictionaryArray
  • ARROW-3770 - [C++] Validate schema for each table written with parquet::arrow::FileWriter
  • ARROW-3816 - [R] nrow.RecordBatch method
  • ARROW-3824 - [R] Add basic build and test documentation
  • ARROW-3838 - [Rust] CSV Writer
  • ARROW-3846 - [Gandiva][C++] Build Gandiva C++ libraries and get unit tests passing on Windows
  • ARROW-3882 - [Rust] Cast Kernel for most types
  • ARROW-3903 - [Python] Random array generator for Arrow conversion and Parquet testing
  • ARROW-3926 - [Python] Add Gandiva bindings to Python manylinux1 wheels
  • ARROW-3951 - [Go] implement a CSV writer
  • ARROW-3954 - [Rust] Add Slice to Array and ArrayData
  • ARROW-3965 - [Java] JDBC-To-Arrow Configuration
  • ARROW-3966 - [Java] JDBC Column Metadata in Arrow Field Metadata
  • ARROW-3972 - [C++] Migrate to LLVM 7. Add option to disable using ld.gold
  • ARROW-3981 - [C++] Rename json.h
  • ARROW-3985 - [C++] Let ccache preserve comments
  • ARROW-4012 - [Website] Add documentation on how to install Apache Arrow on MSYS2
  • ARROW-4014 - [C++] Fix "LIBCMT" warnings on MSVC
  • ARROW-4023 - [Gandiva] Address long CI times in macOS builds
  • ARROW-4024 - [Python] Raise minimal Cython version to 0.29
  • ARROW-4031 - [C++] Refactor bitmap building
  • ARROW-4040 - [Rust] Add array_ops method for filtering an array
  • ARROW-4056 - [C++] Unpin boost-cpp in conda_env_cpp.yml
  • ARROW-4061 - [Rust][Parquet] Implement spaced version for non-diction…
  • ARROW-4068 - [Gandiva] Support building with Xcode 6.4
  • ARROW-4071 - [Rust] Add rustfmt as a pre-commit hook
  • ARROW-4072 - [Rust] Set default value for PARQUET_TEST_DATA
  • ARROW-4092 - [Rust] Implement common Reader / DataSource trait for CSV and Parquet
  • ARROW-4094 - [Python] Store RangeIndex in Parquet files as metadata rather than a physical data column
  • ARROW-4110 - [C++] Do not generate distinct cast kernels when input and output type are the same
  • ARROW-4123 - [C++] Enable linting tools to be run on Windows
  • ARROW-4124 - [C++] Draft Aggregate and Sum kernels
  • ARROW-4142 - [Java] JDBC Array -> Arrow ListVector
  • ARROW-4165 - [C++] Port cpp/apidoc/Windows.md and other files to Sphinx / rst
  • ARROW-4180 - [Java] Make CI tests use logback.xml
  • ARROW-4196 - [Rust] Add explicit SIMD vectorization for arithmetic ops in "array_ops"
  • ARROW-4198 - [Gandiva] Added support to cast timestamp
  • ARROW-4204 - [Gandiva] add support for decimal subtract
  • ARROW-4205 - [Gandiva] Support for decimal multiply
  • ARROW-4206 - [Gandiva] support decimal divide and mod
  • ARROW-4212 - [C++][Python] CudaBuffer view of arbitrary device memory object
  • ARROW-4230 - [C++] Fix Flight builds with gRPC/Protobuf/c-ares
  • ARROW-4232 - [C++] Follow conda-forge compiler ABI migration
  • ARROW-4234 - [C++] Improve memory bandwidth test
  • ARROW-4235 - [GLib] Use "column_builder" in GArrowRecordBatchBuilder
  • ARROW-4236 - [java] Distinct plasma client create exceptions
  • ARROW-4245 - [Rust] Add Rustdoc header to source files
  • ARROW-4247 - [Packaging] Update verify script for 0.12.0
  • ARROW-4251 - [C++][Release] Add option to set ARROW_BOOST_VENDORED environment variable in verify-release-candidate.sh
  • ARROW-4262 - [Website] Preview to Spark with Arrow and R improvements
  • ARROW-4263 - [Rust] Donate DataFusion
  • ARROW-4265 - [C++] Automatic conversion between Table and std::vector<std::tuple<..>>
  • ARROW-4268 - [C++] Native C type TypeTraits
  • ARROW-4271 - [Rust] Move Parquet specific info to Parquet Readme
  • ARROW-4273 - [Release] Fix verification script to use cf201901 conda-forge label
  • ARROW-4277 - [C++] Add gmock to the toolchain
  • ARROW-4281 - [CI] Use Ubuntu Xenial VMs on Travis-CI
  • ARROW-4285 - [Python] Use proper builder interface for serialization
  • ARROW-4287 - [C++] Ensure minimal bison version on OSX for Thrift
  • ARROW-4289 - [C++] Forward AR and RANLIB to thirdparty builds
  • ARROW-4290 - [C++/Gandiva] Support detecting correct LLVM version in Homebrew
  • ARROW-4291 - [Dev] Support selecting features in release verification scripts
  • ARROW-4294 - [C++][Plasma] Add support for evicting Plasma objects to external store
  • ARROW-4297 - [C++] Fix build error with MinGW-w64 32-bit
  • ARROW-4298 - [Java] Add javax.annotation-api dependency for JDK >= 9
  • ARROW-4299 - [Ruby] Depend on the same version as Red Arrow
  • ARROW-4300 - [C++] Restore apache-arrow Homebrew recipe and define process for maintaining and updating for releases
  • ARROW-4303 - [Gandiva/Python] Build LLVM with RTTI in manylinux1 container
  • ARROW-4305 - [Rust] Fix parquet version number in README
  • ARROW-4307 - [C++] Fix Doxygen warnings
  • ARROW-4310 - [Website] Update install document for 0.12.0
  • ARROW-4313 - Define general benchmark database schema
  • ARROW-4315 - [Website] Add Go and Rust to list of supported languages
  • ARROW-4318 - [C++] Add Tensor::CountNonZero
  • ARROW-4321 - [CI] Setup conda-forge channel globally in docker containers
  • ARROW-4330 - [C++] More robust discovery of pthreads
  • ARROW-4331 - [C++] Extend Scalar Datum to support more types
  • ARROW-4332 - [Website] Improve documentation for publishing site
  • ARROW-4334 - [CI] Setup conda-forge channel globally in travis builds
  • ARROW-4335 - [C++] Better document sparse tensor support
  • ARROW-4336 - [C++] Change default build type to RELEASE
  • ARROW-4339 - [C++][Python] Developer documentation overhaul for 0.13 release
  • ARROW-4340 - [C++][CI] Build IWYU for LLVM 7 in iwyu docker-compose job
  • ARROW-4341 - [C++] Refactor Primitive builders and BooleanBuilder to use TypedBufferBuilder<T>
  • ARROW-4344 - [Java] Further cleanup mvn output, upgrade rat plugin
  • ARROW-4345 - [C++] Add Apache 2.0 license file to the Parquet-testing repository
  • ARROW-4346 - [C++] Fix class-memaccess warning on gcc 8.x
  • ARROW-4352 - [C++] Add support for system Google Test
  • ARROW-4353 - [CI] Add MinGW builds
  • ARROW-4358 - [CI] Restore support for trusty in CI
  • ARROW-4361 - [Website] Update committers list
  • ARROW-4362 - [Java] Test OpenJDK 11 in CI
  • ARROW-4363 - [CI][C++] Add CMake format checks
  • ARROW-4372 - [C++] Embed precompiled bitcode in the gandiva library
  • ARROW-4373 - [Packaging] Travis fails to deploy conda packages on OSX
  • ARROW-4375 - [CI] Sphinx dependencies were removed from docs conda environment
  • ARROW-4376 - [Rust] Implement from_buf_reader for csv::Reader
  • ARROW-4377 - [Rust] Implement std::fmt::Debug for PrimitiveArrays
  • ARROW-4379 - [Python] Register serializers for collections.Counter and collections.deque.
  • ARROW-4383 - [C++] Use the CMake's standard find features
  • ARROW-4386 - [Rust] Temporal array support
  • ARROW-4388 - [Go] add DimNames() method to tensor Interface
  • ARROW-4393 - [Rust] coding style: apply 90 characters per line limit
  • ARROW-4396 - [JS] Update Typedoc for TypeScript 3.2
  • ARROW-4397 - [C++] Add dim_names in Tensor and SparseTensor
  • ARROW-4399 - [C++] Do not use extern template class with NumericArray<T> and NumericTensor<T>
  • ARROW-4401 - [Python] Alpine dockerfile fails to build because pandas requires numpy as build dependency
  • ARROW-4406 - [Python] Exclude HDFS directories in S3 from ParquetManifest
  • ARROW-4408 - [CPP/Doc] Remove outdated Parquet documentation
  • ARROW-4422 - [Plasma] Enforce memory limit in plasma, rather than relying on dlmalloc_set_footprint_limit
  • ARROW-4423 - [C++] Upgrade vendored gmock/gtest to 1.8.1
  • ARROW-4424 - [Python] Install tensorflow and keras-preprocessing in manylinux1 container
  • ARROW-4425 - Add link to 'Contributing' page in the top-level Arrow README
  • ARROW-4430 - [C++] Fix untested TypedByteBuffer<T>::Append method
  • ARROW-4431 - [C++] Fixes for gRPC vendored builds
  • ARROW-4435 - Minor fixups to csharp .sln and .csproj file
  • ARROW-4436 - [Documentation] Update building.rst to reflect pyarrow req
  • ARROW-4442 - [JS] Add explicit type annotation to Chunked typeId getter
  • ARROW-4444 - [Testing] Add DataFusion test files to arrow-testing repo
  • ARROW-4445 - [C++][Gandiva] Run Gandiva-LLVM tests in Appveyor
  • ARROW-4446 - [C++][Python] Run Gandiva C++ unit tests in Appveyor, get build and tests working in Python
  • ARROW-4448 - [Java][Flight] Disable flaky TestBackPressure
  • ARROW-4449 - [Rust] Convert File to T: Read + Seek for schema inference
  • ARROW-4454 - [C++] fix unused parameter warnings
  • ARROW-4455 - [Plasma] Suppress class-memaccess warnings
  • ARROW-4459 - [Testing] Add arrow-testing repo as submodule
  • ARROW-4460 - [Website] DataFusion Blog Post
  • ARROW-4461 - [C++] Expose bit map operations that work with raw pointers
  • ARROW-4462 - [C++] Upgrade LZ4 v1.7.5 to v1.8.3 to compile with VS2017
  • ARROW-4464 - [Rust][DataFusion] Add support for LIMIT
  • ARROW-4466 - [Rust][DataFusion] Add support for Parquet data source
  • ARROW-4468 - [Rust] Implement BitAnd/BitOr for &Buffer (with SIMD) (#3571)
  • ARROW-4472 - [Website][Python] Blog post about string memory use work in Arrow 0.12
  • ARROW-4475 - [Python] Fix recursive serialization of self-containing objects
  • ARROW-4476 - [Rust][DataFusion] Update README to cover DataFusion and new testing git submodule
  • ARROW-4481 - [Website] Remove generated specification docs from site after docs migration
  • ARROW-4483 - [Website] Add myself to contributors.yaml to fix broken link in blog post
  • ARROW-4485 - [CI] Determine maintenance approach to pinned conda-forge binutils package
  • ARROW-4486 - [Python][CUDA] Add base argument to foreign_buffer
  • ARROW-4488 - [Rust] From AsRef<[u8]> for Buffer does not ensure correct padding
  • ARROW-4489 - [Rust] PrimitiveArray.value_slice performs bounds checking when it should not
  • ARROW-4490 - [Rust] Add explicit SIMD vectorization for boolean ops in "array_ops"
  • ARROW-4491 - [Python] Use StringConverter and stringstream instead of std::stoi and std::to_string
  • ARROW-4499 - [CI] Unpin flake8 in lint script, fix warnings in dev/
  • ARROW-4502 - [C#] Add support for zero-copy reads
  • ARROW-4506 - [Ruby] Add Arrow::RecordBatch#raw_records
  • ARROW-4513 - [Rust] Implement BitAnd/BitOr for &Bitmap
  • ARROW-4517 - [JS] remove version number as it is not used
  • ARROW-4518 - [JS] add jsdelivr to package.json
  • ARROW-4528 - [C++] Update lint docker container to LLVM-7
  • ARROW-4529 - [C++] Add test for BitUtil::RoundDown
  • ARROW-4531 - [C++] Support slices for SumKernel
  • ARROW-4537 - [CI] Suppress shell warning on travis-ci
  • ARROW-4539 - [Java] Fix child vector count for lists. (#3625)
  • ARROW-4540 - [Rust] Basic JSON reader
  • ARROW-4543 - [C#] Update Flat Buffers code to latest version
  • ARROW-4546 - Update LICENSE.txt with parquet-cpp licenses
  • ARROW-4547 - [Python][Documentation] Update python/development.rst with instructions for CUDA-enabled builds
  • ARROW-4556 - [Rust] Preserve JSON field order when inferring schema
  • ARROW-4558 - [C++][Flight] Implement gRPC customizations without UB
  • ARROW-4560 - [R] array() needs to take single input, not ...
  • ARROW-4562 - [C++] Avoid copies when serializing Flight data
  • ARROW-4564 - [C++] IWYU docker image silently fails
  • ARROW-4565 - [R] Fix decimal record batches with no null values
  • ARROW-4568 - [C++] Add version macros to headers
  • ARROW-4572 - [C++] Remove memory zeroing from PrimitiveAllocatingUnaryKernel
  • ARROW-4583 - [Plasma] Fix some small bugs reported by code scan tool
  • ARROW-4586 - [Rust] Remove arrow/mod.rs as it is not needed
  • ARROW-4589 - [Rust] Projection push down query optimizer rule
  • ARROW-4590 - [Rust] Add explicit SIMD vectorization for comparison ops in "array_ops"
  • ARROW-4592 - [GLib] Stop configure immediately when GLib isn't available
  • ARROW-4593 - [Ruby] Arrow::Array#[out_of_range] returns nil
  • ARROW-4594 - [Ruby] Arrow::StructArray#[] returns Arrow::Struct instead of Arrow::Array
  • ARROW-4595 - [Rust] Implement Table API (a.k.a DataFrame)
  • ARROW-4598 - [CI] Remove needless LLVM_DIR for macOS
  • ARROW-4599 - [C++] Add support for system GFlags
  • ARROW-4602 - [Rust][DataFusion] Integrate query optimizer with ExecutionContext
  • ARROW-4603 - [Rust] [DataFusion] Execution context should allow in-memory data sources to be registered
  • ARROW-4604 - [Rust] [DataFusion] Add benchmarks for SQL query execution
  • ARROW-4605 - [Rust] Move filter and limit code from DataFusion into compute module
  • ARROW-4609 - [C++] Use google benchmark from toolchain
  • ARROW-4610 - [Plasma] Avoid Crash in Plasma Java Client
  • ARROW-4611 - [C++] Rework CMake logic
  • ARROW-4612 - [Python] Use cython from PyPI for windows wheels build
  • ARROW-4613 - [C++] Set CMAKE_INSTALL_LIBDIR in gtest thirdparty build
  • ARROW-4614 - [C++/CI] Activate flight build in ci/docker_build_cpp.sh
  • ARROW-4615 - [C++] Add checked_pointer_cast
  • ARROW-4616 - [C++] Log message in BuildUtils as STATUS
  • ARROW-4618 - [Docker] Makefile to build dependent docker images
  • ARROW-4619 - [R] Fix the autobrew script
  • ARROW-4620 - [C#] Add unit tests for "Types" in arrow/csharp
  • ARROW-4623 - [R] update Rcpp version
  • ARROW-4628 - [Rust][DataFusion] Implement type coercion query optimizer rule
  • ARROW-4632 - [Ruby] Add BigDecimal#to_arrow
  • ARROW-4634 - [Rust][Parquet] Reorganize test_common
  • ARROW-4637 - [Python] Conditionally import pandas symbols if they are used. Do not require pandas as a test dependency
  • ARROW-4638 - [R] install instructions using brew
  • ARROW-4640 - [Python] Add docker-compose configuration to build and test the project without pandas installed
  • ARROW-4643 - [C++] Force compiler diagnostic colors
  • ARROW-4644 - [C++/Docker] Build Gandiva in the docker containers
  • ARROW-4645 - [C++/Packaging] Ship Gandiva with OSX and Windows wheels
  • ARROW-4646 - [C++/Packaging] Ship gandiva with the conda-forge packages
  • ARROW-4655 - [Packaging] Parallelize binary upload
  • ARROW-4662 - [Python] Add support of type_codes in UnionType
  • ARROW-4667 - [C++] Suppress unused function warnings with MinGW
  • ARROW-4670 - [Rust] array_ops::sum performance optimizations
  • ARROW-4671 - [C++] MakeBuilder doesn't support Type::DICTIONARY
  • ARROW-4673 - [C++] Implement Scalar::Equals and Datum::Equals
  • ARROW-4676 - [C++] Add support for debug build with MinGW
  • ARROW-4678 - [Rust] Minimize unstable feature usage
  • ARROW-4679 - [Rust] Implement in-memory data source for DataFusion
  • ARROW-4681 - [Rust][DataFusion] Partition aware data sources
  • ARROW-4686 - [Dev] Only accept 'y' or 'n' in merge_arrow_pr.py prompts
  • ARROW-4689 - [Go] Add support for wasm
  • ARROW-4690 - Building TensorFlow compatible wheels for Arrow
  • ARROW-4692 - [Flight] Explain sidecar in a bit more detail
  • ARROW-4693 - [CI] Build boost with multiprecision
  • ARROW-4697 - [C++] Add URI parsing facility
  • ARROW-4703 - [C++] Upgrade dependency versions
  • ARROW-4705 - [Rust] Improve error handling in csv reader
  • ARROW-4707 - [C++] moving BitsetStack to BitUtil::
  • ARROW-4718 - [C#] Add ArrowStreamReader/Writer ctor with bool leaveOpen
  • ARROW-4727 - [Rust] Add equality check for schemas
  • ARROW-4730 - [C++] Add docker-compose entry for testing Fedora build with system packages
  • ARROW-4731 - [C++] Add docker-compose entry for testing Ubuntu Xenial build with system packages
  • ARROW-4732 - [C++] Add docker-compose entry for testing Debian Testing build with system packages
  • ARROW-4733 - [C++] Add CI entry that builds without the conda-forge toolchain but with system packages
  • ARROW-4734 - [Go] Add option to write a header for CSV writer
  • ARROW-4735 - [Go] Optimize CSV writer CPU/Mem performances
  • ARROW-4739 - [Rust] LogicalPlan can now be passed to threads
  • ARROW-4740 - [Java] Upgrade to JUnit 5.
  • ARROW-4743 - [Java] Add javadoc missing in classes and methods in java…
  • ARROW-4745 - [C++][Documentation] Document notes from replicating Static_Crt_Build on windows
  • ARROW-4749 - [Rust] Return Result for RecordBatch::new()
  • ARROW-4751 - [C++] Add pkg-config to conda_env_cpp.yml now that it's available on Windows
  • ARROW-4754 - [Java] Randomize port and retry binding server when bind fails
  • ARROW-4756 - Update readme for triggering docker builds
  • ARROW-4758 - [C++][Flight] Fix intermittent build failure
  • ARROW-4769 - [Rust] Improve array limit fn where max_records >= len
  • ARROW-4772 - [C++] new ORC adapter interface for stripe and row iteration
  • ARROW-4776 - [C++] Add DictionaryBuilder constructor which takes a dictionary array
  • ARROW-4777 - [C++/Python] manylinux1: Update lz4 to 1.8.3
  • ARROW-4778 - [C++/Python] manylinux1: Update Thrift to 0.12.0
  • ARROW-4782 - [C++] Prototype array and scalar expression types to help with building a deferred compute graph
  • ARROW-4786 - [C++/Python] Support better parallelisation in manylinux1 base build
  • ARROW-4789 - [C++] Deprecate and later remove arrow::io::ReadableFileInterface
  • ARROW-4790 - [Python/Packaging] Update manylinux docker image in crossbow task
  • ARROW-4791 - [Rust] Remove unused dependencies
  • ARROW-4794 - [Python] Make pandas an optional test dependency
  • ARROW-4797 - [Plasma] Allow client to check store capacity and avoid server crash
  • ARROW-4801 - [GLib] Suppress Meson warnings
  • ARROW-4808 - [Java][Vector] More util methods to set decimal vector.
  • ARROW-4812 - [Rust] [DataFusion] Table.scan() should return one iterator per partition
  • ARROW-4817 - [Rust] [DataFusion] Small re-org of modules
  • ARROW-4818 - [Rust] [DataFusion] Parquet data source does not support null values
  • ARROW-4826 - [Go] export Flush method for CSV writer
  • ARROW-4831 - [C++] CMAKE_AR is not passed to ZSTD thirdparty dependency
  • ARROW-4833 - [Release] Document how to update the brew formula in the release management guide
  • ARROW-4834 - [R] Feature flag when building parquet
  • ARROW-4835 - [GLib] Add boolean operations
  • ARROW-4837 - [C++] Support c++filt on a custom path in the run-test.sh script
  • ARROW-4839 - [C#] Add NuGet package metadata and instructions.
  • ARROW-4843 - [Rust] [DataFusion] Parquet data source should support DATE
  • ARROW-4846 - [Java] Upgrade to jackson 2.9.8
  • ARROW-4849 - [C++] Add docker-compose entry for testing Ubuntu Bionic build with system packages
  • ARROW-4854 - [Rust] Use zero-copy slice for limit kernel
  • ARROW-4855 - [Packaging] Generate default package version based on cpp tags in crossbow.py
  • ARROW-4858 - [Flight/Python] enable FlightDataStream to be implemented in Python
  • ARROW-4859 - [GLib] Add garrow_numeric_array_mean()
  • ARROW-4862 - [C++] Fix gcc warnings in CHECKIN
  • ARROW-4862 - [GLib] Add GArrowCastOptions::allow-invalid-utf8 property
  • ARROW-4865 - [Rust] Support list casts
  • ARROW-4873 - [C++] Clarify documentation about how to use external ARROW_PACKAGE_PREFIX while also using CONDA dependency resolution
  • ARROW-4878 - [C++] Append \Library to CONDA_PREFIX when using ARROW_DEPENDENCY_SOURCE=CONDA
  • ARROW-4882 - [GLib] Add sum functions
  • ARROW-4887 - [GLib] Add garrow_array_count()
  • ARROW-4889 - [C++] Add STATUS messages for Protobuf in CMake
  • ARROW-4891 - [C++] Add zlib headers to include directories
  • ARROW-4892 - [Rust][DataFusion] Move SQL parser and planner into SQL module
  • ARROW-4893 - [C++] conda packages should use $PREFIX inside of conda-build
  • ARROW-4894 - [Rust][DataFusion] Remove all uses of panic! from aggregate.rs
  • ARROW-4895 - [Rust][DataFusion] Move error.rs to root of crate
  • ARROW-4896 - [Rust][DataFusion] Remove all uses of panic! from DataFusion tests
  • ARROW-4897 - [Rust][DataFusion] Improve rustdocs
  • ARROW-4898 - [C++] Old versions of FindProtobuf.cmake use ALL-CAPS for variables
  • ARROW-4899 - [Rust][DataFusion] Remove panic from expression.rs
  • ARROW-4901 - [Go] add AppVeyor CI
  • ARROW-4905 - [C++][Plasma] Remove dlmalloc symbols from client library
  • ARROW-4907 - [CI] Add docker container to inspect docker context
  • ARROW-4908 - [Rust][DataFusion] Add support for date/time parquet types encoded as INT32/INT64
  • ARROW-4909 - [CI] Use hadolint to lint Dockerfiles
  • ARROW-4910 - [Rust][DataFusion] Remove all uses of unimplemented!
  • ARROW-4915 - [GLib][C++] Add arrow::NullBuilder support for GLib
  • ARROW-4922 - [Packaging] Use system libraries for .deb and .rpm
  • ARROW-4924 - [Ruby] Add Decimal128#to_s(scale=nil)
  • ARROW-4925 - [Rust] [DataFusion] Remove duplicate implementations of collect_expr
  • ARROW-4926 - [Rust][DataFusion] Update README for 0.13.0
  • ARROW-4929 - [GLib] Add garrow_array_count_values()
  • ARROW-4932 - [GLib] Use G_DECLARE_DERIVABLE_TYPE macro
  • ARROW-4933 - [R] Autodetect Parquet support using pkg-config
  • ARROW-4937 - [R] Clean pkg-config related logic
  • ARROW-4939 - [Python] Add wrapper for "sum" kernel
  • ARROW-4940 - [Rust] Enable warnings for missing docs, add docs in datafusion
  • ARROW-4944 - [C++] Raise minimal required thrift-cpp to 0.11 in conda environment
  • ARROW-4946 - [C++] Support detection of flatbuffers without FlatbuffersConfig.cmake
  • ARROW-4947 - [Flight/C++] Remove redundant schema parameter to Flight client DoGet
  • ARROW-4951 - [C++] Turn off cpp benchmarks in cpp docker images
  • ARROW-4955 - [GLib] Add garrow_file_is_closed()
  • ARROW-4964 - [Ruby] Add closed check if available on auto close
  • ARROW-4969 - [C++] Set RPATH in correct order for test executables on OSX
  • ARROW-4977 - [Ruby] Add support for building on Windows
  • ARROW-4978 - [Ruby] Fix wrong internal variable name for table data
  • ARROW-4979 - [GLib] Add missing lock to garrow::GIOInputStream
  • ARROW-4980 - [GLib] Use GInputStream as the parent of GArrowInputStream
  • ARROW-4981 - [Ruby] Add support for CSV data encoding conversion
  • ARROW-4983 - [Plasma] Unmap memory upon destruction of the PlasmaClient
  • ARROW-4994 - [Website] Update details for ptgoetz
  • ARROW-4995 - [R] Support for winbuilder for CRAN checks
  • ARROW-4996 - [Plasma] Enable uninstalling of signal handler and fix log_dir
  • ARROW-5003 - [R] remove dependency on withr
  • ARROW-5006 - [R] parquet.cpp does not include enough Rcpp
  • ARROW-5011 - [Release] Add support in source release script for custom git hash
  • ARROW-5013 - [Rust][DataFusion] Refactor runtime expression support
  • ARROW-5014 - [Java] Fix typos in Flight module
  • ARROW-5018 - [Release] Include JavaScript implementation
  • ARROW-5032 - [C++] Install headers in vendored/datetime directory
  • ARROW-5041 - [C++] add GTest_SOURCE=BUNDLED to verify-release-candidate.bat
  • ARROW-5075 - [Release] Add 0.13.0 release note
  • ARROW-5084 - [Website] Blog post / release announcement for 0.13.0
  • PARQUET-1477 - [C++] sync thrift to final crypto spec
  • PARQUET-1508 - [C++] Read ByteArray data directly into arrow::BinaryBuilder and BinaryDictionaryBuilder. Refactor encoders/decoders to use cleaner virtual interfaces
  • PARQUET-1519 - [C++] Hide TypedColumnReader implementation behind virtual interfaces, remove use of "extern template class"
  • PARQUET-1521 - [C++] Use pure virtual interfaces for parquet::TypedColumnWriter, remove use of 'extern template class'
  • PARQUET-1525 - [C++] remove dependency on getopt in parquet tools
kszucs
published 0.4.1

Apache Arrow 0.4.1 (2017-06-09)

Bug Fixes

  • ARROW-424 - [C++] Make ReadAt, Write HDFS functions threadsafe
  • ARROW-1039 - Python: pyarrow.Filesystem.read_parquet causing error if nthreads>1
  • ARROW-1050 - [C++] Export arrow::ValidateArray
  • ARROW-1051 - [Python] Opt in to Parquet unit tests to avoid accidental suppression of dynamic linking errors
  • ARROW-1056 - [Python] Ignore pandas index in parquet+hdfs test
  • ARROW-1057 - Fix cmake warning and msvc debug asserts
  • ARROW-1060 - [Python] Add unit tests for reference counts in memoryview interface
  • ARROW-1062 - [GLib] Follow API changes in examples
  • ARROW-1066 - [Python] pandas 0.20.1 deprecation of pd.lib causes a warning on import
  • ARROW-1070 - [C++] Use physical types for Feather date/time types
  • ARROW-1075 - [GLib] Fix build error on macOS
  • ARROW-1082 - [GLib] Add CI on macOS
  • ARROW-1085 - [java] Follow up on template cleanup. Missing method for …
  • ARROW-1086 - include additional pxd files during package build
  • ARROW-1088 - [Python] Only test unicode filenames if system supports them
  • ARROW-1090 - Improve build_ext usability with --bundle-arrow-cpp
  • ARROW-1091 - Decimal scale and precision are flipped
  • ARROW-1092 - More Decimal and scale flipped follow-up
  • ARROW-1094 - [C++] Always truncate buffer read in ReadableFile::Read if actual number of bytes less than request
  • ARROW-1127 - pyarrow 4.1 import failure on Travis

New Features and Improvements

  • ARROW-897 - [GLib] Extract CI configuration for GLib
  • ARROW-986 - [Format] Add brief explanation of dictionary batches in IPC.md
  • ARROW-990 - [JS] Add tslint support for linting TypeScript
  • ARROW-1020 - [Format] Revise language for Timestamp type in Schema.fbs to avoid possible confusion about tz-naive timestamps
  • ARROW-1034 - [PYTHON] Resolve wheel build issues on Windows
  • ARROW-1049 - [java] vector template cleanup
  • ARROW-1063 - [Website] Updates for 0.4.0 release, release posting
  • ARROW-1068 - [Python] Create external repo with appveyor.yml configured for building Python wheel installers
  • ARROW-1069 - Add instructions for publishing maven artifacts
  • ARROW-1078 - [Python] Account for Apache Parquet shared library consolidation
  • ARROW-1080 - C++: Add tutorial about converting to/from row-wise representation
  • ARROW-1084 - Implementations of BufferAllocator should handle Netty's OutOfDirectMemoryError
  • ARROW-1118 - [Website] Site updates for 0.4.1
xhochy
published 0.4.0

Apache Arrow 0.4.0 (2017-05-22)

Bug Fixes

  • ARROW-813 - [Python] setup.py sdist must also bundle dependent cmake m…
  • ARROW-824 - Date and Time Vectors should reflect timezone-less semantics
  • ARROW-856 - Also read compiler info from stdout
  • ARROW-909 - Link jemalloc statically if built as external project
  • ARROW-939 - fix division by zero if one of the tensor dimensions is zero
  • ARROW-940 - [JS] Generate multiple artifacts
  • ARROW-944 - Python: Compat broken for pandas==0.18.1
  • ARROW-948 - [GLib] Update C++ header file list
  • ARROW-952 - fix regex include from C++ standard library
  • ARROW-958 - [Python] Fix conda source build instructions
  • ARROW-979 - [Python] Fix setuptools_scm version when release tag is not in the master timeline
  • ARROW-991 - [Python] Create new dtype when deserializing from Arrow to NumPy datetime64
  • ARROW-995 - [Website] Fix a typo
  • ARROW-998 - [Format] Clarify that the IPC file footer contains an additional copy of the schema
  • ARROW-1003 - [C++] Check flag _WIN32 instead of __WIN32
  • ARROW-1004 - [Python] Add conversions for numpy object arrays with integers and floats
  • ARROW-1017 - [Python] Fix memory leaks in conversion to pandas.DataFrame
  • ARROW-1023 - Python: Fix bundling of arrow-cpp for macOS
  • ARROW-1033 - [Python] pytest discovers scripts/test_leak.py
  • ARROW-1045 - [JAVA] Add support for custom metadata in org.apache.arrow.vector.types.pojo.*
  • ARROW-1046 - [Python] Reconcile pandas metadata spec
  • ARROW-1053 - [Python] Remove unnecessary Py_INCREF in PyBuffer causing memory leak
  • ARROW-1054 - [Python] Test suite fails on pandas 0.19.2
  • ARROW-1061 - [C++] Harden decimal parsing against invalid strings
  • ARROW-1064 - ModuleNotFoundError: No module named 'pyarrow._parquet'

New Features and Improvements

  • ARROW-29 - [C++] FindRe2 cmake module
  • ARROW-182 - [C++] Factor out Array::Validate into a separate function
  • ARROW-376 - Python: Convert non-range Pandas indices (optionally) to Arrow
  • ARROW-446 - [Python] Expand Sphinx documentation for 0.3
  • ARROW-482 - [Java] Exposing custom field metadata
  • ARROW-532 - [Python] Expand pyarrow.parquet documentation for 0.3 release
  • ARROW-579 - Python: Provide redistributable pyarrow wheels on OSX
  • ARROW-596 - [Python] Add convenience function to convert pandas.DataFrame to pyarrow.Buffer containing a file or stream representation
  • ARROW-629 - [JS] Add unit test suite
  • ARROW-714 - [C++] Add import_pyarrow C API in the style of NumPy for thirdparty C++ users
  • ARROW-819 - Public Cython and C++ API in the style of lxml, arrow::py::import_pyarrow method
  • ARROW-872 - [JS] Read streaming format
  • ARROW-873 - [JS] Implement fixed width list type
  • ARROW-874 - [JS] Read dictionary-encoded vectors
  • ARROW-881 - [Python] Reconstruct Pandas DataFrame indexes using metadata
  • ARROW-891 - [Python] Expand Windows build instructions to not require looking at separate C++ docs
  • ARROW-899 - [Doc] Add 0.3.0 changelog
  • ARROW-901 - [Python] Add Parquet unit test for fixed size binary
  • ARROW-913 - [Python] Only link jemalloc to the Cython extension where it's needed
  • ARROW-923 - Changelog generation Python script, add 0.1.0 and 0.2.0 changelog
  • ARROW-929 - Remove KEYS file from git
  • ARROW-943 - [GLib] Support running unit tests with source archive
  • ARROW-945 - [GLib] Add a Lua example to show Torch integration
  • ARROW-946 - [GLib] Use "new" instead of "open" for constructor name
  • ARROW-947 - [Python] Improve execution time of manylinux1 build
  • ARROW-953 - Use conda-forge cmake, curl in CI toolchain
  • ARROW-954 - Flag for compiling Arrow with header-only boost
  • ARROW-956 - [Python] compat with pandas >= 0.20.0
  • ARROW-957 - [Doc] Add HDFS and Windows documents to doxygen output
  • ARROW-961 - [Python] Rename InMemoryOutputStream to BufferOutputStream
  • ARROW-963 - [GLib] Add equal
  • ARROW-967 - [GLib] Support initializing array with buffer
  • ARROW-970 - [Python] Nicer experience if user accidentally calls pyarrow.Table ctor directly
  • ARROW-977 - [java] Add Timezone aware timestamp vectors
  • ARROW-980 - Fix detection of "msvc" COMPILER_FAMILY
  • ARROW-982 - [Website] Improve website front copy to highlight serialization efficiency benefits
  • ARROW-984 - [GLib] Add Go examples
  • ARROW-985 - [GLib] Update package information
  • ARROW-988 - [JS] Add entry to Travis CI matrix
  • ARROW-993 - [GLib] Add missing error checks in Go examples
  • ARROW-996 - [Website] Add 0.3.0 release announcement in Japanese
  • ARROW-997 - [Java] Implementing transferPair for FixedSizeListVector
  • ARROW-1000 - [GLib] Move install document to Website
  • ARROW-1001 - [GLib] Unify writer files
  • ARROW-1002 - [C++] Fix inconsistency with padding at start of IPC file format
  • ARROW-1008 - [C++] Add abstract stream writer and reader C++ APIs. Give clearer names to IPC reader/writer classes
  • ARROW-1010 - [Website] Provide for translations without repeating blog post in blogroll
  • ARROW-1011 - [FORMAT] fix typo and mistakes in Layout.md
  • ARROW-1014 - 0.4.0 release
  • ARROW-1015 - [Java] Schema-level metadata
  • ARROW-1016 - Python: Include C++ headers (optionally) in wheels
  • ARROW-1022 - [Python] Add multithreaded read option to read_feather
  • ARROW-1024 - Python: Update build time numpy version to 1.10.1
  • ARROW-1025 - [Website] Improved changelog for website, include git shortlog
  • ARROW-1027 - [Python] Allow negative indexing in fields/columns on pyarrow Table and Schema objects
  • ARROW-1028 - [Python] Fix IPC docs per API changes
  • ARROW-1029 - [Python] Fixes for building pyarrow with Parquet support on MSVC. Add to appveyor build
  • ARROW-1030 - Python: Account for library versioning in parquet-cpp
  • ARROW-1031 - [GLib] Support pretty print
  • ARROW-1037 - [GLib] Follow reader name change
  • ARROW-1038 - [GLib] Follow writer name change
  • ARROW-1040 - [GLib] Support tensor IO
  • ARROW-1044 - [GLib] Support Feather
  • ARROW-1126 - Python: Add function to convert NumPy/Pandas dtypes to Arrow DataTypes

wesm published 0.3.1

ptaylor published 0.3.0

Apache Arrow 0.3.0 (2017-05-05)

Bug Fixes

  • ARROW-109 - [C++] Add nesting stress tests up to 500 recursion depth
  • ARROW-208 - Add checkstyle policy to java project
  • ARROW-347 - Add method to pass CallBack when creating a transfer pair
  • ARROW-413 - DATE type is not specified clearly
  • ARROW-431 - [Python] Review GIL release and acquisition in to_pandas conversion
  • ARROW-443 - [Python] Support ingest of strided NumPy arrays from pandas
  • ARROW-451 - [C++] Implement DataType::Equals as TypeVisitor. Add default implementations for TypeVisitor, ArrayVisitor methods
  • ARROW-454 - pojo.Field doesn't implement hashCode()
  • ARROW-526 - [Format] Revise Format documents for evolution in IPC stream / file / tensor formats
  • ARROW-565 - [C++] Examine "Field::dictionary" member
  • ARROW-570 - Determine Java tools JAR location from project metadata
  • ARROW-584 - [C++] Fix compiler warnings exposed with -Wconversion
  • ARROW-586 - Problem with reading parquet files saved by Apache Spark
  • ARROW-588 - [C++] Fix some 32 bit compiler warnings
  • ARROW-595 - [Python] Set schema attribute on StreamReader
  • ARROW-604 - Python: boxed Field instances are missing the reference to their DataType
  • ARROW-611 - [Java] TimeVector TypeLayout is incorrectly specified as 64 bit width
  • ARROW-613 - WIP TypeScript Implementation
  • ARROW-617 - [Format] Add additional Time metadata and comments based on discussion in ARROW-617
  • ARROW-619 - [Python] Fixed remaining typo for LD_LIBRARY_PATH
  • ARROW-619 - Fix typos in setup.py args and LD_LIBRARY_PATH
  • ARROW-623 - Fix segfault in repr of empty field
  • ARROW-624 - [C++] Restore MakePrimitiveArray function, use in feather.cc
  • ARROW-627 - [C++] Add compatibility macros for exported extern templates
  • ARROW-628 - [Python] Install nomkl metapackage when building parquet-cpp in Travis CI
  • ARROW-630 - [C++] Create boolean batches for IPC testing, properly account for nonzero offset
  • ARROW-636 - [C++] Update README about Boost system requirement
  • ARROW-639 - [C++] Invalid offset in slices
  • ARROW-642 - [Java] Remove temporary file in java/tools
  • ARROW-644 - Python: Cython should be a setup-only requirement
  • ARROW-652 - Remove trailing f in merge script output
  • ARROW-654 - [C++] Serialize timezone in IPC metadata
  • ARROW-666 - [Python] Error in DictionaryArray __repr__
  • ARROW-667 - build of arrow-master/cpp fails with altivec error?
  • ARROW-668 - [Python] Box timestamp values as pandas.Timestamp if available, attach tzinfo
  • ARROW-671 - [GLib] Install missing license file
  • ARROW-673 - [Java] Support additional Time metadata
  • ARROW-677 - [Java] Fix checkstyle jcl-over-slf4j conflict issue
  • ARROW-678 - [GLib] Fix dependencies
  • ARROW-680 - [C++] Support CMake 2 or older again
  • ARROW-682 - [Integration] Check implementations against themselves
  • ARROW-683 - [C++/Python] Refactor to make Date32 and Date64 types for new metadata. Test IPC roundtrip
  • ARROW-685 - [GLib] AX_CXX_COMPILE_STDCXX_11 error running ./configure
  • ARROW-686 - [C++] Account for time metadata changes, add Time32 and Time64 types
  • ARROW-689 - [GLib] Fix install directories
  • ARROW-691 - [Java] Encode dictionary type in message format
  • ARROW-697 - [Java] Throw exception for record batches > 2GB
  • ARROW-699 - [C++] Resolve Arrow and Arrow IPC build issues on Windows
  • ARROW-702 - fix BitVector.copyFromSafe to reAllocate instead of returning false
  • ARROW-703 - Fix issue where setValueCount(0) doesn’t work in the case that we’ve shipped vectors across the wire
  • ARROW-704 - Fix bad import caused by conflicting changes
  • ARROW-709 - [C++] Restore type comparator for DecimalType
  • ARROW-713 - [C++] Fix cmake linking issue in new IPC benchmark
  • ARROW-715 - [Python] Make pandas not a hard requirement, flake8 fixes
  • ARROW-716 - [Python] Update README build instructions after moving libpyarrow to C++ tree
  • ARROW-720 - arrow should not have a dependency on slf4j bridges in com…
  • ARROW-723 - [Python] Ensure that passing chunk_size=0 when writing Parquet file does not enter infinite loop
  • ARROW-726 - [C++] Fix segfault caused when passing non-buffer object to arrow::py::PyBuffer
  • ARROW-732 - [C++] Schema comparison bugs in struct and union types
  • ARROW-736 - [Python] Mixed-type object DataFrame columns should not silently co…
  • ARROW-738 - Fix manylinux1 build
  • ARROW-739 - Don't install jemalloc in parallel
  • ARROW-740 - FileReader fails for large objects
  • ARROW-747 - [C++] Calling add_dependencies with dl causes spurious CMake warning
  • ARROW-749 - [Python] Delete partially-written Feather file when column write fails
  • ARROW-753 - [Python] Fix linker error for python-test on OS X
  • ARROW-756 - [C++] MSVC build fixes and cleanup, remove -fPIC flag from EP builds on Windows, Dev docs
  • ARROW-757 - [C++] MSVC build fails on googletest when using NMake
  • ARROW-762 - [Python] Start docs page about files and filesystems, adapt C++ docs about HDFS
  • ARROW-776 - [GLib] Fix wrong type name
  • ARROW-777 - restore getObject behavior on Date and Time
  • ARROW-778 - Port merge tool to work on Windows
  • ARROW-780 - PYTHON_EXECUTABLE Required to be set during build
  • ARROW-781 - [C++/Python] Increase reference count of the numpy base array?
  • ARROW-783 - [Java/C++] Fixes for 0-length record batches
  • ARROW-787 - [GLib] Fix compilation error caused by introducing BooleanBuilder::Append overload
  • ARROW-789 - Fix issue where setValueCount(0) doesn’t work in the case that we’ve shipped vectors across the wire
  • ARROW-793 - [GLib] Fix indent
  • ARROW-794 - [C++/Python] Disallow strided tensors in ipc::WriteTensor
  • ARROW-796 - [Java] Checkstyle additions causing build failure in some environments
  • ARROW-797 - [Python] Make more explicitly curated public API page, sphinx cleanup
  • ARROW-800 - [C++] Boost headers being transitively included in pyarrow
  • ARROW-805 - [C++] Don't throw IOError when listing empty HDFS dir
  • ARROW-809 - [C++] Do not write excess bytes in IPC writer after slicing arrays
  • ARROW-812 - Pip install pyarrow on mac failed.
  • ARROW-817 - [Python] Fix comment in date32 conversion
  • ARROW-821 - [Python] Extra file _table_api.h generated during Python build process
  • ARROW-822 - [Python] StreamWriter Wrapper for Socket and File-like Objects without tell()
  • ARROW-826 - [C++/Python] Fix compilation error on Mac with -DARROW_PYTHON=on
  • ARROW-829 - Don't deactivate Parquet dictionary encoding on column-wis…
  • ARROW-830 - [Python] Expose jemalloc memory pool and other memory pool functions in public pyarrow API
  • ARROW-836 - add test for pandas conversion of timedelta, currently unimplemented
  • ARROW-839 - [Python] Use mktime variant that is reliable on MSVC
  • ARROW-847 - Specify BUILD_BYPRODUCTS for gtest
  • ARROW-852 - Also search for ARROW libs when pkg-config provided the path
  • ARROW-853 - [Python] Only set RPATH when bundling the shared libraries
  • ARROW-858 - Remove boost_regex from arrow dependencies
  • ARROW-866 - [Python] Be robust to PyErr_Fetch returning a null exc value
  • ARROW-867 - [Python] pyarrow MSVC fixes
  • ARROW-875 - Avoid setting an extra empty in fillEmpties()
  • ARROW-879 - compat with pandas v0.20.0
  • ARROW-882 - [C++] Rename statically build library on Windows to avoid …
  • ARROW-883 - [Java] Introduction of new types has shifted Enumerations
  • ARROW-885 - [Python/C++] Decimal test failure on MSVC
  • ARROW-886 - [Java] Fixing reallocation of VariableLengthVector offsets
  • ARROW-887 - add default value to units for backward compatibility
  • ARROW-888 - Transfer ownership of buffer in BitVector transferTo()
  • ARROW-895 - Fix lastSet in fillEmpties() and copyFrom()
  • ARROW-900 - [Python] Fix UnboundLocalError in ParquetDatasetPiece.read
  • ARROW-903 - [GLib] Remove a needless "."
  • ARROW-914 - [C++/Python] Fix Decimal ToBytes
  • ARROW-922 - Allow Flatbuffers and RapidJSON to be used locally on Windows
  • ARROW-927 - C++/Python: Add manylinux1 builds to Travis matrix
  • ARROW-928 - [C++] Detect supported MSVC versions
  • ARROW-933 - [Python] Remove debug print statement
  • ARROW-934 - [GLib] Glib sources missing from result of 02-source.sh
  • ARROW-936 - add missing file; revert tag change
  • ARROW-936 - fix release README
  • ARROW-938 - Fix Rat license warnings

New Features and Improvements

  • ARROW-6 - Hope to add development document
  • ARROW-39 - C++: Logical chunked arrays / columns: conforming to fixed chunk sizes
  • ARROW-52 - Set up project blog
  • ARROW-95 - Add Jekyll-based website publishing toolchain, migrate existing arrow-site
  • ARROW-98 - Java: API documentation
  • ARROW-99 - C++: Explore if RapidCheck may be helpful for testing / worth adding to toolchain
  • ARROW-183 - C++: Add storage type to DecimalType
  • ARROW-231 - [C++] Add typed Resize to PoolBuffer
  • ARROW-281 - [C++] IPC/RPC support on Win32 platforms
  • ARROW-316 - [Format] Changes to Date metadata format per discussion in ARROW-316
  • ARROW-341 - [Python] Move pyarrow's C++ code to the main C++ source tree, install libarrow_python and headers
  • ARROW-452 - [C++/Python] Incorporate C++ and Python codebases for Feather file format
  • ARROW-459 - [C++] Dictionary IPC support in file and stream formats
  • ARROW-483 - [C++/Python] Provide access to "custom_metadata" Field attribute in IPC setting
  • ARROW-491 - [Format / C++] Add FixedWidthBinary type to format, C++ implementation
  • ARROW-492 - [C++] Add arrow/arrow.h public API
  • ARROW-493 - [C++] Permit large (length > INT32_MAX) arrays in memory
  • ARROW-502 - [C++/Python] Logging memory pool
  • ARROW-510 - ARROW-582 ARROW-663 ARROW-729: [Java] Added units for Time and Date types, and integration tests
  • ARROW-518 - C++: Make Status::OK method constexpr
  • ARROW-520 - [C++] STL-compliant allocator
  • ARROW-528 - [Python] Utilize improved Parquet writer C++ API, add write_metadata function, test _metadata files
  • ARROW-534 - [C++] Add IPC tests for date/time after ARROW-452, fix bugs
  • ARROW-539 - [Python] Add support for reading partitioned Parquet files with Hive-like directory schemes
  • ARROW-542 - Adding dictionary encoding to FileWriter
  • ARROW-550 - [Format] Draft experimental Tensor flatbuffer message type
  • ARROW-552 - [Python] Implement getitem for DictionaryArray by returning a value from the dictionary
  • ARROW-557 - [Python] Add option to explicitly opt in to HDFS tests, do not implicitly skip
  • ARROW-563 - Support non-standard gcc version strings
  • ARROW-566 - Bundle Arrow libraries in Python package
  • ARROW-568 - [C++] Add default implementations for TypeVisitor, ArrayVisitor methods that return NotImplemented
  • ARROW-569 - [C++] Set version for *.pc
  • ARROW-574 - Python: Add support for nested Python lists in Pandas conversion
  • ARROW-576 - [C++] Complete file/stream implementation for union types
  • ARROW-577 - [C++] Use private implementation pattern in ipc::StreamWriter and ipc::FileWriter
  • ARROW-578 - [C++] Add -DARROW_CXXFLAGS=... option to make CMake more consistent
  • ARROW-580 - C++: Also provide jemalloc_X targets if only a static or shared version is found
  • ARROW-582 - [Java] Added JSON reader/writer unit test for date, time, and timestamp
  • ARROW-589 - C++: Use system provided shared jemalloc if static is unavailable
  • ARROW-591 - [C++] Add round trip testing fixture for JSON format
  • ARROW-593 - [C++] Rename ReadableFileInterface to RandomAccessFile
  • ARROW-598 - [Python] Add support for converting pyarrow.Buffer to a memoryview with zero copy
  • ARROW-603 - [C++] Add RecordBatch::Validate method, call in RecordBatch ctor in debug builds
  • ARROW-605 - [C++] Refactor IPC adapter code into generic ArrayLoader class. Add Date32Type
  • ARROW-606 - [C++] upgrade flatbuffers version to 1.6.0
  • ARROW-608 - [Format] Days since epoch date type
  • ARROW-610 - [C++] Win32 compatibility in file.cc
  • ARROW-612 - [Java] Added not null to Field.toString output
  • ARROW-615 - [Java] Moved ByteArrayReadableSeekableByteChannel to src main o.a.a.vector.util
  • ARROW-616 - [C++] Do not include debug symbols in release builds by default
  • ARROW-618 - [Python/C++] Support timestamp+timezone conversion to pandas
  • ARROW-620 - [C++] Implement JSON integration test support for date, time, timestamp, fixed width binary
  • ARROW-621 - [C++] Start IPC benchmark suite for record batches, implement "inline" visitor. Code reorg
  • ARROW-625 - [C++] Add TimeUnit to TimeType::ToString. Add timezone to TimestampType::ToString if present
  • ARROW-626 - [Python] Replace PyBytesBuffer with zero-copy, memoryview-based PyBuffer
  • ARROW-631 - [GLib] Import
  • ARROW-632 - [Python] Add support for FixedWidthBinary type
  • ARROW-635 - [C++] Add JSON read/write support for FixedWidthBinary
  • ARROW-637 - [Format] Add timezone to Timestamp metadata, comments describing the semantics
  • ARROW-646 - [Python] Conda s3 robustness, set CONDA_PKGS_DIR env variable and add Travis CI caching
  • ARROW-647 - [C++] Use Boost shared libraries for tests and utilities
  • ARROW-648 - [C++] Support multiarch on Debian
  • ARROW-650 - [GLib] Follow ReadableFileInterface -> RandomAccessFile change
  • ARROW-651 - [C++] Set version to shared library
  • ARROW-655 - [C++/Python] Implement DecimalArray
  • ARROW-656 - [C++] Add random access writer for a mutable buffer. Rename WriteableFileInterface to WriteableFile for better consistency
  • ARROW-657 - [C++/Python] Expose Tensor IPC in Python. Add equals method. Add pyarrow.create_memory_map/memory_map functions
  • ARROW-658 - [C++] Implement a prototype in-memory arrow::Tensor type
  • ARROW-659 - [C++] Add multithreaded memcpy implementation
  • ARROW-660 - [C++] Restore function that can read a complete encapsulated record batch message
  • ARROW-661 - [C++] Add LargeRecordBatch metadata type, IPC support, associated refactoring
  • ARROW-662 - [Format] Move Schema flatbuffers into their own file that can be included
  • ARROW-663 - [Java] Support additional Time metadata + vector value accessors
  • ARROW-664 - [C++] Make C++ Arrow serialization deterministic
  • ARROW-669 - [Python] Attach proper tzinfo when computing boxed scalars for TimestampArray
  • ARROW-670 - Arrow 0.3 release
  • ARROW-672 - [Format] Add MetadataVersion::V3 for Arrow 0.3
  • ARROW-674 - [Java] Support additional Timestamp timezone metadata
  • ARROW-675 - [GLib] Update package metadata
  • ARROW-676 - move from MinorType to FieldType in ValueVectors to carry all the relevant type bits
  • ARROW-679 - [Format] Change FieldNode, RecordBatch lengths to long, remove LargeRecordBatch. Refactoring
  • ARROW-681 - [C++] Disable boost's autolinking if shared boost is used …
  • ARROW-684 - [Python] More helpful error message if libparquet_arrow not built
  • ARROW-687 - [C++] Build and run full test suite in Appveyor
  • ARROW-688 - [C++] Use CMAKE_INSTALL_INCLUDEDIR for consistency
  • ARROW-690 - Only send JIRA updates to issues@arrow.apache.org
  • ARROW-698 - Add flag to FileWriter::WriteRecordBatch for writing record batches with lengths over INT32_MAX
  • ARROW-700 - Add headroom interface for allocator
  • ARROW-701 - [Java] Support Additional Date Type Metadata
  • ARROW-706 - [GLib] Add package install document
  • ARROW-707 - [Python] Return NullArray for array of all None in Array.from_pandas. Revert from_numpy -> from_pandas
  • ARROW-708 - [C++] Simplify metadata APIs to all use the Message class, perf analysis
  • ARROW-710 - [Python] Read/write with file-like Python objects from read_feather/write_feather
  • ARROW-711 - [C++] Remove extern template declarations for NumericArray<T> types
  • ARROW-712 - [C++] Reimplement Array::Accept as inline visitor
  • ARROW-717 - [C++] Implement IPC zero-copy round trip for tensors
  • ARROW-718 - [Python] Implement pyarrow.Tensor container, zero-copy NumPy roundtrips
  • ARROW-719 - [GLib] Release source archive
  • ARROW-722 - [Python] Support additional date/time types and metadata, conversion to/from NumPy and pandas.DataFrame
  • ARROW-724 - Add How to Contribute section to README
  • ARROW-725 - [Formats/Java] FixedSizeList message and java implementation
  • ARROW-727 - [Python] Ensure that NativeFile.write accepts any bytes, unicode, or object providing buffer protocol. Rename build_arrow_buffer to pyarrow.frombuffer
  • ARROW-728 - [C++/Python] Add Table::RemoveColumn method, remove name member, some other code cleaning
  • ARROW-729 - [Java] Add vector type for 32-bit date as days since UNIX epoch
  • ARROW-731 - [C++] Add shared library related versions to .pc
  • ARROW-733 - [C++/Python] Rename FixedWidthBinary to FixedSizeBinary for consistency with FixedSizeList
  • ARROW-734 - [C++/Python] Support building PyArrow on MSVC
  • ARROW-735 - [C++] Developer instruction document for MSVC on Windows
  • ARROW-737 - [C++] Enable mutable buffer slices, SliceMutableBuffer function
  • ARROW-741 - [Python] Switch Travis CI to use Python 3.6 instead of 3.5
  • ARROW-743 - [C++] Consolidate all but decimal array tests into array-test, collect some tests in type-test.cc
  • ARROW-744 - [GLib] Re-add an assertion for garrow_table_new() test
  • ARROW-745 - [C++] Allow use of system cpplint
  • ARROW-746 - [GLib] Add garrow_array_get_data_type()
  • ARROW-748 - [Python] Pin runtime library versions in conda-forge packages to force upgrades
  • ARROW-751 - [Python] Make all Cython modules private. Some code tidying
  • ARROW-752 - [Python] Support boxed Arrow arrays as input to DictionaryArray.from_arrays
  • ARROW-754 - [GLib] Add garrow_array_is_null()
  • ARROW-755 - [GLib] Add garrow_array_get_value_type()
  • ARROW-758 - [C++] Build with /WX in Appveyor, fix MSVC compiler warnings
  • ARROW-761 - [C++/Python] Add GetTensorSize method, Python bindings
  • ARROW-763 - C++: Use to find libpythonX.X.dylib
  • ARROW-765 - [Python] Add more natural Exception type hierarchy for thirdparty users
  • ARROW-768 - [Java] Change the "boxed" object representation of date and time types
  • ARROW-769 - [GLib] Support building without installed Arrow C++
  • ARROW-770 - [C++] Move .clang* files back into cpp source tree
  • ARROW-771 - [Python] Add read_row_group / num_row_groups to ParquetFile
  • ARROW-773 - [C++] Add Table::AddColumn API
  • ARROW-774 - [GLib] Remove needless LICENSE.txt copy
  • ARROW-775 - add simple constructors to value vectors
  • ARROW-779 - [C++] Check for old metadata and raise exception if found
  • ARROW-782 - [C++] API cleanup, change public member access in DataType classes to functions, use class instead of struct
  • ARROW-788 - [C++] Align WriteTensor message
  • ARROW-795 - [C++] Consolidate arrow/arrow_io/arrow_ipc into a single shared and static library
  • ARROW-798 - [Docs] Publish Format Markdown documents somehow on arrow.apache.org
  • ARROW-802 - [GLib] Add read examples
  • ARROW-803 - [GLib] Update package repository URL
  • ARROW-804 - [GLib] Update build document
  • ARROW-806 - [GLib] Support add/remove a column from table
  • ARROW-807 - [GLib] Update "Since" tag
  • ARROW-808 - [GLib] Remove needless ignore entries
  • ARROW-810 - [GLib] Remove io/ipc prefix
  • ARROW-811 - [GLib] Add GArrowBuffer
  • ARROW-815 - [Java] Exposing reAlloc for ValueVector
  • ARROW-816 - [C++] Travis CI script cleanup, add C++ toolchain env with Flatbuffers, RapidJSON
  • ARROW-818 - [Python] Expand Sphinx API docs, pyarrow.* namespace. Add factory functions for time32, time64
  • ARROW-820 - [C++] Build dependencies for Parquet library without arrow…
  • ARROW-825 - [Python] Rename pyarrow.from_pylist to pyarrow.array, test on tuples
  • ARROW-827 - [Python] Miscellaneous improvements to help with Dask support
  • ARROW-828 - [C++] Add new dependency to README
  • ARROW-831 - Switch from boost::regex to std::regex
  • ARROW-832 - [C++] Update to gtest 1.8.0, remove now unneeded test_main.cc
  • ARROW-833 - [Python] Add Developer quickstart for conda users
  • ARROW-841 - [Python] Add pyarrow build to Appveyor
  • ARROW-844 - [Format] Update README documents in format/
  • ARROW-845 - [Python] Sync changes from PARQUET-955; explicit ARROW_HOME will override pkgconfig
  • ARROW-846 - [GLib] Add GArrowTensor, GArrowInt8Tensor and GArrowUInt8Tensor
  • ARROW-848 - [Python] Another pass on conda dev guide
  • ARROW-849 - [C++] Support setting production build dependencies with ARROW_BUILD_TOOLCHAIN
  • ARROW-857 - [Python] Automate publishing Python documentation to arrow-site
  • ARROW-859 - [C++] Do not build unit tests by default?
  • ARROW-860 - [C++] Remove typed Tensor containers
  • ARROW-861 - [Python] Move DEVELOPMENT.md to Sphinx docs
  • ARROW-862 - [Python] Simplify README landing documentation to direct users and developers toward the documentation
  • ARROW-863 - [GLib] Use GBytes to implement zero-copy
  • ARROW-864 - [GLib] Unify Array files
  • ARROW-865 - [Python] Add unit tests validating Parquet date/time type roundtrips
  • ARROW-868 - [GLib] Use GBytes to reduce copy
  • ARROW-869 - [JS] Rename directory to js/
  • ARROW-871 - [GLib] Unify DataType files
  • ARROW-876 - [GLib] Unify ArrayBuilder files
  • ARROW-877 - [GLib] Add garrow_array_get_null_bitmap()
  • ARROW-878 - [GLib] Add garrow_binary_array_get_buffer()
  • ARROW-880 - [GLib] Support getting raw data of primitive arrays
  • ARROW-890 - [GLib] Add GArrowMutableBuffer
  • ARROW-892 - [GLib] Fix GArrowTensor document
  • ARROW-893 - Add GLib document to Web site
  • ARROW-894 - [GLib] Add GArrowResizableBuffer and GArrowPoolBuffer
  • ARROW-896 - Support Jupyter Notebook in Web site
  • ARROW-898 - [C++/Python] Use shared_ptr to avoid copying KeyValueMetadata, add to Field type also
  • ARROW-904 - [GLib] Simplify error check codes
  • ARROW-907 - C++: Construct Table from schema and arrays
  • ARROW-908 - [GLib] Unify OutputStream files
  • ARROW-910 - [C++] Write 0 length at EOS in StreamWriter
  • ARROW-916 - [GLib] Add GArrowBufferOutputStream
  • ARROW-917 - [GLib] Add GArrowBufferReader
  • ARROW-918 - [GLib] Use GArrowBuffer for read buffer
  • ARROW-919 - [GLib] Use "id" to get type enum value from GArrowDataType
  • ARROW-920 - [GLib] Add Lua examples
  • ARROW-925 - [GLib] Fix GArrowBufferReader test
  • ARROW-926 - Add wesm to KEYS
  • ARROW-930 - javadoc generation fails with java 8
  • ARROW-931 - [GLib] Reconstruct input stream
  • ARROW-965 - Website updates for 0.3.0 release

wesm published 0.2.0

Apache Arrow 0.2.0 (2017-02-18)

Bug Fixes

  • ARROW-112 - Changed constexprs to kValue naming.
  • ARROW-202 - Integrate with appveyor ci for windows
  • ARROW-220 - [C++] Build conda artifacts in a build environment with better cross-linux ABI compatibility
  • ARROW-224 - [C++] Address static linking of boost dependencies
  • ARROW-230 - Python: Do not name modules like native ones (i.e. rename pyarrow.io)
  • ARROW-239 - Test reading remainder of file in HDFS with read() with no args
  • ARROW-261 - Refactor String/Binary code paths to reflect unnested (non-list-based) structure
  • ARROW-273 - Lists use unsigned offset vectors instead of signed (as defined in the spec)
  • ARROW-275 - Add tests for UnionVector in Arrow File
  • ARROW-294 - [C++] Do not use platform-dependent fopen/fclose functions for MemoryMappedFile
  • ARROW-322 - [C++] Remove ARROW_HDFS option, always build the module
  • ARROW-323 - [Python] Opt-in to pyarrow.parquet extension rather than attempting and failing silently
  • ARROW-334 - [Python] Remove INSTALL_RPATH_USE_LINK_PATH
  • ARROW-337 - UnionListWriter.list() is doing more than it should, this …
  • ARROW-339 - [Dev] Lingering Python 3 fixes
  • ARROW-339 - Python 3 compatibility in merge_arrow_pr.py
  • ARROW-340 - [C++] Opening a writeable file on disk that already exists does not truncate to zero
  • ARROW-342 - Set Python version on release
  • ARROW-345 - libhdfs integration doesn't work for Mac
  • ARROW-346 - Use conda environment to build API docs
  • ARROW-348 - [Python] Add build-type command line option to setup.py, build CMake extensions in a build type subdirectory
  • ARROW-349 - Add six as a requirement
  • ARROW-351 - Time type has no unit
  • ARROW-354 - Fix comparison of arrays of empty strings
  • ARROW-357 - Use a single RowGroup for Parquet files as default.
  • ARROW-358 - Add explicit environment variable to locate libhdfs in one's environment
  • ARROW-362 - Fix memory leak in zero-copy arrow to NumPy/pandas conversion
  • ARROW-371 - Handle pandas-nullable types correctly
  • ARROW-375 - Fix unicode Python 3 issue in columns argument of parquet.read_table
  • ARROW-384 - Align Java and C++ RecordBatch data and metadata layout
  • ARROW-386 - [Java] Respect case of struct / map field names
  • ARROW-387 - [C++] Verify zero-copy Buffer slices from BufferReader retain reference to parent Buffer
  • ARROW-390 - Only specify dependencies for json-integration-test on ARROW_BUILD_TESTS=ON
  • ARROW-392 - [C++/Java] String IPC integration testing / fixes. Add array / record batch pretty-printing
  • ARROW-393 - [Java] JSON file reader fails to set the buffer size on String data vector
  • ARROW-395 - Arrow file format writes record batches in reverse order.
  • ARROW-398 - Java file format requires bitmaps of all 1's to be written…
  • ARROW-399 - ListVector.loadFieldBuffers ignores the ArrowFieldNode len…
  • ARROW-400 - set struct length on load
  • ARROW-401 - Floating point vectors should do an approximate comparison…
  • ARROW-402 - Fix reference counting issue with empty buffers. Close #232
  • ARROW-403 - [Java] Create transfer pairs for internal vectors in UnionVector transfer impl
  • ARROW-404 - [Python] Fix segfault caused by HdfsClient getting closed before an HdfsFile
  • ARROW-405 - Use vendored hdfs.h if not found in include/ in $HADOOP_HOME
  • ARROW-406 - [C++] Set explicit 64K HDFS buffer size, test large reads
  • ARROW-408 - Remove defunct conda recipes
  • ARROW-414 - [Java] "Buffer too large to resize to ..." error
  • ARROW-420 - Align DATE type with Java implementation
  • ARROW-421 - [Python] Retain parent reference in PyBytesReader
  • ARROW-422 - IPC should depend on rapidjson_ep if RapidJSON is vendored
  • ARROW-429 - Revert ARROW-379 until git-archive issues are resolved
  • ARROW-433 - Correctly handle Arrow to Python date conversion for timezones west of London
  • ARROW-434 - [Python] Correctly handle Python file objects in Parquet read/write paths
  • ARROW-435 - Fix spelling of RAPIDJSON_VENDORED
  • ARROW-437 - [C++] Fix clang compiler warning
  • ARROW-445 - arrow_ipc_objlib depends on Flatbuffer generated files
  • ARROW-447 - Always return unicode objects for UTF-8 strings
  • ARROW-455 - [C++] Add dtor to BufferOutputStream that calls Close()
  • ARROW-469 - C++: Add option so that resize doesn't decrease the capacity
  • ARROW-481 - [Python] Fix 2.7 regression in Parquet path to open file code path
  • ARROW-486 - [C++] Use virtual inheritance for diamond inheritance
  • ARROW-487 - Python: ConvertTableToPandas segfaults if ObjectBlock::Write fails
  • ARROW-494 - [C++] Extend lifetime of memory mapped data if any buffers reference it
  • ARROW-499 - Update file serialization to use the streaming serialization format.
  • ARROW-505 - [C++] Fix compiler warning in gcc in release mode
  • ARROW-511 - Python: Implement List conversions for single arrays
  • ARROW-513 - [C++] Fixing Appveyor / MSVC build
  • ARROW-516 - Building pyarrow with parquet
  • ARROW-519 - [C++] Refactor array comparison code into a compare.h / compare.cc in part to resolve Xcode 6.1 linker issue
  • ARROW-523 - Python: Account for changes in PARQUET-834
  • ARROW-533 - [C++] arrow::TimestampArray / TimeArray has a broken constructor
  • ARROW-535 - [Python] Add type mapping for NPY_LONGLONG
  • ARROW-537 - [C++] Do not compare String/Binary data in null slots when comparing arrays
  • ARROW-540 - [C++] Build fixes after ARROW-33, PARQUET-866
  • ARROW-543 - C++: Lazily computed null_counts counts number of non-null entries
  • ARROW-544 - [C++] Test writing zero-length record batches, zero-length BinaryArray fixes
  • ARROW-545 - [Python] Ignore non .parq/.parquet files when reading directories as Parquet datasets
  • ARROW-548 - [Python] Add nthreads to Filesystem.read_parquet and pass through
  • ARROW-551 - C++: Construction of Column with nullptr Array segfaults
  • ARROW-556 - [Integration] Configure C++ integration test executable with a single environment variable. Update README
  • ARROW-561 - [Java][Python] Update Java & Python dependencies to improve downstream packaging experience
  • ARROW-562 - Mockito should be in test scope

New Features and Improvements

  • ARROW-33 - [C++] Implement zero-copy array slicing, integrate with IPC code paths
  • ARROW-81 - [Format] Augment dictionary encoding metadata to accommodate additional use cases
  • ARROW-96 - Add C++ API documentation
  • ARROW-97 - API documentation via sphinx-apidoc
  • ARROW-108 - [C++] Add Union implementation and IPC/JSON serialization tests
  • ARROW-189 - Build 3rd party with ExternalProject.
  • ARROW-191 - Python: Provide infrastructure for manylinux1 wheels
  • ARROW-221 - Add switch for writing Parquet 1.0 compatible logical types
  • ARROW-227 - [C++/Python] Hook arrow_io generic reader / writer interface into arrow_parquet
  • ARROW-228 - [Python] Create an Arrow-cpp-compatible interface for reading bytes from Python file-like objects
  • ARROW-240 - Installation instructions for pyarrow
  • ARROW-243 - [C++] Add option to switch between libhdfs and libhdfs3 when creating HdfsClient
  • ARROW-268 - [C++] Flesh out union implementation to have all required methods for IPC
  • ARROW-303 - [C++] Also build static libraries for leaf libraries
  • ARROW-312 - [Java] IPC file round trip tool for integration testing
  • ARROW-312 - Read and write Arrow IPC file format from Python
  • ARROW-317 - Add Slice, Copy methods to Buffer
  • ARROW-327 - [Python] Remove conda builds from Travis CI setup
  • ARROW-328 - Return shared_ptr<T> by value instead of const-ref
  • ARROW-330 - CMake functions to simplify shared / static library configuration
  • ARROW-332 - Add RecordBatch.to_pandas method
  • ARROW-333 - Make writers update their internal schema even when no data is written
  • ARROW-335 - Improve Type apis and toString() by encapsulating flatbuffers better
  • ARROW-336 - Run Apache Rat in Travis builds
  • ARROW-338 - Implement visitor pattern for IPC loading/unloading
  • ARROW-344 - Instructions for building with conda
  • ARROW-350 - Added Kerberos to HDFS client
  • ARROW-353 - Arrow release 0.2
  • ARROW-355 - Add tests for serialising arrays of empty strings to Parquet
  • ARROW-356 - Add documentation about reading Parquet
  • ARROW-359 - Document ARROW_LIBHDFS_DIR
  • ARROW-360 - C++: Add method to shrink PoolBuffer using realloc
  • ARROW-361 - Python: Support reading a column-selection from Parquet files
  • ARROW-363 - [Java/C++] integration testing harness, initial integration tests
  • ARROW-365 - Python: Provide Array.to_pandas()
  • ARROW-366 - Java Dictionary Vector
  • ARROW-367 - converter json <=> Arrow file format for Integration tests
  • ARROW-368 - Added note for LD_LIBRARY_PATH in Python README
  • ARROW-369 - [Python] Convert multiple record batches at once to Pandas
  • ARROW-370 - Python: Pandas conversion from `datetime.date` columns
  • ARROW-372 - json vector serialization format
  • ARROW-373 - [C++] JSON serialization format for testing
  • ARROW-374 - More precise handling of bytes vs unicode in Python API
  • ARROW-377 - Python: Add support for conversion of Pandas.Categorical
  • ARROW-379 - Use setuptools_scm for Python versioning
  • ARROW-380 - [Java] optimize null count when serializing vectors
  • ARROW-381 - [C++] Simplify primitive array type builders to use a default type singleton
  • ARROW-382 - Extend Python API documentation
  • ARROW-383 - [C++] Integration testing CLI tool
  • ARROW-389 - Python: Write Parquet files to pyarrow.io.NativeFile objects
  • ARROW-394 - [Integration] Generate test cases for numeric types, strings, lists, structs
  • ARROW-396 - [Python] Add pyarrow.schema.Schema.equals
  • ARROW-409 - [Python] Change record batches conversion to Table
  • ARROW-410 - [C++] Add virtual Writeable::Flush
  • ARROW-411 - [Java] Move compactor functions in Integration to a separate Validator module
  • ARROW-415 - C++: Add Equals implementation to compare Tables
  • ARROW-416 - C++: Add Equals implementation to compare Columns
  • ARROW-417 - Add Equals implementation to compare ChunkedArrays
  • ARROW-418 - [C++] Array / Builder class code reorganization, flattening
  • ARROW-419 - [C++] Promote util/{status.h, buffer.h, memory-pool.h} to top level of arrow/ source directory
  • ARROW-423 - Define BUILD_BYPRODUCTS for CMake 3.2+
  • ARROW-425 - Add private API to get python Table from a C++ object
  • ARROW-426 - Python: Conversion from pyarrow.Array to a Python list
  • ARROW-427 - [C++] Implement dictionary array type
  • ARROW-428 - [Python] Multithreaded conversion from Arrow table to pandas.DataFrame
  • ARROW-430 - Improved version handling
  • ARROW-432 - [Python] Construct precise pandas BlockManager structure for zero-copy DataFrame initialization
  • ARROW-438 - [C++/Python] Implement zero-data-copy record batch and table concatenation
  • ARROW-440 - [C++] Support pkg-config
  • ARROW-441 - [Python] Expose Arrow's file and memory map classes as NativeFile subclasses
  • ARROW-442 - [Python] Inspect Parquet file metadata from Python
  • ARROW-444 - [Python] Native file reads into pre-allocated memory. Some IO API cleanup / niceness
  • ARROW-449 - Python: Conversion from pyarrow.{Table,RecordBatch} to a Python dict
  • ARROW-450 - Fixes for PARQUET-818
  • ARROW-456 - Add jemalloc based MemoryPool
  • ARROW-457 - Python: Better control over memory pool
  • ARROW-458 - [Python] Expose jemalloc MemoryPool
  • ARROW-461 - [Python] Add Python interfaces to DictionaryArray data, pandas interop
  • ARROW-463 - C++: Support jemalloc 4.x
  • ARROW-466 - Add ExternalProject for jemalloc
  • ARROW-467 - [Python] Run Python parquet-cpp unit tests in Travis CI
  • ARROW-468 - Python: Conversion of nested data in pd.DataFrames
  • ARROW-470 - [Python] Add "FileSystem" abstraction to access directories of files in a uniform way
  • ARROW-471 - [Python] Enable ParquetFile to pass down separately-obtained file metadata
  • ARROW-472 - [Python] Expose more C++ IO interfaces. Add equals methods to Parquet schemas. Pass Parquet metadata separately in reader
  • ARROW-474 - [Java] Add initial version of streaming serialized format
  • ARROW-475 - [Python] Add support for reading multiple Parquet files as a single pyarrow.Table
  • ARROW-476 - Add binary integration test fixture, add Java support
  • ARROW-477 - [Java] Add support for second/microsecond/nanosecond timestamps in-memory and in IPC/JSON layer
  • ARROW-478 - Consolidate BytesReader and BufferReader to accept PyBytes or Buffer
  • ARROW-479 - Python: Test for expected schema in Pandas conversion
  • ARROW-484 - Revise README to include more detail about software components
  • ARROW-485 - [Java] Users are required to initialize VariableLengthVectors.offsetVector before calling VariableLengthVectors.mutator.getSafe
  • ARROW-490 - Python: Update manylinux1 build scripts
  • ARROW-495 - [C++] Implement streaming binary format, refactoring
  • ARROW-497 - Integration harness for streaming file format
  • ARROW-498 - [C++] Add command line utilities that convert between stream and file
  • ARROW-503 - [Python] Implement Python interface to streaming file format
  • ARROW-506 - Java: Implement echo server for integration testing
  • ARROW-508 - [C++] Add basic threadsafety to normal files and memory maps
  • ARROW-509 - [Python] Add support for multithreaded Parquet reads
  • ARROW-512 - C++: Add method to check for primitive types
  • ARROW-514 - [Python] Automatically wrap pyarrow.io.Buffer in BufferReader
  • ARROW-515 - [Python] Add read_all methods to FileReader, StreamReader
  • ARROW-521 - [C++] Track peak allocations in default memory pool
  • ARROW-524 - provide apis to access nested vectors and buffers
  • ARROW-525 - Python: Add more documentation to the package
  • ARROW-527 - Remove drill-module.conf file
  • ARROW-529 - Python: Add jemalloc and Python 3.6 to manylinux1 build
  • ARROW-531 - Python: Document jemalloc, extend Pandas section, add Getting Involved
  • ARROW-538 - [C++] Set up AddressSanitizer (ASAN) builds
  • ARROW-546 - Python: Account for changes in PARQUET-867
  • ARROW-547 - [Python] Add zero-copy slice methods to Array, RecordBatch
  • ARROW-553 - C++: Faster valid bitmap building
  • ARROW-558 - Add KEYS files
ptaylor published 0.1.2
