Apache Arrow 0.1.0 (2016-10-10)
New Features and Improvements
- ARROW-1 - Initial Arrow Code Commit
- ARROW-2 - Post Simple Website
- ARROW-3 - This patch includes a WIP draft specification document for the physical Arrow memory layout produced over a series of discussions amongst the to-be Arrow committers during late 2015. There are also a few small PNG diagrams that illustrate some of the Arrow layout concepts.
- ARROW-4 - This provides a partial C++11 implementation of the Apache Arrow data structures along with a cmake-based build system. The codebase generally follows the Google C++ style guide, but needs further cleanup to conform fully. It uses googletest for unit testing.
- ARROW-7 - Add barebones Python library build toolchain
- ARROW-8 - Add .travis.yml and test script for Arrow C++. OS X build fixes
- ARROW-9 - Rename some unchanged "Drill" to "Arrow" (follow-up)
- ARROW-9 - Replace straggler references to Drill
- ARROW-10 - Fix mismatch of javadoc names and method parameters
- ARROW-11 - Mirror JIRA activity to dev@arrow.apache.org
- ARROW-13 - Add PR merge tool from parquet-mr, suitably modified
- ARROW-14 - Add JIRA components
- ARROW-15 - Fix a naming typo for memory.AllocationManager.AllocationOutcome
- ARROW-19 - Add an externalized MemoryPool interface for use in builder classes
- ARROW-20 - Add null_count_ member to array containers, remove nullable_ member
- ARROW-21 - Implement a simple in-memory Schema data structure
- ARROW-22 - [C++] Convert flat Parquet schemas to Arrow schemas
- ARROW-23 - Add a logical Column data structure
- ARROW-24 - C++: Implement a logical Table container type
- ARROW-26 - Add instructions for enabling Arrow C++ Parquet adapter build
- ARROW-28 - Adding Google's benchmark library to the toolchain
- ARROW-30 - [Python] Routines for converting between arrow::Array/Table and pandas.DataFrame
- ARROW-31 - Python: prototype user object model, add PyList conversion path with type inference
- ARROW-35 - Add a short call-to-action in the top level README.md
- ARROW-37 - [C++ / Python] Implement BooleanArray and BooleanBuilder. Handle Python built-in bool
- ARROW-42 - Add Python tests to Travis CI build
- ARROW-43 - Python: format array values in repr for interactive computing
- ARROW-44 - Python: prototype object model for array slot values ("scalars")
- ARROW-48 - Python: Add Schema object wrapper
- ARROW-49 - [Python] Add Column and Table wrapper interface
- ARROW-50 - C++: Enable library builds for 3rd-party users without having to build thirdparty googletest
- ARROW-53 - Python: Fix RPATH and add source installation instructions
- ARROW-54 - [Python] Rename package to "pyarrow"
- ARROW-56 - Format: Specify LSB bit ordering in bit arrays
- ARROW-57 - Format: Draft data headers IDL for data interchange
- ARROW-58 - Format: Draft type metadata ("schemas") IDL
- ARROW-59 - Python: Boolean data support for builtin data structures
- ARROW-60 - [C++] Struct type builder API
- ARROW-64 - Add zsh support to C++ build scripts
- ARROW-66 - Maybe some missing steps in installation guide
- ARROW-67 - C++ metadata flatbuffer serialization and data movement to memory maps
- ARROW-68 - Better error handling for not fully setup systems
- ARROW-70 - Adapt 'lite' DCHECK macros from Kudu, as also used in Parquet
- ARROW-71 - [C++] Add clang-tidy and clang-format to the tool chain.
- ARROW-73 - Support older CMake versions
- ARROW-76 - Revise format document to include null count, defer non-nullable arrays to the domain of metadata
- ARROW-78 - C++: Add constructor for DecimalType
- ARROW-79 - [Python] Add benchmarks
- ARROW-82 - Initial IPC support for ListArray
- ARROW-85 - memcmp can be avoided in Equal when comparing with the same …
- ARROW-86 - [Python] Implement zero-copy Arrow-to-Pandas conversion
- ARROW-87 - [C++] Add all four possible ways to encode Decimals in Parquet to schema conversion
- ARROW-89 - [Python] Add benchmarks for Arrow<->Pandas conversion
- ARROW-90 - [C++] Check for SIMD instruction set support
- ARROW-91 - Basic Parquet read support
- ARROW-92 - Arrow to Parquet Schema conversion
- ARROW-100 - [C++] Computing RowBatch size
- ARROW-101 - Fix java compiler warnings
- ARROW-102 - travis-ci support for java project
- ARROW-106 - [C++] Add IPC to binary/string types
- ARROW-107 - [C++] Implement IPC for structs
- ARROW-190 - Python: Provide installable sdist builds
- ARROW-196 - [C++] Add conda dev recipe for libarrow and libarrow_parquet
- ARROW-197 - Working first draft of a conda recipe for pyarrow
- ARROW-199 - [C++] Refine third party dependency
- ARROW-201 - [C++] Initial ParquetWriter implementation
- ARROW-203 - Python: Basic filename based Parquet read/write
- ARROW-204 - Add Travis CI builds that post conda artifacts for Linux and OS X
- ARROW-206 - Expose a C++ api to compare ranges of slots between two arrays
- ARROW-207 - Extend BufferAllocator interface to allow decorators around BufferAllocator
- ARROW-212 - Change contract of PrimitiveArray to reflect its abstractness
- ARROW-213 - Exposing static arrow build
- ARROW-214 - C++: Add String support to Parquet I/O
- ARROW-215 - Support other integer types and strings in Parquet I/O
- ARROW-218 - Add optional API token authentication option to PR merge tool
- ARROW-222 - Prototyping an IO interface for Arrow, with initial HDFS target
- ARROW-233 - Add visibility macros, add static build option
- ARROW-234 - Build libhdfs IO extension in conda artifacts
- ARROW-236 - Bridging IO interfaces under the hood in pyarrow
- ARROW-237 - Implement parquet-cpp's abstract IO interfaces for memory allocation and file reading
- ARROW-238 - Change InternalMemoryPool::Free() to return Status::Invalid when ther…
- ARROW-242 - Support Timestamp Data Type
- ARROW-245 - add endianness to RecordBatch
- ARROW-251 - Expose APIs for getting code and message of the status
- ARROW-252 - Add implementation guidelines to the documentation
- ARROW-253 - restrict ints to 8, 16, 32, or 64 bits in V1
- ARROW-254 - remove Bit type as it is redundant with Boolean
- ARROW-255 - Finalize Dictionary representation
- ARROW-256 - [Format] Add a version number to the IPC/RPC metadata
- ARROW-257 - Add a typeids Vector to Union type
- ARROW-262 - Start metadata specification document
- ARROW-264 - File format
- ARROW-267 - [C++] Implement file format layout for IPC/RPC
- ARROW-270 - Define more generic Interval logical type
- ARROW-271 - Update Field structure to be more explicit
- ARROW-272 - Arrow release 0.1
- ARROW-279 - rename vector module to arrow-vector
- ARROW-280 - [C++] Refactor IPC / memory map IO to use common arrow_io interfaces. Create arrow_ipc leaf library
- ARROW-282 - Make parquet-cpp an optional dependency of pyarrow
- ARROW-285 - Optional flatc download
- ARROW-286 - Build thirdparty dependencies in parallel
- ARROW-289 - Install test-util.h
- ARROW-290 - Specialize alloc() in ArrowBuf
- ARROW-291 - [Python] Update NOTICE file for Python codebase
- ARROW-292 - [Java] Upgrade Netty to 4.0.41
- ARROW-293 - [C++] Implement Arrow IO interfaces for operating system files
- ARROW-296 - [Python / C++] Remove arrow::parquet, make pyarrow link against parquet_arrow
- ARROW-298 - create release scripts
- ARROW-299 - Use absolute namespace in macros
- ARROW-301 - Add user field metadata to IPC schemas
- ARROW-302 - [C++/Python] Implement C++ IO interfaces for interacting with Python file and bytes objects
- ARROW-305 - Add compression and use_dictionary options to Parquet
- ARROW-306 - Add option to pass cmake arguments via environment variable
- ARROW-315 - finalize timestamp
- ARROW-318 - Revise python/README.md given recent changes in codebase
- ARROW-319 - Add canonical Arrow Schema json representation
- ARROW-324 - Update arrow metadata diagram
- ARROW-325 - make TestArrowFile not dependent on timezone
Bug Fixes
- ARROW-5 - Correct Apache Maven repo for maven plugin use
- ARROW-5 - Update drill-fmpp-maven-plugin to 1.5.0
- ARROW-16 - Building cpp issues on XCode 7.2.1
- ARROW-17 - set some vector fields to package level access for Drill compatibility
- ARROW-18 - Fix decimal precision and scale in MapWriters
- ARROW-36 - Remove fixVersions from JIRA resolve code path
- ARROW-46 - ListVector should initialize bits in allocateNew
- ARROW-51 - Add simple ValueVector tests
- ARROW-55 - [Python] Fix unit tests in 2.7
- ARROW-62 - Clarify null bitmap interpretation, indicate bit-endianness, add null count, remove non-nullable physical distinction
- ARROW-63 - [C++] Enable ctest to work on systems with Python 3 as the default Python
- ARROW-65 - Be less restrictive on PYTHON_LIBRARY search paths
- ARROW-69 - Change permissions for assignable users
- ARROW-72 - Search for alternative parquet-cpp header
- ARROW-75 - Fix handling of empty strings
- ARROW-77 - [C++] Conform bitmap interpretation to ARROW-62; 1 for nulls, 0 for non-nulls
- ARROW-80 - Handle len call for pre-init arrays
- ARROW-83 - [C++] Add basic test infrastructure for DecimalType
- ARROW-84 - C++: separate test codes
- ARROW-88 - [C++] Refactor usages of parquet_cpp namespace
- ARROW-93 - Fix builds when using XCode 7.3
- ARROW-94 - [Format] Expand list example to clarify null vs empty list
- ARROW-103 - Add files to gitignore
- ARROW-104 - [FORMAT] Add alignment and padding requirements + union clarification
- ARROW-105 - Unit tests fail if assertions are disabled
- ARROW-113 - TestValueVector test fails if cannot allocate 2GB of memory
- ARROW-185 - Make padding and alignment for all buffers be 64 bytes
- ARROW-188 - Add numpy as install requirement
- ARROW-193 - Fix typo "int his" to "in this"
- ARROW-194 - C++: Allow read-only memory mapped source
- ARROW-200 - [C++/Python] Return error status on string initialization failure
- ARROW-205 - builds failing on master branch with apt-get error
- ARROW-209 - [C++] Triage builds due to unavailable LLVM apt repo
- ARROW-210 - Cleanup of the string related types in C++ code base
- ARROW-211 - [Format] Fixed typos in layout examples
- ARROW-217 - Fix Travis w.r.t conda 4.1.0 changes
- ARROW-219 - Preserve CMAKE_CXX_FLAGS, fix compiler warnings
- ARROW-223 - Do not link against libpython
- ARROW-225 - [C++/Python] master Travis CI build is broken
- ARROW-244 - Some global APIs of IPC module should be visible to the outside
- ARROW-246 - [Java] UnionVector doesn't call allocateNew() when creating its vectorType
- ARROW-247 - Missing explicit destructor in RowBatchReader causes an incomplete type error
- ARROW-250 - Fix for ARROW-246 may cause memory leaks
- ARROW-259 - Use Flatbuffer Field type instead of MaterializedField
- ARROW-260 - Fix flaky oversized tests
- ARROW-265 - Fix few decimal bugs
- ARROW-265 - Pad negative decimal values with 1
- ARROW-266 - [C++] Fix broken build due to Flatbuffers namespace change
- ARROW-274 - Add NullableMapVector to support nullable maps
- ARROW-277 - Flatbuf serialization fails for Timestamp type
- ARROW-278 - [Format] Rename Tuple to Struct_ in flatbuffers IDL
- ARROW-283 - [C++] Account for upstream changes in parquet-cpp
- ARROW-284 - Disable arrow_parquet module in Travis CI to triage builds
- ARROW-287 - Make nullable vectors use a BitVector instead of UInt1Vector for bits
- ARROW-297 - Fix Arrow pom for release
- ARROW-304 - NullableMapReaderImpl.isSet() always returns true
- ARROW-308 - UnionListWriter.setPosition() should not call startList()
- ARROW-309 - Types.getMinorTypeForArrowType() does not work for Union type
- ARROW-313 - Build on any version of XCode
- ARROW-314 - JSONScalar is unnecessary and unused
- ARROW-320 - ComplexCopier.copy(FieldReader, FieldWriter) should not st…
- ARROW-321 - fix arrow licenses
- ARROW-855 - Arrow Memory Leak