github.com/ballista-compute/ballista


  • v0.4.1

Ballista: Distributed Compute Platform

Overview

Ballista is a distributed compute platform primarily implemented in Rust, powered by Apache Arrow. It is built on an architecture that allows other programming languages (such as Python, C++, and Java) to be supported as first-class citizens without paying a penalty for serialization costs.

Technologies

The foundational technologies in Ballista are:

Ballista can be deployed as a standalone cluster and also supports Kubernetes. In either case, the scheduler can be configured to use etcd as a backing store to (eventually) provide redundancy in the case of a scheduler failing.

Architecture

The following diagram highlights some of the integrations that will be possible with this unique architecture. Note that not all components shown here are available yet.

[Figure: Ballista architecture diagram]

How does this compare to Apache Spark?

Although Ballista is largely inspired by Apache Spark, there are some key differences.

  • The choice of Rust as the main execution language means that memory usage is deterministic and there are no GC pauses.
  • Ballista is designed from the ground up to use columnar data, enabling a number of efficiencies such as vectorized processing (SIMD and GPU) and efficient compression. Although Spark does have some columnar support, it is still largely row-based today.
  • The combination of Rust and Arrow provides excellent memory efficiency: memory usage can be 5x to 10x lower than Apache Spark in some cases, so more processing fits on a single node, reducing the overhead of distributed compute.
  • The use of Apache Arrow as the memory model and network protocol means that data can be exchanged between executors in any programming language with minimal serialization overhead.
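
To make the row-based versus columnar distinction above concrete, here is a minimal sketch in plain Rust. It is not Ballista or Arrow code: it uses only the standard library, and the `TripRow` and `TripColumns` types and their field names are hypothetical, chosen purely for illustration. It shows the same data in both layouts and the same aggregate computed over each.

```rust
// Illustrative only: hypothetical types, standard library only.
// This is NOT the Ballista or Apache Arrow API.

/// Row-oriented layout: each record stores all of its fields together.
#[derive(Debug, Clone, PartialEq)]
pub struct TripRow {
    pub passenger_count: u32,
    pub fare_amount: f64,
}

/// Column-oriented layout: each field is stored as one contiguous vector.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TripColumns {
    pub passenger_count: Vec<u32>,
    pub fare_amount: Vec<f64>,
}

impl TripColumns {
    /// Transpose a slice of rows into per-field columns.
    pub fn from_rows(rows: &[TripRow]) -> Self {
        TripColumns {
            passenger_count: rows.iter().map(|r| r.passenger_count).collect(),
            fare_amount: rows.iter().map(|r| r.fare_amount).collect(),
        }
    }

    /// Sum one column. The loop reads a single contiguous `f64` slice and
    /// never touches the other fields.
    pub fn total_fare(&self) -> f64 {
        self.fare_amount.iter().sum()
    }
}

/// The same aggregate over the row layout: every record is visited even
/// though only one of its fields is needed.
pub fn total_fare_rows(rows: &[TripRow]) -> f64 {
    rows.iter().map(|r| r.fare_amount).sum()
}
```

Calling `TripColumns::from_rows(&rows)` followed by `total_fare()` returns the same value as `total_fare_rows(&rows)`. The difference is the access pattern: the columnar version scans one densely packed array, which is the shape of data that vectorized kernels and per-column compression operate on.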

Examples

The following examples should help illustrate the current capabilities of Ballista:

Project Status

To follow the progress of this project, please refer to the "This Week in Ballista" series of blog posts. Follow @BallistaCompute on Twitter to receive notifications when the blog is updated.

Releases

Ballista releases are now available on crates.io, Maven Central and Docker Hub. Please refer to the user guide for instructions on using a released version of Ballista.

Documentation

The user guide is hosted at https://ballistacompute.org, along with the blog where news and release notes are posted.

Developer documentation can be found in the docs directory.

Contributing

See CONTRIBUTING.md for information on contributing to this project.

Package last updated on 23 Feb 2021
