net.gonzberg:spark-sorting-helpers_2.11

Spark Sorting Helpers

Spark Sorting Helpers is a library of convenience functions for leveraging the secondary sort functionality of Spark partitioning. Secondary sorting partitions an RDD or Dataset by key while sorting the values, pushing that sort into the underlying shuffle machinery. This is an efficient way to sort values within a partition when a shuffle is already required (e.g. for a join or groupBy).
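
To make the idea concrete, here is a minimal sketch of a secondary sort written directly against Spark's own repartitionAndSortWithinPartitions. It is not necessarily how this library implements its helpers; the names KeyOnlyPartitioner, SecondarySortSketch and sortedWithinKeys are hypothetical and exist only for illustration. The sketch partitions on the key alone while the shuffle sorts on the full (key, value) pair.

```scala
import org.apache.spark.{HashPartitioner, Partitioner, SparkConf, SparkContext}

// Hypothetical helper: partitions a composite (key, value) record by the key
// alone, so all values for one key land in the same partition.
class KeyOnlyPartitioner(partitions: Int) extends Partitioner {
  private val delegate = new HashPartitioner(partitions)
  override def numPartitions: Int = partitions
  override def getPartition(composite: Any): Int = composite match {
    case (key, _) => delegate.getPartition(key)
  }
}

object SecondarySortSketch {
  // Returns the records with each key's values in ascending order.
  // Keys themselves are only ordered within a partition, not globally.
  def sortedWithinKeys(sc: SparkContext, data: Seq[(String, Int)]): Seq[(String, Int)] =
    sc.parallelize(data)
      .map { case (k, v) => ((k, v), ()) } // move the value into the sort key
      .repartitionAndSortWithinPartitions(new KeyOnlyPartitioner(2))
      .keys
      .collect()
      .toSeq
}
```

Because the sort happens inside the shuffle, no separate per-group sort is needed afterwards; each key's values can be read off in order from its partition.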

Usage

This library uses the "pimp my library" pattern (implicit conversions that add extension methods) to add methods to RDDs or Datasets of pairs. Import the implicits with:

import net.gonzberg.spark.sorting.implicits._

You can then call additional functions on certain RDDs or Datasets, e.g.

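import org.apache.spark.rdd.RDD

val rdd: RDD[(String, Int)] = ???  // any pair RDD
val groupedRDD: RDD[(String, Iterable[Int])] = rdd.sortedGroupByKey
// Each group's values arrive already sorted.
groupedRDD.foreach { case (_, group) => assert(group.toSeq == group.toSeq.sorted) }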
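
For readers unfamiliar with the extension-method pattern mentioned above, the following plain-Scala sketch shows the mechanism on a local Seq rather than an RDD. Everything in it (PairSyntax, GroupSortedOps, groupSorted) is a hypothetical stand-in and not part of this library's API; it only illustrates how importing an implicit class makes a new method appear on an existing type.

```scala
object PairSyntax {
  // Hypothetical extension class, not part of the library: the implicit
  // wrapper makes groupSorted callable on any Seq of pairs in scope.
  implicit class GroupSortedOps[K, V: Ordering](private val pairs: Seq[(K, V)]) {
    // Group values by key, then sort each group's values.
    def groupSorted: Map[K, Seq[V]] =
      pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sorted }
  }
}

object PairSyntaxDemo {
  import PairSyntax._ // brings the implicit class into scope

  // Seq has no groupSorted method; the compiler inserts the wrapper.
  val grouped: Map[String, Seq[Int]] =
    Seq("a" -> 3, "b" -> 1, "a" -> 2).groupSorted
}
```

Unlike this local model, which sorts each group after grouping, the library's helpers push the sort into the Spark shuffle.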

Supported Versions

This library attempts to support Scala 2.11, 2.12, and 2.13. Since there is not a single version of Spark which supports all three of those Scala versions, this library is built against different versions of Spark depending on the Scala version.

Scala   Spark
2.11    2.4.8
2.12    3.3.0
2.13    3.3.0

Other combinations of versions may also work, but these are the ones for which the tests run automatically. We will likely drop 2.11 support in a later release, depending on when it becomes too difficult to support.
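
As a rough illustration of how the version pairing plays out in a build, here is a build.sbt sketch. The group and artifact names come from the package coordinates at the top of this page; the version value is a placeholder, and declaring spark-core as a Provided dependency is an assumption about a typical Spark project rather than something this README specifies.

```scala
// build.sbt sketch; the version value is a placeholder, not a real release number.
val sortingHelpersVersion = "x.y.z" // replace with a published version

libraryDependencies ++= Seq(
  // %% appends the Scala binary suffix, e.g. spark-sorting-helpers_2.11
  "net.gonzberg" %% "spark-sorting-helpers" % sortingHelpersVersion,
  // Spark version tested with Scala 2.12 and 2.13; use 2.4.8 with Scala 2.11
  "org.apache.spark" %% "spark-core" % "3.3.0" % Provided
)
```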

Documentation

Scaladocs are available here.

Development

This package is built using sbt. You can run the tests with sbt test and format the code with sbt scalafmt. Prefix a command with + to cross-build (e.g. sbt +test), though you'll need Java 8 (as opposed to Java 11) to cross-build to Scala 2.11.

Package last updated on 13 Sep 2022
