net.gonzberg:spark-sorting-helpers_2.11
Spark Sorting Helpers is a library of convenience functions built on Spark's secondary sort functionality. Secondary sorting partitions an RDD or Dataset by key while also sorting the values for each key, pushing that sort into the underlying shuffle machinery. This is an efficient way to sort values within a partition when a shuffle is already required (e.g. for a join or groupBy).
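To make the idea concrete, here is a minimal sketch of secondary sort written directly against Spark's `repartitionAndSortWithinPartitions`. It is not this library's implementation: `KeyOnlyPartitioner`, `SecondarySortSketch`, and `sortWithinKeys` are hypothetical names introduced for illustration, and the sketch assumes `spark-core` is on the classpath and that a local master is acceptable.

```scala
import org.apache.spark.{HashPartitioner, Partitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper: routes a composite (key, value) record by its key only,
// so every value for a given key lands in the same partition.
class KeyOnlyPartitioner(partitions: Int) extends Partitioner {
  private val byKey = new HashPartitioner(partitions)
  override def numPartitions: Int = partitions
  override def getPartition(composite: Any): Int = composite match {
    case (key, _) => byKey.getPartition(key)
    case other    => byKey.getPartition(other)
  }
}

object SecondarySortSketch {
  // Move the value into the shuffle key so the shuffle itself sorts by
  // (key, value), while partitioning still depends on the key alone.
  def sortWithinKeys(rdd: RDD[(String, Int)], partitions: Int): RDD[(String, Int)] =
    rdd
      .map { case (k, v) => ((k, v), ()) }
      .repartitionAndSortWithinPartitions(new KeyOnlyPartitioner(partitions))
      .keys

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("secondary-sort-sketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq("a" -> 3, "b" -> 2, "a" -> 1, "b" -> 5, "a" -> 2))
      // Each partition comes back ordered by key, then by value.
      sortWithinKeys(rdd, partitions = 2).glom().collect().foreach { partition =>
        assert(partition.toSeq == partition.toSeq.sorted)
        println(partition.mkString(", "))
      }
    } finally sc.stop()
  }
}
```

Because the ordering is applied during the shuffle, no separate per-group sort is needed afterwards; the values for each key are already contiguous and ascending within their partition.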
This library uses the "pimp my library" pattern (implicit enrichment of existing types) to add methods to RDDs or Datasets of pairs. You can import the implicits with:
import net.gonzberg.spark.sorting.implicits._
You can then call additional functions on certain RDDs or Datasets, e.g.
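import org.apache.spark.rdd.RDD
val rdd: RDD[(String, Int)] = ??? // placeholder: any RDD of key/value pairs
val groupedRDD: RDD[(String, Iterable[Int])] = rdd.sortedGroupByKey
// Every group of values should already be in sorted order
groupedRDD.foreach { case (_, group) => assert(group.toSeq == group.toSeq.sorted) }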
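The enrichment pattern itself can be illustrated without Spark. The sketch below adds a `sortedGroupByKey` method to a plain `Seq` of pairs through an implicit class. `EnrichmentSketch` and `SortedGroupingOps` are hypothetical names, and this in-memory version only mirrors the shape of the API; it is not how the library implements the method for RDDs or Datasets.

```scala
object EnrichmentSketch {
  // Hypothetical enrichment, not this library's code: once this implicit class
  // is in scope, any Seq of pairs gains a sortedGroupByKey method.
  implicit class SortedGroupingOps[K, V](pairs: Seq[(K, V)]) {
    def sortedGroupByKey(implicit ord: Ordering[V]): Map[K, Seq[V]] =
      pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sorted }
  }

  def main(args: Array[String]): Unit = {
    val pairs = Seq("a" -> 3, "b" -> 2, "a" -> 1)
    // The method appears to live on Seq, but resolves through the implicit class.
    val grouped = pairs.sortedGroupByKey
    grouped.foreach { case (_, group) => assert(group == group.sorted) }
    println(grouped)
  }
}
```

Importing the implicit (here, `import EnrichmentSketch._`) plays the same role as importing `net.gonzberg.spark.sorting.implicits._` does for the real library.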
This library attempts to support Scala 2.11, 2.12, and 2.13. Since no single version of Spark supports all three of those Scala versions, this library is built against different versions of Spark depending on the Scala version.
Scala | Spark |
---|---|
2.11 | 2.4.8 |
2.12 | 3.3.0 |
2.13 | 3.3.0 |
Other combinations of versions may also work, but these are the ones for which the tests run automatically. We will likely drop 2.11 support in a later release, depending on when it becomes too difficult to support.
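As an illustration of the version pairing in the table above, a downstream `build.sbt` could select the Spark version from the Scala version being built. This is a sketch only, not this project's actual build definition; the choice of `spark-core` and `spark-sql` as the Spark modules, and the `provided` scope, are assumptions.

```scala
// build.sbt (sketch): pick the Spark version that matches the Scala version.
libraryDependencies ++= {
  val sparkVersion = CrossVersion.partialVersion(scalaVersion.value) match {
    case Some((2, 11)) => "2.4.8" // last Spark line built for Scala 2.11
    case _             => "3.3.0" // used for Scala 2.12 and 2.13
  }
  Seq(
    "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
    "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided"
  )
}
```

The `%%` operator appends the Scala binary version to the artifact name, which is how a suffix such as `_2.11` ends up in the resolved coordinates.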
Scaladocs are available here.
This package is built using sbt. You can run the tests with `sbt test` and lint with `sbt scalafmt`. You can use `+` in front of a directive to cross-build (for example, `sbt +test`), though you'll need Java 8 (as opposed to Java 11) to cross-build to Scala 2.11.