
Security News
New React Server Components Vulnerabilities: DoS and Source Code Exposure
New DoS and source code exposure bugs in React Server Components and Next.js: what’s affected and how to update safely.
shmr
Advanced tools
A set of high-order map-reduce functions
The goal of this library is to make it easier to process large data in parallel while not spending lots of time writing code. The typical workflow of this library is to split a huge file into smaller partitions and process each partition in parallel as the map-reduce framework. However, it does not require any setup as Spark or Hadoop except for one simple pip installation command. It is more suitable if you want to do something quick (and just one time).
This library, shmr, is best used in the command line with xargs or parallel. You can see the list of supported arguments by printing help:
$ python -m shmr -h
usage: sh map-reduce (1.0.18) [-h] [-v] -i INFILE [--skip_nrows SKIP_NROWS]
[-d DESER_FN] [-s SER_FN] <command>
...
optional arguments:
-h, --help show this help message and exit
-v, --verbose
-i INFILE, --infile INFILE
the path to one partition or list of partitions depend
on the sub-program
--skip_nrows SKIP_NROWS
Skip first n rows of each partition
-d DESER_FN, --deser_fn DESER_FN
Deserialization function. Default is `orjson.loads`
-s SER_FN, --ser_fn SER_FN
Serialization function. Default is `orjson.dumps`
The most important argument is the positional argument <command>, which is the operator you want to run. There are two types of operator: the one that is applied on one partition, starts with partition.*, and the other one that is applied on all partitions, starts with partitions.*. The help command above will show you all of the possible commands, which is truncated for readability. You can get details of a command using help as well. For example:
$ python -m shmr partition.map -h
usage: sh map-reduce (1.0.18) partition.map [-h] --fn FN --outfile OUTFILE
Apply a map function on every record of this partition
optional arguments:
-h, --help show this help message and exit
--fn FN (str) an import path of the map function, which should
has this signature: (record: Any) -> Any
--outfile OUTFILE (str) output file of the new partition, `*` or `{stem}`
in the path are placeholders, which will be replaced by
the stem (i.e., file name without extension) of the
current partition
There are some variables you can use in naming the output partitions:
{stem}: will be replaced by the stem of the current mapping partition{auto}: an incremental number of the new partition* or {}: a special placeholder that is either {stem} or {auto}, depends on the function you are using. If the function generates multiple partitions (e.g., group_by function), then * or {} will be replaced by {auto}, otherwise, it will be replaced by {stem}. Note that {stem} of multiple partitions will always be replaced by an empty stringFrom PyPi: pip install shmr
Below are some examples:
shmr -i <file_path> partitions.coalesce --outfile <output_files> --num_partitions=128
ls <input_files> | xargs -n 1 -I{} -P <n_threads> shmr -i {} partition.map --fn <func> --outfile <output_file>
If you provide the -v, it will show the progression bar telling you how long it will take to process one partition.
FAQs
A set of map-reduce high-order functions to use with parallel or xargs
We found that shmr demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Security News
New DoS and source code exposure bugs in React Server Components and Next.js: what’s affected and how to update safely.

Security News
Socket CEO Feross Aboukhadijeh joins Software Engineering Daily to discuss modern software supply chain attacks and rising AI-driven security risks.

Security News
GitHub has revoked npm classic tokens for publishing; maintainers must migrate, but OpenJS warns OIDC trusted publishing still has risky gaps for critical projects.