Locusts is a Python package for distributing many small jobs on a system (which can be your machine or a remote HPC running SLURM).
The Locusts package is currently hosted on the PyPI Test archive. To install it, type:
python3 -m pip install --index-url https://test.pypi.org/simple/ --no-deps locusts
Note: PyPI Test is not a permanent archive. Expect this installation procedure to change over time.
Locusts is meant for those who have to run a huge number of small, independent jobs and have problems with the most common schedulers, which scatter the jobs over too many nodes or queue them indefinitely. Moreover, this package provides a safe, clean environment for each job instance, and keeps and collects notable inputs and outputs. In short, Locusts creates a minimal filesystem where it prepares one environment for each job it has to execute. The runs are directed by a manager bash script, which schedules them and reports its own status and that of the jobs to the main Locusts routine, which is always run locally. Finally, Locusts checks for a set of compulsory output files and compiles a list of successes and failures.
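The final step described above (checking for compulsory output files and compiling a list of successes and failures) can be pictured with a minimal sketch. This is not Locusts' own code: the function `check_outputs`, the `job_<n>` directory layout, and the file name `result.txt` are all hypothetical and chosen only for illustration.

```python
import tempfile
from pathlib import Path


def check_outputs(job_dirs, required_outputs):
    """Split jobs into successes and failures (hypothetical helper).

    job_dirs: mapping of job id -> directory where that job ran.
    required_outputs: file names that every job is expected to produce.
    """
    successes, failures = [], []
    for job_id, job_dir in sorted(job_dirs.items()):
        # A job fails if at least one compulsory output file is absent.
        missing = [name for name in required_outputs
                   if not (Path(job_dir) / name).is_file()]
        if missing:
            failures.append((job_id, missing))
        else:
            successes.append(job_id)
    return successes, failures


# Demo on a throwaway directory tree: job 0 writes its output, job 1 does not.
with tempfile.TemporaryDirectory() as root:
    job_dirs = {}
    for job_id in (0, 1):
        job_dir = Path(root) / f"job_{job_id}"
        job_dir.mkdir()
        job_dirs[job_id] = job_dir
    (job_dirs[0] / "result.txt").write_text("done\n")
    successes, failures = check_outputs(job_dirs, ["result.txt"])

print("succeeded:", successes)
print("failed:", failures)
```

Keeping the missing file names alongside each failed job id makes it easier to see why a run was marked as failed.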
Locusts can help you distribute your jobs when you are facing one of these three situations:
Once you give Locusts the set of inputs to consider and the command to execute, it creates the Generic Environment, a minimal filesystem composed of three folders:
Based on this architecture, Locusts provides two types of environments the user can choose from depending on her needs:
If the user only needs to process a (possibly huge) number of files and get another (still huge) number of output files in return, this environment is the optimal choice: it allows for minimal data transfer and disk space usage, while each of the parallel runs executes in a protected sub-environment. The desired output files and the corresponding logs are then collected and put in a folder designated by the user.
The user may nonetheless want to parallelize a program with more complex effects than taking in a set of input files and returning some outputs: for example, a program that moves files around a filesystem will not be able to run in the Default Locusts Environment. In these situations, the program needs access to a whole environment rather than to a set of input files.
Starting from this common base, there are two different environments that can be used:
You can find this example in the directory tests/test_manager/
In tests/test_manager/my_input_dir/ you will find 101 pairs of input files: inputfile_#.txt and secondinputfile_#.txt, where 0 <= # <= 100. Additionally, you will also find a single file named sharedfile.txt.
The aim here is to execute this small script over the 101 sets of inputs:
sleep 1; ls -lrth <inputfile> <secondinputfile> <sharedfile> > <outputfile>; cat <inputfile> <secondinputfile> <sharedfile> > <secondoutputfile>
For each pair, the script takes in inputfile_#.txt, secondinputfile_#.txt (both vary from instance to instance) and sharedfile.txt (which always remains the same), and returns ls_output_#.txt and cat_output_#.txt. In order to mimic a longer process, the script is artificially made to last at least one second.
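To make the naming scheme concrete, here is a small sketch that lists the input and output file names for a given instance. The helper `files_for_instance` is hypothetical and not part of Locusts; it only encodes the file names and the 0 to 100 index range stated above.

```python
def files_for_instance(i: int):
    """Return (inputs, outputs) file names for instance i (hypothetical helper)."""
    if not 0 <= i <= 100:
        raise ValueError("instance index must be between 0 and 100")
    inputs = [f"inputfile_{i}.txt", f"secondinputfile_{i}.txt", "sharedfile.txt"]
    outputs = [f"ls_output_{i}.txt", f"cat_output_{i}.txt"]
    return inputs, outputs


# Every distinct input file the example directory is expected to contain:
# 101 pairs plus the single shared file.
all_inputs = set()
for i in range(101):
    all_inputs.update(files_for_instance(i)[0])

print(files_for_instance(3))
print(len(all_inputs))
```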
The file tests/test_manager/test_manager.py gives you an example (and also a template) of how you can submit a job on Locusts.
The function you want to call is locusts.swarm.launch, which takes several arguments.
Before describing them, let's look at the strategy used by Locusts: in essence, you give Locusts a template of the command you want to execute, and then you tell Locusts where to look for the files to execute that template with. In our case, the template is:
sleep 1; ls -lrth inputfile_<id>.txt secondinputfile_<id>.txt <shared>sf1 > ls_output_<id>.txt; cat inputfile_<id>.txt secondinputfile_<id>.txt <shared>sf1 > cat_output_<id>.txt
Notice there are two handles that Locusts will know how to replace: <id> and <shared>. The <id> handle is there to specify the variable part of a filename (in our case, an integer in the [0,100] interval). The <shared> tag tells locust
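As a rough illustration of what replacing the <id> handle amounts to, the sketch below expands the template for each integer in [0,100] with plain string replacement. This is only an assumption about the general idea, not Locusts' actual implementation; `expand_id` is a hypothetical helper, and the <shared> handle is deliberately left untouched because its handling is not described here.

```python
def expand_id(template: str, job_id) -> str:
    """Replace every <id> handle in the template (hypothetical helper)."""
    return template.replace("<id>", str(job_id))


# The command template from the example, copied verbatim.
template = (
    "sleep 1; "
    "ls -lrth inputfile_<id>.txt secondinputfile_<id>.txt <shared>sf1 "
    "> ls_output_<id>.txt; "
    "cat inputfile_<id>.txt secondinputfile_<id>.txt <shared>sf1 "
    "> cat_output_<id>.txt"
)

# One concrete command per instance; <shared>sf1 is not resolved in this sketch.
commands = [expand_id(template, i) for i in range(101)]
print(commands[7])
```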
indir takes the location (absolute path or relative from where you are calling the script) of the directory containing all your input files
outdir takes the location (absolute path or relative from where you are calling the script) of the directory where you want to collect your results
code takes a unique codename for the job you want to launch
spcins takes a list containing the template names for the
shdins=shared_inputs,
outs=outputs,
cmd=command_template,
parf=parameter_file
You will find the material