A Python package for Source Apportionment with Sediment Fingerprinting.
PySASF was developed to provide computational resources for research aimed at identifying the contributions of various sources to fluvial sediments. More specifically, PySASF implements methods for calculating the proportions contributed by each source from a dataset and its random subsamples, as well as analyzing solution variabilities. Additionally, it includes routines for visualizing confidence regions and other plots from the complete dataset and reduced samples.
This initiative originated from a collaboration between the Department of Soil Science and the Department of Mathematics at the Federal University of Santa Maria (UFSM), with participation from other educational and research institutions. The initial motivation was to reproduce the results published in Clarke and Minella (2016) and to create a package of Python routines to facilitate the replication of the experiment with other data sources.
PySASF was first used and tested by the Interdisciplinary Research Group on Erosion and Surface Hydrology (GIPEHS) at UFSM. New analysis models resulting from research and development efforts will be incorporated in the future, based on this academic collaboration.
Download the package from here and install it using the following command line in the directory where the file was downloaded.
$ pip3 install pysasf-0.0.5.tar.gz
You can download a Python script for testing from here. You also need to download the data file from here and store it in a folder named data
in the same directory as the script. Then you can run it with the terminal command:
$ python3 cm.py
This script runs a terminal version of the quick start usage example notebook.
Alternatively, you can download the full project sources here, unzip them, and go to the notebooks directory. Open quick_star.ipynb
using Jupyter Notebook or JupyterLab.
If you receive a No module named 'pysasf'
error message, try adding the following lines at the beginning of your notebook:
import sys
sys.path.append('/your_path_to/PySASF-main')
Replace your_path_to
with the path to the directory where PySASF-main was extracted.
You will need NumPy, SciPy, Matplotlib and pandas installed. All dependencies can be satisfied by an Anaconda installation.
A good starting point is to import the BasinData
object class to store data from a basin's sediment sources. An instance of BasinData should be created, and the data should be loaded from a file. It is common to store data files in the 'data' directory one level above. The import and creation of an instance of BasinData
are shown below.
# If you don't have PySASF installed, you need to set the directory:
import sys
sys.path.append('/home/tiagoburiol/PySASF')
from pysasf.basindata import BasinData
arvorezinha = BasinData("../data/arvorezinha_database.xlsx")
Once the file is loaded, some information and statistics can be visualized, as shown in the following examples.
arvorezinha.infos()
Sample Sizes | Fe | Mn | Cu | Zn | Ca | K | P |
---|---|---|---|---|---|---|---|
C | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
E | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
L | 20 | 20 | 20 | 20 | 20 | 20 | 20 |
Y | 24 | 24 | 24 | 24 | 24 | 24 | 24 |
arvorezinha.means()
Means | Fe | Mn | Cu | Zn | Ca | K | P |
---|---|---|---|---|---|---|---|
C | 6.21 | 1470.45 | 18.23 | 79.71 | 165.23 | 3885.12 | 0.03 |
E | 6.76 | 811.95 | 23.28 | 86.02 | 76.10 | 3182.27 | 0.01 |
L | 6.63 | 1854.05 | 20.05 | 88.28 | 159.17 | 6572.31 | 0.06 |
Y | 6.16 | 1119.02 | 30.92 | 99.66 | 276.47 | 9445.76 | 0.07 |
arvorezinha.std()
STD | Fe | Mn | Cu | Zn | Ca | K | P |
---|---|---|---|---|---|---|---|
C | 0.48 | 548.49 | 2.41 | 7.84 | 82.19 | 1598.45 | 0.01 |
E | 0.98 | 399.90 | 1.98 | 6.96 | 26.21 | 948.95 | 0.01 |
L | 1.07 | 399.77 | 3.86 | 15.70 | 79.33 | 2205.99 | 0.01 |
Y | 1.01 | 294.13 | 10.13 | 8.40 | 79.37 | 2419.21 | 0.02 |
We can easily reproduce the Clarke and Minella (2016) method for measuring the increase in uncertainty when sampling sediment fingerprinting. A full explanation of this method is available in the paper 'Evaluating sampling efficiency when estimating sediment source contributions to suspended sediment in rivers by fingerprinting.' DOI: 10.1002/hyp.10866. The steps required to achieve the same results described in the paper can be executed with a few function calls, as shown below.
First, we need to import the clarkeminella
analysis module. We will refer to it as cm
.
import pysasf.clarkeminella as cm
Now we will calculate and save in a file all the possible combinations of proportions contributed by the sediment sources. The routine calculate_and_save_all_proportions()
will create two files: one for all possible combinations for each sample in the database, saving their indexes, and another file for the corresponding proportions. The default method for calculation is ordinary least squares. Other methods can be chosen using arvorezinha.set_solver_option(option)
.
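To make the least-squares step more concrete, here is a minimal sketch. It is not the package's implementation. It assumes a three-source mixing model in which a sediment sample's tracer vector y is approximated by P1*c + P2*e + P3*l with P1 + P2 + P3 = 1, so only P1 and P2 are free. The function name `unmix_ols` and the synthetic target mixture are hypothetical; the source vectors reuse the C, E and L means reported by `arvorezinha.means()` above.

```python
def unmix_ols(c, e, l, y):
    """Least-squares proportions (P1, P2, P3) with P3 = 1 - P1 - P2.

    Substituting P3 turns y = P1*c + P2*e + P3*l into
    (y - l) = P1*(c - l) + P2*(e - l), a two-unknown least-squares
    problem solved here through its 2x2 normal equations.
    """
    u = [ci - li for ci, li in zip(c, l)]   # column multiplying P1
    v = [ei - li for ei, li in zip(e, l)]   # column multiplying P2
    r = [yi - li for yi, li in zip(y, l)]   # right-hand side
    uu = sum(a * a for a in u)
    vv = sum(b * b for b in v)
    uv = sum(a * b for a, b in zip(u, v))
    ur = sum(a * b for a, b in zip(u, r))
    vr = sum(a * b for a, b in zip(v, r))
    det = uu * vv - uv * uv
    if det == 0:
        raise ValueError("source vectors are collinear; proportions are not unique")
    p1 = (ur * vv - vr * uv) / det
    p2 = (vr * uu - ur * uv) / det
    return p1, p2, 1.0 - p1 - p2


# Mean tracer values (Fe, Mn, Cu, Zn, Ca, K, P) from the means table above.
C = [6.21, 1470.45, 18.23, 79.71, 165.23, 3885.12, 0.03]
E = [6.76, 811.95, 23.28, 86.02, 76.10, 3182.27, 0.01]
L = [6.63, 1854.05, 20.05, 88.28, 159.17, 6572.31, 0.06]

# Synthetic target built from known proportions, for illustration only.
true_p = (0.5, 0.3, 0.2)
y = [true_p[0] * c + true_p[1] * e + true_p[2] * l for c, e, l in zip(C, E, L)]

p1, p2, p3 = unmix_ols(C, E, L, y)
print(round(p1, 4), round(p2, 4), round(p3, 4))
```

Because nothing in this sketch bounds the solution, proportions outside the interval from 0 to 1 can appear when real, noisy samples are used, which is why a feasibility filter is applied later.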
Set your output folder using arvorezinha.set_output_folder(path='/yourpath/folder'):
arvorezinha.set_output_folder('../output')
Setting output folder as: ../output
Folder to save output files is: '../output'.
arvorezinha.calculate_and_save_all_proportions(load=False)
Done! Time processing: 1.893726110458374
Total combinations: 38880 , shape of proportions: (38880, 3)
Saving combinations indexes in: ../output/C9E9L20Y24_combstxt
Saving proportions calculated in: ../output/C9E9L20Y24_propstxt
Feasebles boolean array is sabed in: ../output/C9E9L20Y24_feastxt
Time for save files: 0.2960786819458008
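The total of 38880 combinations is the product of the sample sizes of the four groups (9 × 9 × 20 × 24). The sketch below reproduces that count with `itertools.product`. It only illustrates the enumeration and is not necessarily how PySASF builds its index array.

```python
from itertools import product

# Sample sizes per group, as reported by arvorezinha.infos() above.
sizes = {"C": 9, "E": 9, "L": 20, "Y": 24}

# One sample index per group; the last group (Y) varies fastest.
combs = list(product(*(range(n) for n in sizes.values())))

print(len(combs))   # 9 * 9 * 20 * 24 = 38880
print(combs[:3])    # [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 0, 2)]
print(combs[-1])    # (8, 8, 19, 23)
```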
If you want to store the proportion solutions and the combination indexes, you can choose load=True
(the default option) when calling the routine above. The proportion solutions and the combination indexes will be stored in the BasinData
object.
To read the created files and load the proportion solutions and the combination indexes, we can use the load_combs_and_props_from_files(combs_file, props_file)
function. An example is shown below.
combs, Ps = arvorezinha.load_combs_and_props_from_files(arvorezinha.output_folder+'/C9E9L20Y24_combs.txt',
arvorezinha.output_folder+'/C9E9L20Y24_props.txt')
Loading combs and props files from: ../output
We can verify the loaded array data as follows:
display(combs, Ps)
array([[ 0, 0, 0, 0],
[ 0, 0, 0, 1],
[ 0, 0, 0, 2],
...,
[ 8, 8, 19, 21],
[ 8, 8, 19, 22],
[ 8, 8, 19, 23]])
array([[ 0.445 , -0.2977, 0.8526],
[ 0.3761, 0.128 , 0.4959],
[ 0.3454, 0.1248, 0.5298],
...,
[ 0.4963, -0.0081, 0.5118],
[ 0.4212, -0.6676, 1.2464],
[-0.0679, -0.138 , 1.206 ]])
Clarke and Minella's criterion for a feasible solution is that the proportions P1 and P2 are each greater than 0 and less than 1. We can extract the feasible solutions using the function cm_feasebles
of the clarkeminella
analysis module, as shown below.
Pfea = cm.cm_feasebles(Ps)
print("The total number of feasible solution is:", len(Pfea))
The total number of feasible solution is: 8132
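As an illustration of the criterion, the sketch below filters the six rows of Ps displayed earlier. The helper `feasible_rows` is hypothetical and tests only P1 and P2, following the wording above; whether `cm_feasebles` also constrains P3 is not stated here.

```python
def feasible_rows(props):
    """Keep rows whose first two proportions lie strictly between 0 and 1."""
    return [row for row in props if 0 < row[0] < 1 and 0 < row[1] < 1]


# The six rows of Ps shown in the display above.
Ps_shown = [
    [0.445, -0.2977, 0.8526],
    [0.3761, 0.128, 0.4959],
    [0.3454, 0.1248, 0.5298],
    [0.4963, -0.0081, 0.5118],
    [0.4212, -0.6676, 1.2464],
    [-0.0679, -0.138, 1.206],
]

kept = feasible_rows(Ps_shown)
print(len(kept))   # 2
print(kept)
```

Only the second and third rows pass, since every other row has a negative P1 or P2.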
A confidence region can be calculated in 2 dimensions by taking the 95% of points closest to the mean of the feasible proportions, where closeness is measured by the Mahalanobis distance to that mean. A more detailed explanation can be found in Clarke and Minella's paper.
The stats
module implements a function for computing a confidence region, as shown in the example below.
from pysasf import stats
Pcr = stats.confidence_region(Pfea[:,0:2], space_dist='mahalanobis')
print("The total number of points in 95% confidence region is:", len(Pcr))
The total number of points in 95% confidence region is: 7725
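The selection rule can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code behind `stats.confidence_region`: the function name `confidence_region_2d` is hypothetical, the points are synthetic, and truncating 0.95 × n to an integer is an assumption that happens to be consistent with the counts reported above (7725 of 8132).

```python
import random


def confidence_region_2d(points, level=0.95):
    """Return the `level` fraction of 2-D points closest to the mean,
    ranked by squared Mahalanobis distance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")

    def d2(p):
        dx, dy = p[0] - mx, p[1] - my
        # Quadratic form d^T * inverse(cov) * d for a symmetric 2x2 matrix.
        return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

    keep = int(level * n)
    return sorted(points, key=d2)[:keep]


# Synthetic points for illustration, plus one obvious outlier.
rng = random.Random(0)
pts = [(rng.gauss(0.34, 0.05), rng.gauss(0.25, 0.05)) for _ in range(200)]
pts.append((10.0, 10.0))

region = confidence_region_2d(pts)
print(len(pts), len(region))
print(int(0.95 * 8132))   # 7725
```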
Let's draw the confidence region using the draw_hull(pts)
function from the plots
module.
from pysasf import plots
plots.draw_hull(Pcr, title = 'Confidence region')
Please, set a path to save the convex hull figure.
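The function name and the message above suggest that the plotted outline is the convex hull of the points. For readers unfamiliar with the idea, the sketch below computes a convex hull with Andrew's monotone chain algorithm in plain Python. It is a stand-in for illustration; how `draw_hull` computes the hull internally is not shown in this document, and `convex_hull` is a hypothetical helper.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


square_with_interior = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5), (0.2, 0.7)]
hull = convex_hull(square_with_interior)
print(hull)   # [(0, 0), (1, 0), (1, 1), (0, 1)]
```

The two interior points are discarded, leaving only the corners that bound the cloud.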
To randomly take a subset of the solutions, with a sample size of 4 for source Y, for example, we can do as shown below.
from pysasf import stats
combs,Ps = stats.randon_props_subsamples(arvorezinha, 'Y', 4)
print("Subset Ps of size:", Ps.shape[0])
Subset Ps of size: 6480
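The subset size of 6480 equals 9 × 9 × 20 × 4, that is, all combinations in which the Y index is restricted to 4 randomly chosen samples. The sketch below reproduces that count. The helper `subsample_combinations` is hypothetical, and sampling without replacement is an assumption about how the subsample is drawn.

```python
import random
from itertools import product

# Sample sizes per group, as reported by arvorezinha.infos() above.
sizes = {"C": 9, "E": 9, "L": 20, "Y": 24}


def subsample_combinations(sizes, key, n, seed=None):
    """All index combinations with group `key` restricted to n random samples."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(sizes[key]), n))
    ranges = [chosen if k == key else range(size) for k, size in sizes.items()]
    return chosen, list(product(*ranges))


chosen, combs = subsample_combinations(sizes, "Y", 4, seed=0)
print(chosen)       # four distinct Y indices
print(len(combs))   # 9 * 9 * 20 * 4 = 6480
```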
To make the plot of the points and the 95% confidence region and save it to a file, we proceed as follows:
P_cr = cm.cm_feasebles(Ps)
plots.draw_hull(P_cr, savefig = True, path=arvorezinha.output_folder,
title = 'Confidence region 95% with Y size = 4')
Plot figure saved in: ../output/convex_hull.png
A figure will be saved in the output folder. If we want to create several plots with a sequence of reductions in the number of samples for a given source, we can proceed as follows.
for n in [2,4,8,12,16,20,24]:
combs,Ps = stats.randon_props_subsamples(arvorezinha, 'Y', n)
P_feas = cm.cm_feasebles(Ps)
P_cr = stats.confidence_region(P_feas,space_dist='mahalanobis2d')
name = 'confidence_region_Y'+str(n)
ax = plots.draw_hull(P_cr, savefig = True,
path = arvorezinha.output_folder,filename = name)
print('Saving figure named:', name)
Plot figure saved in: ../output/confidence_region_Y2.png
Saving figure named: confidence_region_Y2
Plot figure saved in: ../output/confidence_region_Y4.png
Saving figure named: confidence_region_Y4
Plot figure saved in: ../output/confidence_region_Y8.png
Saving figure named: confidence_region_Y8
Plot figure saved in: ../output/confidence_region_Y12.png
Saving figure named: confidence_region_Y12
Plot figure saved in: ../output/confidence_region_Y16.png
Saving figure named: confidence_region_Y16
Plot figure saved in: ../output/confidence_region_Y20.png
Saving figure named: confidence_region_Y20
Plot figure saved in: ../output/confidence_region_Y24.png
Saving figure named: confidence_region_Y24
Clarke and Minella's article presents tables and graphs of average values over 50 repetitions, taking subsamples of different sizes drawn from each sample set. A 95% confidence region is calculated for each sample reduction, and the proportions $P_1$ and $P_2$ are calculated along with their standard deviations.
The full analysis can be reproduced and customized using the routine run_repetitions_and_reduction(basindata, source_key, list_of_reductions, repetitions=50)
. The results are saved in a csv
file and can be stored and loaded later. An example is shown below.
cm.run_repetitions_and_reduction (arvorezinha, 'L',[2,4,8,12,16,20,])
Time for all runs: 7.855192184448242
Saving in C9E9L20Y24_L-2-4-8-12-16-20.csv
| | nSamp | CV | Mean | Std | Total | Feas | MeanP1 | MeanP2 | MeanP3 |
---|---|---|---|---|---|---|---|---|---|
0 | 2 | 13.6022 | 0.3463 | 0.0471 | 162 | 859 | 0.371663 | 0.278888 | 0.349450 |
1 | 4 | 7.5992 | 0.3814 | 0.0290 | 324 | 1527 | 0.308342 | 0.235412 | 0.456241 |
2 | 8 | 4.0347 | 0.3928 | 0.0158 | 648 | 2821 | 0.369675 | 0.266656 | 0.363668 |
3 | 12 | 2.3799 | 0.4001 | 0.0095 | 972 | 4713 | 0.334568 | 0.230881 | 0.434550 |
4 | 16 | 1.2213 | 0.4010 | 0.0049 | 1296 | 6539 | 0.337595 | 0.243510 | 0.418894 |
5 | 20 | 0.0000 | 0.4024 | 0.0000 | 1620 | 8132 | 0.339917 | 0.245394 | 0.414688 |
cm.run_repetitions_and_reduction (arvorezinha, 'Y',[2,4,8,12,16,20,24])
Time for all runs: 8.775497436523438
Saving in C9E9L20Y24_Y-2-4-8-12-16-20-24.csv
| | nSamp | CV | Mean | Std | Total | Feas | MeanP1 | MeanP2 | MeanP3 |
---|---|---|---|---|---|---|---|---|---|
0 | 2 | 15.1352 | 0.3603 | 0.0545 | 3240 | 473 | 0.353225 | 0.244306 | 0.402471 |
1 | 4 | 8.1691 | 0.3817 | 0.0312 | 6480 | 2119 | 0.403431 | 0.203006 | 0.393560 |
2 | 8 | 3.5203 | 0.3949 | 0.0139 | 12960 | 3584 | 0.351959 | 0.223128 | 0.424913 |
3 | 12 | 2.2865 | 0.4029 | 0.0092 | 19440 | 3196 | 0.301662 | 0.236558 | 0.461779 |
4 | 16 | 1.9065 | 0.4004 | 0.0076 | 25920 | 5557 | 0.361002 | 0.251664 | 0.387333 |
5 | 20 | 1.0930 | 0.4022 | 0.0044 | 32400 | 6984 | 0.345001 | 0.251578 | 0.403419 |
6 | 24 | 0.0000 | 0.4024 | 0.0000 | 38880 | 8132 | 0.339917 | 0.245394 | 0.414688 |
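The CV column appears to be the coefficient of variation in percent, 100 × Std / Mean. This reading is inferred from the numbers rather than stated in the text, so treat it as an assumption. The sketch below checks it against the rounded values in the results table for source L; small differences are expected because Mean and Std are shown with only four decimals.

```python
# (nSamp, CV, Mean, Std) copied from the results table for source L above.
rows_L = [
    (2, 13.6022, 0.3463, 0.0471),
    (4, 7.5992, 0.3814, 0.0290),
    (8, 4.0347, 0.3928, 0.0158),
    (12, 2.3799, 0.4001, 0.0095),
    (16, 1.2213, 0.4010, 0.0049),
    (20, 0.0000, 0.4024, 0.0000),
]


def coefficient_of_variation(mean, std):
    """Coefficient of variation in percent."""
    return 100.0 * std / mean


for n_samp, cv_reported, mean, std in rows_L:
    cv = coefficient_of_variation(mean, std)
    print(n_samp, cv_reported, round(cv, 2))
```

The CV falls to zero at the full sample size because every repetition then uses the same complete set of samples.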
from pysasf import plots
files = [arvorezinha.output_folder+'/'+'C9E9L20Y24_Y-2-4-8-12-16-20-24.csv',
arvorezinha.output_folder+'/'+'C9E9L20Y24_L-2-4-8-12-16-20.csv']
plots.plot_cm_outputs(files, 'nSamp', 'CV', savefig=False)