PySASF

A Python package for Source Apportionment with Sediment Fingerprinting.

Version 0.0.5, published on PyPI.



PySASF was developed to provide computational resources for research aimed at identifying the contributions of various sources to fluvial sediments. More specifically, PySASF implements methods for calculating the proportions contributed by each source from a dataset and from its random subsamples, and for analyzing the variability of the solutions. It also includes routines for plotting confidence regions and other figures from the complete dataset and from reduced samples.

This initiative originated from a collaboration between the Department of Soil Science and the Department of Mathematics at the Federal University of Santa Maria (UFSM), with participation from other educational and research institutions. The initial motivation was to reproduce the results published in Clarke and Minella (2016) and to create a package of Python routines to facilitate the replication of the experiment with other data sources.

PySASF was first used and tested by the Interdisciplinary Research Group on Erosion and Surface Hydrology (GIPEHS) at UFSM. New analysis models resulting from the research and development efforts of this academic collaboration will be incorporated in the future.

Install

Download the package from here and install it by running the following command in the directory where the file was downloaded:

$ pip3 install pysasf-0.0.5.tar.gz

You can download a Python script for testing from here. You also need to download the data () file from here and store it in a folder named data in the same directory as the script. Then run the script from the terminal:

$ python3 cm.py

This script runs a command-line version of the quick start notebook example.

Alternatively, you can download the full project sources here, unzip them, and go to the notebooks directory. Open quick_star.ipynb using Jupyter Notebook or JupyterLab.

If you get a "No module named 'pysasf'" error message, try adding the following lines at the beginning of your notebook:

import sys
sys.path.append('/your_path_to/PySASF-main')

Replace your_path_to with the path to the directory where PySASF-main was extracted.

You will need NumPy, SciPy, Matplotlib, and pandas installed. All of these dependencies are provided by an Anaconda installation.
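The four packages can be checked before running the examples. The sketch below is a generic helper (missing_dependencies is a hypothetical name, not part of PySASF) built on importlib.util.find_spec from the standard library; it only checks that each module can be found, not which version is installed.

```python
import importlib.util

# Import names of the dependencies listed above
REQUIRED = ["numpy", "scipy", "matplotlib", "pandas"]


def missing_dependencies(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```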

Example of usage

1. Loading the data

A good starting point is to import the BasinData class, which stores the data from a basin's sediment sources. Create an instance of BasinData and load the data from a file. It is common to store data files in a 'data' directory one level above the working directory. The import and the creation of a BasinData instance are shown below.

# If PySASF is not installed, add its directory to the Python path:
import sys
sys.path.append('/your_path_to/PySASF')
from pysasf.basindata import BasinData
arvorezinha = BasinData("../data/arvorezinha_database.xlsx")

Once the file is loaded, some information and statistics can be visualized, as shown in the following examples.

arvorezinha.infos()
Sample Sizes  Fe  Mn  Cu  Zn  Ca   K   P
C              9   9   9   9   9   9   9
E              9   9   9   9   9   9   9
L             20  20  20  20  20  20  20
Y             24  24  24  24  24  24  24
arvorezinha.means()
Means     Fe       Mn     Cu     Zn      Ca        K     P
C       6.21  1470.45  18.23  79.71  165.23  3885.12  0.03
E       6.76   811.95  23.28  86.02   76.10  3182.27  0.01
L       6.63  1854.05  20.05  88.28  159.17  6572.31  0.06
Y       6.16  1119.02  30.92  99.66  276.47  9445.76  0.07
arvorezinha.std()
STD       Fe      Mn     Cu     Zn     Ca        K     P
C       0.48  548.49   2.41   7.84  82.19  1598.45  0.01
E       0.98  399.90   1.98   6.96  26.21   948.95  0.01
L       1.07  399.77   3.86  15.70  79.33  2205.99  0.01
Y       1.01  294.13  10.13   8.40  79.37  2419.21  0.02
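The three tables above are per-set sample counts, means, and standard deviations of each tracer. As an illustration only, a comparable summary can be computed with a pandas groupby, assuming a long table with one row per sample and a Source column. The values below are toy numbers, not the Arvorezinha data, and PySASF's own data layout and standard-deviation convention (ddof) may differ.

```python
import pandas as pd

# Toy long-format table (illustrative values only, NOT the Arvorezinha data):
# one row per sample, a "Source" column, and one column per tracer.
samples = pd.DataFrame({
    "Source": ["C", "C", "E", "E", "E"],
    "Fe": [6.0, 6.4, 6.5, 7.0, 6.8],
    "Ca": [150.0, 180.0, 70.0, 80.0, 78.0],
})

grouped = samples.groupby("Source")
sizes = grouped.count()   # number of samples per set and tracer
means = grouped.mean()    # per-set mean of each tracer
stds = grouped.std()      # per-set sample standard deviation (pandas default ddof=1)
print(means.round(2))
```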

2. Using the clarkeminella module

We can easily reproduce the Clarke and Minella (2016) method for measuring how uncertainty increases as fewer samples are used in sediment fingerprinting. A full explanation of this method is available in the paper 'Evaluating sampling efficiency when estimating sediment source contributions to suspended sediment in rivers by fingerprinting' (DOI: 10.1002/hyp.10866). The steps required to reproduce the results described in the paper can be executed with a few function calls, as shown below.

First, we need to import the clarkeminella analysis module. We will refer to it as cm.

import pysasf.clarkeminella as cm

Next, we calculate all possible combinations of samples (one sample from each set in the database) and the proportions contributed by the sediment sources for each combination, and save them to files. The routine calculate_and_save_all_proportions() creates one file with the indexes of every combination and another with the corresponding proportions; as the log below shows, it also saves a boolean array marking the feasible solutions. The default calculation method is ordinary least squares. Other methods can be chosen with arvorezinha.set_solver_option(option).

To set your output folder, use arvorezinha.set_output_folder(path='/yourpath/folder'):

arvorezinha.set_output_folder('../output')
Setting output folder as: ../output
Folder to save output files is: '../output'.
arvorezinha.calculate_and_save_all_proportions(load=False)
Done! Time processing: 1.893726110458374
Total combinations: 38880 , shape of proportions: (38880, 3)
Saving combinations indexes in: ../output/C9E9L20Y24_combstxt
Saving proportions calculated in: ../output/C9E9L20Y24_propstxt
Feasebles boolean array is sabed in: ../output/C9E9L20Y24_feastxt
Time for save files: 0.2960786819458008
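The total of 38880 combinations is 9 × 9 × 20 × 24, one sample from each of the C, E, L, and Y sets. The sketch below shows one way to enumerate the combinations and solve each one by least squares with the proportions constrained to sum to 1 (the proportions displayed further down sum to 1 up to rounding). It assumes that C, E, and L are the sources and Y holds the sediment samples, and it omits any tracer scaling or weighting PySASF may apply; proportions_sum_to_one and all_combinations are hypothetical names, not the PySASF API.

```python
import itertools
import numpy as np


def proportions_sum_to_one(sources, y):
    """Least-squares proportions p (one per source) subject to sum(p) == 1.

    sources: (n_tracers, n_sources) array, one column per source sample.
    y: (n_tracers,) array, the sediment sample.
    The constraint is eliminated by writing p_last = 1 - sum(other p).
    """
    last = sources[:, -1]
    A = sources[:, :-1] - last[:, None]
    b = y - last
    p_head, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(p_head, 1.0 - p_head.sum())


def all_combinations(C, E, L, Y):
    """Enumerate one sample from each set and solve for the proportions."""
    combs = np.array(list(itertools.product(range(len(C)), range(len(E)),
                                            range(len(L)), range(len(Y)))))
    props = np.array([
        proportions_sum_to_one(np.column_stack([C[i], E[j], L[k]]), Y[m])
        for i, j, k, m in combs
    ])
    return combs, props


rng = np.random.default_rng(0)
# Toy data: 5 tracers, sample sets of size 2, 2, 3 and 4 (illustrative only)
C, E, L, Y = (rng.random((n, 5)) for n in (2, 2, 3, 4))
combs, Ps = all_combinations(C, E, L, Y)
print(combs.shape, Ps.shape)  # (48, 4) (48, 3)
```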

If you want to keep the proportion solutions and the combination indexes in memory, pass load=True (the default option) when calling the routine above. The proportion solutions and the combination indexes will then be stored in the BasinData object.

To read the saved files and load the proportion solutions and combination indexes, we can use the load_combs_and_props_from_files(combs_file, props_file) function. An example is shown below.

combs, Ps = arvorezinha.load_combs_and_props_from_files(arvorezinha.output_folder+'/C9E9L20Y24_combs.txt',
                                                        arvorezinha.output_folder+'/C9E9L20Y24_props.txt')
Loading combs and props files from: ../output

We can verify the loaded array data as follows:

display(combs, Ps)
array([[ 0,  0,  0,  0],
       [ 0,  0,  0,  1],
       [ 0,  0,  0,  2],
       ...,
       [ 8,  8, 19, 21],
       [ 8,  8, 19, 22],
       [ 8,  8, 19, 23]])



array([[ 0.445 , -0.2977,  0.8526],
       [ 0.3761,  0.128 ,  0.4959],
       [ 0.3454,  0.1248,  0.5298],
       ...,
       [ 0.4963, -0.0081,  0.5118],
       [ 0.4212, -0.6676,  1.2464],
       [-0.0679, -0.138 ,  1.206 ]])

Clarke and Minella's criterion for a feasible solution is that the proportion contributed by each source (P1, P2, ...) is greater than 0 and less than 1. We can extract the feasible solutions using the cm_feasebles function of the clarkeminella analysis module, as shown below.

Pfea = cm.cm_feasebles(Ps)
print("The total number of feasible solution is:", len(Pfea))
The total number of feasible solution is: 8132
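The feasibility filter itself amounts to a boolean mask over the rows of Ps. This minimal sketch is a stand-in, not the cm_feasebles source: it keeps the rows whose proportions all lie strictly between 0 and 1 (whether PySASF tests every column is an assumption). The demo rows are the six rows of Ps displayed above.

```python
import numpy as np


def feasible(Ps):
    """Keep rows whose proportions all lie strictly between 0 and 1."""
    Ps = np.asarray(Ps, dtype=float)
    mask = np.all((Ps > 0) & (Ps < 1), axis=1)
    return Ps[mask]


# The six rows of Ps displayed above
shown = np.array([[0.445, -0.2977, 0.8526],
                  [0.3761, 0.128, 0.4959],
                  [0.3454, 0.1248, 0.5298],
                  [0.4963, -0.0081, 0.5118],
                  [0.4212, -0.6676, 1.2464],
                  [-0.0679, -0.138, 1.206]])
Pfea_demo = feasible(shown)
print(len(Pfea_demo))  # 2
```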

A 95% confidence region can be calculated in two dimensions by keeping the 95% of feasible points closest to the mean of the feasible proportions, measured by Mahalanobis distance. A more detailed explanation can be found in Clarke and Minella's paper.

The stats module implements a function to compute a confidence region, as shown in the example below.

from pysasf import stats
Pcr = stats.confidence_region(Pfea[:,0:2], space_dist='mahalanobis')
print("The total number of points in 95% confidence region is:", len(Pcr))
The total number of points in 95% confidence region is: 7725
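The selection can be sketched with NumPy: compute each point's squared Mahalanobis distance to the mean using the inverse sample covariance, sort the points, and keep the closest 95%. This is an assumption about how the selection works, not the stats.confidence_region implementation; note, though, that floor(0.95 × 8132) = 7725, which is consistent with the count printed above. mahalanobis_region is a hypothetical name.

```python
import numpy as np


def mahalanobis_region(P, fraction=0.95):
    """Return the `fraction` of points in P closest to their mean (Mahalanobis distance)."""
    P = np.asarray(P, dtype=float)
    diff = P - P.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(P, rowvar=False))
    # Squared Mahalanobis distance of every row: diff_i^T S^-1 diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    n_keep = int(np.floor(fraction * len(P)))
    keep = np.argsort(d2)[:n_keep]
    return P[keep]
```

With PySASF arrays, the call would be mahalanobis_region(Pfea[:, 0:2]).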

Let's draw the confidence region using the draw_hull(pts) function from the plots module.

from pysasf import plots
plots.draw_hull(Pcr, title = 'Confidence region')
Please, set a path to save the convex hull figure.



[Figure: Confidence region]
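The printed message suggests that draw_hull plots the convex hull of the points. Assuming that, the sketch below outlines the region with scipy.spatial.ConvexHull and Matplotlib; hull_polygon and draw_hull_sketch are hypothetical helpers, not the plots module API.

```python
import numpy as np
from scipy.spatial import ConvexHull
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def hull_polygon(points):
    """Return the convex-hull vertices of 2-D points (counterclockwise for 2-D hulls)."""
    points = np.asarray(points, dtype=float)
    hull = ConvexHull(points)
    return points[hull.vertices]


def draw_hull_sketch(points, title="Confidence region", filename=None):
    """Scatter the points, outline their convex hull, optionally save the figure."""
    points = np.asarray(points, dtype=float)
    poly = hull_polygon(points)
    fig, ax = plt.subplots()
    ax.scatter(points[:, 0], points[:, 1], s=2)
    closed = np.vstack([poly, poly[:1]])  # repeat the first vertex to close the outline
    ax.plot(closed[:, 0], closed[:, 1], "r-")
    ax.set_xlabel("P1")
    ax.set_ylabel("P2")
    ax.set_title(title)
    if filename:
        fig.savefig(filename)
    return ax
```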

To randomly draw a subsample of size 4 from set Y, for example, and obtain the corresponding subset of solutions, we can proceed as shown below.

from pysasf import stats
combs,Ps = stats.randon_props_subsamples(arvorezinha, 'Y', 4)
print("Subset Ps of size:", Ps.shape[0])
Subset Ps of size: 6480
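The subset size of 6480 is 9 × 9 × 20 × 4: keeping 4 of the 24 Y samples keeps every combination that uses one of them. The sketch below draws the samples without replacement and selects the matching rows with np.isin; subsample_props is a hypothetical name, not the randon_props_subsamples source, and the proportions are placeholders.

```python
import itertools
import numpy as np


def subsample_props(combs, Ps, column, n_total, n, rng=None):
    """Draw n of the n_total samples of one set and keep the rows that use them."""
    rng = np.random.default_rng() if rng is None else rng
    chosen = rng.choice(n_total, size=n, replace=False)  # n distinct sample indexes
    mask = np.isin(combs[:, column], chosen)             # rows using a drawn sample
    return combs[mask], Ps[mask]


# Full index table for the Arvorezinha set sizes (C=9, E=9, L=20, Y=24)
full_combs = np.array(list(itertools.product(range(9), range(9), range(20), range(24))))
full_Ps = np.zeros((len(full_combs), 3))  # placeholder proportions for the demo
sub_combs, sub_Ps = subsample_props(full_combs, full_Ps, column=3, n_total=24, n=4,
                                    rng=np.random.default_rng(0))
print("Subset size:", len(sub_Ps))  # 6480 = 9 * 9 * 20 * 4
```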

To make the plot of the points and the 95% confidence region and save it to a file, we proceed as follows:

P_feas = cm.cm_feasebles(Ps)
P_cr = stats.confidence_region(P_feas[:,0:2], space_dist='mahalanobis')
plots.draw_hull(P_cr, savefig = True, path=arvorezinha.output_folder,
                title = 'Confidence region 95% with Y size = 4')
Plot figure saved in: ../output/convex_hull.png

A figure will be saved in the output folder. If we want to create several plots with a sequence of reductions in the number of samples for a given source, we can proceed as follows.

for n in [2,4,8,12,16,20,24]:
    combs,Ps = stats.randon_props_subsamples(arvorezinha, 'Y', n)
    P_feas = cm.cm_feasebles(Ps)
    P_cr = stats.confidence_region(P_feas,space_dist='mahalanobis2d')
    name = 'confidence_region_Y'+str(n)
    ax = plots.draw_hull(P_cr, savefig = True, 
                         path = arvorezinha.output_folder,filename = name)
    print('Saving figure named:', name)
    
Plot figure saved in: ../output/confidence_region_Y2.png
Saving figure named: confidence_region_Y2
Plot figure saved in: ../output/confidence_region_Y4.png
Saving figure named: confidence_region_Y4
Plot figure saved in: ../output/confidence_region_Y8.png
Saving figure named: confidence_region_Y8
Plot figure saved in: ../output/confidence_region_Y12.png
Saving figure named: confidence_region_Y12
Plot figure saved in: ../output/confidence_region_Y16.png
Saving figure named: confidence_region_Y16
Plot figure saved in: ../output/confidence_region_Y20.png
Saving figure named: confidence_region_Y20
Plot figure saved in: ../output/confidence_region_Y24.png
Saving figure named: confidence_region_Y24

3. Processing data from reductions and repetitions

Clarke and Minella's article presents tables and graphs of average values over 50 repetitions, taking subsamples of different sizes drawn from each sample set. For each sample-size reduction, the 95% confidence region is calculated, together with the proportions $P_1$ and $P_2$ and their standard deviations.

The full analysis can be reproduced and customized using the routine run_repetitions_and_reduction(basindata, source_key, list_of_reductions, repetitions=50). The results are saved in a CSV file and can be loaded later. An example is shown below.

cm.run_repetitions_and_reduction (arvorezinha, 'L',[2,4,8,12,16,20,])
Time for all runs: 7.855192184448242
Saving in C9E9L20Y24_L-2-4-8-12-16-20.csv
   nSamp       CV    Mean     Std  Total  Feas    MeanP1    MeanP2    MeanP3
0      2  13.6022  0.3463  0.0471    162   859  0.371663  0.278888  0.349450
1      4   7.5992  0.3814  0.0290    324  1527  0.308342  0.235412  0.456241
2      8   4.0347  0.3928  0.0158    648  2821  0.369675  0.266656  0.363668
3     12   2.3799  0.4001  0.0095    972  4713  0.334568  0.230881  0.434550
4     16   1.2213  0.4010  0.0049   1296  6539  0.337595  0.243510  0.418894
5     20   0.0000  0.4024  0.0000   1620  8132  0.339917  0.245394  0.414688
cm.run_repetitions_and_reduction (arvorezinha, 'Y',[2,4,8,12,16,20,24])
Time for all runs: 8.775497436523438
Saving in C9E9L20Y24_Y-2-4-8-12-16-20-24.csv
   nSamp       CV    Mean     Std  Total  Feas    MeanP1    MeanP2    MeanP3
0      2  15.1352  0.3603  0.0545   3240   473  0.353225  0.244306  0.402471
1      4   8.1691  0.3817  0.0312   6480  2119  0.403431  0.203006  0.393560
2      8   3.5203  0.3949  0.0139  12960  3584  0.351959  0.223128  0.424913
3     12   2.2865  0.4029  0.0092  19440  3196  0.301662  0.236558  0.461779
4     16   1.9065  0.4004  0.0076  25920  5557  0.361002  0.251664  0.387333
5     20   1.0930  0.4022  0.0044  32400  6984  0.345001  0.251578  0.403419
6     24   0.0000  0.4024  0.0000  38880  8132  0.339917  0.245394  0.414688
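A simplified sketch of the repetition loop follows. For each repetition it records the number of combinations, the number of feasible solutions, and their mean proportions, then averages these per subsample size. It does not reproduce the CV, Mean, and Std columns (their definitions are in Clarke and Minella's paper), and repetitions_and_reductions is a hypothetical name, not the PySASF routine; the demo uses random proportions.

```python
import itertools
import numpy as np
import pandas as pd


def repetitions_and_reductions(combs, Ps, column, n_total, reductions,
                               repetitions=50, seed=0):
    """Average simple statistics over repeated random subsamples of one set."""
    rng = np.random.default_rng(seed)
    rows = []
    for n in reductions:
        for _ in range(repetitions):
            chosen = rng.choice(n_total, size=n, replace=False)
            P = Ps[np.isin(combs[:, column], chosen)]
            P_feas = P[np.all((P > 0) & (P < 1), axis=1)]
            # Mean proportions are NaN when a subsample has no feasible solution
            means = P_feas.mean(axis=0) if len(P_feas) else np.full(Ps.shape[1], np.nan)
            rows.append({"nSamp": n, "Total": len(P), "Feas": len(P_feas),
                         **{f"MeanP{i + 1}": m for i, m in enumerate(means)}})
    return pd.DataFrame(rows).groupby("nSamp").mean().reset_index()


# Demo with small toy set sizes and random proportions that sum to 1
demo_combs = np.array(list(itertools.product(range(3), range(3), range(4), range(6))))
demo_Ps = np.random.default_rng(2).dirichlet(np.ones(3), size=len(demo_combs))
summary = repetitions_and_reductions(demo_combs, demo_Ps, column=3, n_total=6,
                                     reductions=[2, 6], repetitions=5)
print(summary)
```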
from pysasf import plots
files = [arvorezinha.output_folder+'/'+'C9E9L20Y24_Y-2-4-8-12-16-20-24.csv',
         arvorezinha.output_folder+'/'+'C9E9L20Y24_L-2-4-8-12-16-20.csv']

plots.plot_cm_outputs(files, 'nSamp', 'CV', savefig=False)

[Figure: CV versus nSamp for the Y and L runs]
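The same CV curves can be drawn directly with pandas and Matplotlib. In the sketch below, the nSamp and CV values are copied from the two tables above into in-memory CSV text; with the files saved by PySASF you would pass their paths to pd.read_csv instead (adding index_col=0 if the file starts with an index column, which is an assumption about its layout). plot_outputs is a hypothetical helper, not plots.plot_cm_outputs.

```python
import io
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# nSamp and CV values copied from the Y and L runs printed above
y_csv = io.StringIO("nSamp,CV\n2,15.1352\n4,8.1691\n8,3.5203\n12,2.2865\n"
                    "16,1.9065\n20,1.0930\n24,0.0000\n")
l_csv = io.StringIO("nSamp,CV\n2,13.6022\n4,7.5992\n8,4.0347\n12,2.3799\n"
                    "16,1.2213\n20,0.0000\n")


def plot_outputs(tables, x="nSamp", y="CV"):
    """Plot column y against column x, one line per labeled table."""
    fig, ax = plt.subplots()
    for label, df in tables.items():
        ax.plot(df[x], df[y], marker="o", label=label)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend()
    return fig, ax


tables = {"Y": pd.read_csv(y_csv), "L": pd.read_csv(l_csv)}
fig, ax = plot_outputs(tables)
```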
