PySASF

A Python package for Source Apportionment with Sediment Fingerprinting.

0.0.5

PyPI

Maintainers: 1

PySASF was developed to provide computational resources for research aimed at identifying the contributions of various sources to fluvial sediments. More specifically, PySASF implements methods for calculating the proportions contributed by each source from a dataset and its random subsamples, as well as analyzing solution variabilities. Additionally, it includes routines for visualizing confidence regions and other plots from the complete dataset and reduced samples.

This initiative originated from a collaboration between the Department of Soil Science and the Department of Mathematics at the Federal University of Santa Maria (UFSM), with participation from other educational and research institutions. The initial motivation was to reproduce the results published in Clarke and Minella (2016) and to create a package of Python routines to facilitate the replication of the experiment with other data sources.

PySASF was first used and tested by the Interdisciplinary Research Group on Erosion and Surface Hydrology (GIPEHS) at UFSM. New analysis models resulting from research and development efforts will be incorporated in the future, based on this academic collaboration.

Install

Download the package from here and install it using the following command line in the directory where the file was downloaded.

$ pip3 install pysasf-0.0.5.tar.gz

You can download a Python script for testing from here. You also need to download the data file from here and store it in a folder named data in the same directory as the script. Then you can run it using the terminal command:

$ python3 cm.py

This script runs a terminal version of the usage example from the quick start notebook.

Alternatively, you can download the full project sources here, unzip them, and go to the notebooks directory. Open quick_star.ipynb using Jupyter Notebook or JupyterLab.

If you receive a No module named 'pysasf' error message, try including the following lines at the beginning of your notebook:

import sys
sys.path.append('/your_path_to/PySASF-main')

Replace your_path_to with the path to the directory where PySASF-main was extracted.
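
The path can also be added only when the package is not already importable. The following is a minimal sketch using the standard library; the helper name ensure_importable is an illustrative assumption and is not part of PySASF.

```python
import importlib.util
import sys


def ensure_importable(module_name, fallback_dir):
    """Return True if module_name can be located, adding fallback_dir to sys.path if needed."""
    if importlib.util.find_spec(module_name) is not None:
        return True  # already installed or already reachable through sys.path
    if fallback_dir not in sys.path:
        sys.path.append(fallback_dir)
    importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None


# '/your_path_to/PySASF-main' is the same placeholder used above; replace it.
found = ensure_importable("pysasf", "/your_path_to/PySASF-main")
print("pysasf importable:", found)
```

If the result is False, the directory given does not contain the pysasf package.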

You will need NumPy, SciPy, Matplotlib and Pandas installed. All dependencies can be satisfied by an Anaconda installation.

Example of usage

1. Loading the data

A good starting point is to import the BasinData object class to store data from a basin's sediment sources. An instance of BasinData should be created, and the data should be loaded from a file. It is common to store data files in the 'data' directory one level above. The import and creation of an instance of BasinData are shown below.

# If you don't have PySASF installed, you need to set the directory:
import sys
sys.path.append('/home/tiagoburiol/PySASF')

from pysasf.basindata import BasinData

arvorezinha = BasinData("../data/arvorezinha_database.xlsx")

Once the file is loaded, some information and statistics can be visualized, as shown in the following examples.

arvorezinha.infos()

Sample Sizes	Fe	Mn	Cu	Zn	Ca	K	P
C	9	9	9	9	9	9	9
E	9	9	9	9	9	9	9
L	20	20	20	20	20	20	20
Y	24	24	24	24	24	24	24

arvorezinha.means()

Means	Fe	Mn	Cu	Zn	Ca	K	P
C	6.21	1470.45	18.23	79.71	165.23	3885.12	0.03
E	6.76	811.95	23.28	86.02	76.10	3182.27	0.01
L	6.63	1854.05	20.05	88.28	159.17	6572.31	0.06
Y	6.16	1119.02	30.92	99.66	276.47	9445.76	0.07

arvorezinha.std()

STD	Fe	Mn	Cu	Zn	Ca	K	P
C	0.48	548.49	2.41	7.84	82.19	1598.45	0.01
E	0.98	399.90	1.98	6.96	26.21	948.95	0.01
L	1.07	399.77	3.86	15.70	79.33	2205.99	0.01
Y	1.01	294.13	10.13	8.40	79.37	2419.21	0.02
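
For readers who want to see what these three tables summarize, here is a minimal sketch that computes the sample size, mean and standard deviation per source and tracer. The numbers are made up for illustration (they are not the Arvorezinha data), the function name summarize is hypothetical, and the use of the sample standard deviation is an assumption, since the package may use a different convention.

```python
from statistics import mean, stdev

# Made-up measurements, NOT the Arvorezinha data: {source: {tracer: values}}.
toy = {
    "C": {"Fe": [6.0, 6.4, 6.2], "Mn": [1400.0, 1500.0, 1600.0]},
    "E": {"Fe": [6.5, 7.1], "Mn": [800.0, 820.0]},
}


def summarize(data):
    """Return {source: {tracer: (size, mean, sample standard deviation)}}."""
    return {
        src: {t: (len(v), mean(v), stdev(v)) for t, v in tracers.items()}
        for src, tracers in data.items()
    }


summary = summarize(toy)
for src, tracers in summary.items():
    for tracer, (n, m, s) in tracers.items():
        print(f"{src} {tracer}: n={n} mean={m:.2f} std={s:.2f}")
```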

2. Using the clarkeminella module

We can easily reproduce the Clarke and Minella (2016) method for measuring the increase in uncertainty when sampling sediment fingerprinting. A full explanation of this method is available in the paper 'Evaluating sampling efficiency when estimating sediment source contributions to suspended sediment in rivers by fingerprinting.' DOI: 10.1002/hyp.10866. The steps required to achieve the same results described in the paper can be executed with a few function calls, as shown below.

First, we need to import the clarkeminella analysis module. We will refer to it as cm.

import pysasf.clarkeminella as cm

Now we will calculate and save in a file all the possible combinations of proportions contributed by the sediment sources. The routine calculate_and_save_all_proportions() will create two files: one for all possible combinations for each sample in the database, saving their indexes, and another file for the corresponding proportions. The default method for calculation is ordinary least squares. Other methods can be chosen using arvorezinha.set_solver_option(option).

Set your output folder using arvorezinha.set_output_folder(path='/yourpath/folder'):

arvorezinha.set_output_folder('../output')

Setting output folder as: ../output
Folder to save output files is: '../output'.

arvorezinha.calculate_and_save_all_proportions(load=False)

Done! Time processing: 1.893726110458374
Total combinations: 38880 , shape of proportions: (38880, 3)
Saving combinations indexes in: ../output/C9E9L20Y24_combstxt
Saving proportions calculated in: ../output/C9E9L20Y24_propstxt
Feasebles boolean array is sabed in: ../output/C9E9L20Y24_feastxt
Time for save files: 0.2960786819458008
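
The total of 38880 combinations is the product of the four sample sizes (9, 9, 20 and 24). The sketch below enumerates such index tuples with itertools.product, assuming, as the output suggests, that each combination takes one sample from every group. It illustrates the counting only and is not the package's own code.

```python
from itertools import product

# Sample sizes per group, as reported by arvorezinha.infos().
sizes = {"C": 9, "E": 9, "L": 20, "Y": 24}

# One index per group; the last group varies fastest, as in a nested loop.
combs = list(product(*(range(n) for n in sizes.values())))

print(len(combs))            # 9 * 9 * 20 * 24 = 38880
print(combs[0], combs[-1])   # (0, 0, 0, 0) (8, 8, 19, 23)
```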

If you want to keep the proportion solutions and the combination indexes in memory, pass load=True (the default option) when calling the routine above. They will then be stored in the BasinData object.

To read the files created and load the proportion solutions and the combination indexes, use the load_combs_and_props_from_files(combs_file, props_file) function. An example is shown below.

combs, Ps = arvorezinha.load_combs_and_props_from_files(arvorezinha.output_folder+'/C9E9L20Y24_combs.txt',
                                                        arvorezinha.output_folder+'/C9E9L20Y24_props.txt')

Loading combs and props files from: ../output

We can verify the loaded array data as follows:

display(combs, Ps)

array([[ 0,  0,  0,  0],
       [ 0,  0,  0,  1],
       [ 0,  0,  0,  2],
       ...,
       [ 8,  8, 19, 21],
       [ 8,  8, 19, 22],
       [ 8,  8, 19, 23]])



array([[ 0.445 , -0.2977,  0.8526],
       [ 0.3761,  0.128 ,  0.4959],
       [ 0.3454,  0.1248,  0.5298],
       ...,
       [ 0.4963, -0.0081,  0.5118],
       [ 0.4212, -0.6676,  1.2464],
       [-0.0679, -0.138 ,  1.206 ]])
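
Each displayed row of Ps sums to one (up to rounding), which suggests that the third proportion is fixed by the first two. The sketch below shows ordinary least squares under that constraint for a single combination, with made-up tracer values. The function name solve_proportions is hypothetical, and the package's solver may scale the tracers or handle the constraint differently.

```python
def solve_proportions(s1, s2, s3, y):
    """Least-squares P1, P2, P3 with P3 = 1 - P1 - P2, so that P1*s1 + P2*s2 + P3*s3 ~ y."""
    a = [u - w for u, w in zip(s1, s3)]   # s1 - s3
    b = [v - w for v, w in zip(s2, s3)]   # s2 - s3
    r = [t - w for t, w in zip(y, s3)]    # y - s3

    def dot(p, q):
        return sum(i * j for i, j in zip(p, q))

    # Normal equations of the two-unknown problem, solved by Cramer's rule.
    aa, ab, bb = dot(a, a), dot(a, b), dot(b, b)
    ar, br = dot(a, r), dot(b, r)
    det = aa * bb - ab * ab
    if det == 0:
        raise ValueError("source vectors are collinear")
    p1 = (ar * bb - ab * br) / det
    p2 = (aa * br - ab * ar) / det
    return p1, p2, 1.0 - p1 - p2


# Made-up tracer vectors for three sources and a sediment sample that is
# an exact 0.5 / 0.3 / 0.2 mixture of them.
s1, s2, s3 = [1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [0.0, 0.0, 4.0]
y = [0.5, 0.3, 2.1]
p1, p2, p3 = solve_proportions(s1, s2, s3, y)
print(round(p1, 4), round(p2, 4), round(p3, 4))
```

With noisy data the unconstrained signs are not guaranteed, which is why some rows above contain negative values or values greater than one.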

Clarke and Minella's criterion for a feasible solution is that the proportions P1 and P2 contributed by the sources are each greater than 0 and less than 1. The feasible solutions can be extracted with the cm_feasebles function of the clarkeminella analysis module, as shown below.

Pfea = cm.cm_feasebles(Ps)
print("The total number of feasible solution is:", len(Pfea))

The total number of feasible solution is: 8132
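
As an illustration of the criterion, the sketch below filters the six rows of Ps that were displayed earlier. It checks every column of a row, which is an assumption: the package may test only P1 and P2. The function name feasible is hypothetical.

```python
# The six rows of Ps displayed above (the elided rows are not included).
rows = [
    (0.445, -0.2977, 0.8526),
    (0.3761, 0.128, 0.4959),
    (0.3454, 0.1248, 0.5298),
    (0.4963, -0.0081, 0.5118),
    (0.4212, -0.6676, 1.2464),
    (-0.0679, -0.138, 1.206),
]


def feasible(rows):
    """Keep rows whose proportions all lie strictly between 0 and 1."""
    return [r for r in rows if all(0.0 < p < 1.0 for p in r)]


kept = feasible(rows)
print(len(kept), "of", len(rows), "displayed rows are feasible")
```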

A confidence region can be calculated in 2 dimensions from the 95% of points closest to the mean of the feasible proportions, measured by Mahalanobis distance. A more detailed explanation can be found in Clarke and Minella's paper.

The stats module implements a function for computing a confidence region, as can be seen in the example below.

from pysasf import stats

Pcr = stats.confidence_region(Pfea[:,0:2], space_dist='mahalanobis')
print("The total number of points in 95% confidence region is:", len(Pcr))

The total number of points in 95% confidence region is: 7725
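
The idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the package's implementation: the function name confidence_region_2d is hypothetical, the covariance is the sample covariance, and the number of points kept is truncated to an integer. That last choice agrees with the count above, since 0.95 × 8132 = 7725.4.

```python
def confidence_region_2d(points, level=0.95):
    """Return the int(level * n) points closest to the mean in Mahalanobis distance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")

    def d2(p):
        dx, dy = p[0] - mx, p[1] - my
        # Squared distance: (dx, dy) times the inverse covariance times (dx, dy).
        return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

    keep = int(level * n)
    return sorted(points, key=d2)[:keep]


# Made-up points: a small cluster plus one obvious outlier.
cluster = [(0.30 + 0.01 * i, 0.20 + 0.01 * (i % 5)) for i in range(19)]
points = cluster + [(0.9, 0.9)]
region = confidence_region_2d(points)
print(len(region), (0.9, 0.9) in region)
```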

Let's draw the confidence region using the draw_hull(pts) function from the plots module.

from pysasf import plots
plots.draw_hull(Pcr, title = 'Confidence region')

Please, set a path to save the convex hull figure.
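
The message above mentions a convex hull, so draw_hull presumably outlines the convex hull of the points it receives. For reference, the sketch below computes a 2D convex hull with Andrew's monotone chain algorithm. It is a standalone illustration of the geometric idea; the package most likely relies on a library routine instead.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Unit square with one interior point, which must not appear in the hull.
hull = convex_hull([(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)])
print(hull)
```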

To randomly take a subset of the solutions, with a sample size of 4 for source Y, for example, we can do as shown below.

from pysasf import stats

combs,Ps = stats.randon_props_subsamples(arvorezinha, 'Y', 4)
print("Subset Ps of size:", Ps.shape[0])

Subset Ps of size: 6480
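
The size 6480 equals 9 × 9 × 20 × 4, which is consistent with keeping 4 randomly chosen Y samples and all samples of the other groups. The sketch below reproduces that count. The function name subsample_combinations and the seed argument are illustrative assumptions, and the package's routine may select the subsample differently.

```python
import random
from itertools import product

# Sample sizes per group, as reported by arvorezinha.infos().
sizes = {"C": 9, "E": 9, "L": 20, "Y": 24}


def subsample_combinations(sizes, key, n, seed=None):
    """Index combinations when only n randomly chosen samples of group `key` are kept."""
    rng = random.Random(seed)
    kept = sorted(rng.sample(range(sizes[key]), n))
    ranges = [kept if k == key else range(size) for k, size in sizes.items()]
    return list(product(*ranges))


sub_combs = subsample_combinations(sizes, "Y", 4, seed=0)
print(len(sub_combs))  # 9 * 9 * 20 * 4 = 6480
```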

To make the plot of the points and the 95% confidence region and save it to a file, we proceed as follows:

P_cr = cm.cm_feasebles(Ps)

plots.draw_hull(P_cr, savefig = True, path=arvorezinha.output_folder,
                title = 'Confidence region 95% with Y size = 4')

Plot figure saved in: ../output/convex_hull.png

A figure will be saved in the output folder. If we want to create several plots with a sequence of reductions in the number of samples for a given source, we can proceed as follows.

for n in [2,4,8,12,16,20,24]:
    combs,Ps = stats.randon_props_subsamples(arvorezinha, 'Y', n)
    P_feas = cm.cm_feasebles(Ps)
    P_cr = stats.confidence_region(P_feas,space_dist='mahalanobis2d')
    name = 'confidence_region_Y'+str(n)
    ax = plots.draw_hull(P_cr, savefig = True, 
                         path = arvorezinha.output_folder,filename = name)
    print('Saving figure named:', name)

Plot figure saved in: ../output/confidence_region_Y2.png
Saving figure named: confidence_region_Y2
Plot figure saved in: ../output/confidence_region_Y4.png
Saving figure named: confidence_region_Y4
Plot figure saved in: ../output/confidence_region_Y8.png
Saving figure named: confidence_region_Y8
Plot figure saved in: ../output/confidence_region_Y12.png
Saving figure named: confidence_region_Y12
Plot figure saved in: ../output/confidence_region_Y16.png
Saving figure named: confidence_region_Y16
Plot figure saved in: ../output/confidence_region_Y20.png
Saving figure named: confidence_region_Y20
Plot figure saved in: ../output/confidence_region_Y24.png
Saving figure named: confidence_region_Y24

3. Processing data from reductions and repetitions

Clarke and Minella's article presents tables and graphs of average values over 50 repetitions, taking subsamples of different sizes drawn from each sample set. A 95% confidence region is calculated for each sample reduction, and the proportions $P_1$ and $P_2$ are calculated along with their standard deviations.

The full analysis can be reproduced and customized using the routine run_repetitions_and_reduction(basindata, source_key, list_of_reductions, repetitions=50). The results are saved in a CSV file, which can be stored and loaded later. An example is shown below.

cm.run_repetitions_and_reduction(arvorezinha, 'L', [2,4,8,12,16,20])

Time for all runs: 7.855192184448242
Saving in C9E9L20Y24_L-2-4-8-12-16-20.csv

	nSamp	CV	Mean	Std	Total	Feas	MeanP1	MeanP2	MeanP3
0	2	13.6022	0.3463	0.0471	162	859	0.371663	0.278888	0.349450
1	4	7.5992	0.3814	0.0290	324	1527	0.308342	0.235412	0.456241
2	8	4.0347	0.3928	0.0158	648	2821	0.369675	0.266656	0.363668
3	12	2.3799	0.4001	0.0095	972	4713	0.334568	0.230881	0.434550
4	16	1.2213	0.4010	0.0049	1296	6539	0.337595	0.243510	0.418894
5	20	0.0000	0.4024	0.0000	1620	8132	0.339917	0.245394	0.414688

cm.run_repetitions_and_reduction(arvorezinha, 'Y', [2,4,8,12,16,20,24])

Time for all runs: 8.775497436523438
Saving in C9E9L20Y24_Y-2-4-8-12-16-20-24.csv

	nSamp	CV	Mean	Std	Total	Feas	MeanP1	MeanP2	MeanP3
0	2	15.1352	0.3603	0.0545	3240	473	0.353225	0.244306	0.402471
1	4	8.1691	0.3817	0.0312	6480	2119	0.403431	0.203006	0.393560
2	8	3.5203	0.3949	0.0139	12960	3584	0.351959	0.223128	0.424913
3	12	2.2865	0.4029	0.0092	19440	3196	0.301662	0.236558	0.461779
4	16	1.9065	0.4004	0.0076	25920	5557	0.361002	0.251664	0.387333
5	20	1.0930	0.4022	0.0044	32400	6984	0.345001	0.251578	0.403419
6	24	0.0000	0.4024	0.0000	38880	8132	0.339917	0.245394	0.414688
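
The CV column appears to be the coefficient of variation in percent, that is, 100 × Std / Mean: for the first row of the L table, 100 × 0.0471 / 0.3463 is about 13.60, close to the reported 13.6022 (the small gap comes from the rounded Mean and Std). The sketch below shows that calculation over a list of repetition results. The values are made up and the function name is hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: 100 * sample standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)


# Check against the first row of the L table, using its rounded Mean and Std.
cv_row = 100.0 * 0.0471 / 0.3463
print(round(cv_row, 2))

# Made-up values standing in for one quantity measured over several repetitions.
print(round(coefficient_of_variation([0.34, 0.36, 0.35, 0.33]), 2))
```

When every repetition gives the same value, as in the last row of each table where the full sample is used, the standard deviation and therefore the CV are zero.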

from pysasf import plots
files = [arvorezinha.output_folder+'/'+'C9E9L20Y24_Y-2-4-8-12-16-20-24.csv',
         arvorezinha.output_folder+'/'+'C9E9L20Y24_L-2-4-8-12-16-20.csv']

plots.plot_cm_outputs(files, 'nSamp', 'CV', savefig=False)
