New Research: Supply Chain Attack on Axios Pulls Malicious Dependency from npm.Details →
Socket
Book a DemoSign in
Socket

stemdiff

Package Overview
Dependencies
Maintainers
1
Versions
23
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

stemdiff - pypi Package Compare versions

Comparing version
5.2.6
to
5.2.7
+73
src/STEMDIFF/__init__.py
'''
Package: STEMDIFF
-----------------
Conversion of a 4D-STEM dataset to a 2D-powder diffration pattern.
* Input = 4D-STEM dataset = 2D-array of 2D-NBD patterns.
- 2D-array = usually one (or more) two-dimensional scanning array(s).
- 2D-NBD = monocrystalline-like nanobeam diffraction pattern.
* Output = 2D-powder diffraction pattern
- The final 2D-diffractogram is a specific summation of 2D-NBD patterns.
- In other words, a 4D-STEM dataset is reduced to a 2D-diffraction pattern.
- The whole method (and final pattern) is called 4D-STEM/PNBD (powder NBD).
STEMDIFF modules:
* stemdiff.dbase = calculate database of 4D-stem datafiles
* stemdiff.detectors = objects describing detectors and format of the datafiles
* stemdiff.gvars = objects describing *source data* and *diffraction images*
* stemdiff.io = input/output for datafiles and corresponding arrays + images
* stemdiff.sum = summation of datafiles (standard, serial processing)
* stemdiff.summ = summation of datafiles (parallel, multicore processing)
STEMDIFF auxiliary package IDIFF and its modules:
* IDIFF contains functions for the improvement of diffraction patterns
* IDIFF is imported below so that it could be used as: sd.idiff.some_module
* IDIFF modules are:
- idiff.bcorr = background correction/subtraction
- idiff.deconv = advanced deconvolution methods
- idiff.ncore = noise correction/reduction
- idiff.psf = estimate of PSF function
'''
__version__ = "5.2.7"
# Import modules of stemdiff package
# this enables us to use modules as follows:
# >>> import stemdiff as sd
# >>> sd.io.Datafiles.read(SDATA, '000_001.dat')
import stemdiff.dbase
import stemdiff.detectors
import stemdiff.gvars
import stemdiff.io
import stemdiff.sum
import stemdiff.summ
# Import supplementary package idiff
# this enables us to use the modules as follows:
# >>> import stemdiff as sd
# >>> sd.idiff.bcorr.rolling_ball(arr)
import idiff
# Obligatory acknowledgement -- the development was co-funded by TACR.
# TACR requires that the acknowledgement is printed when we run the program.
# Nevertheless, Python packages run within other programs, not directly.
# The following code ensures that the acknowledgement is printed when:
# (1) You run this file: __init__.py
# (2) You run the package from command line: python -m stemdiff
# Technical notes:
# To get item (2) above, we define __main__.py (next to __init__.py).
# The usage of __main__.py is not very common, but still quite standard.
def acknowledgement():
print('STEMDIFF package - convert 4D-STEM data to 2D powder diffractogram.')
print('------')
print('The development of the package was co-funded by')
print('the Technology agency of the Czech Republic,')
print('program NCK, project TN02000020.')
if __name__ == '__main__':
acknowledgement()
# __main__.py = top-level code environment
# Usage: (1) __main__ in programs (2) __main__.py in packages
# More information: https://docs.python.org/3/library/__main__.html
# Here: To print the the acknowledgement when we run: python -m stemdiff
# (the acknowledgement function is defined in the sister __init__.py file
from stemdiff import acknowledgement
if __name__ == '__main__': acknowledgement()
'''
Module: stemdiff.dbase
----------------------
Functions to calculate, save and re-read the database of 4D-STEM datafiles.
* The database is saved as a pandas.DataFrame object,
which contains the following data for each datafile:
filename, XY-center coordinates, MaximumIntensity, NoOfPeaks, ShannonEntropy.
* The database enables fast filtering of datafiles
and fast access to datafile features,
which do not have to be re-calculated repeatedly.
'''
import numpy as np
import pandas as pd
import stemdiff.io
from skimage import measure
def calc_database(SDATA, DIFFIMAGES):
"""
Read 4D-STEM datafiles and calculate database of all files, which contains
[filename, XY-center coordinates, MaxIntensity, NoOfPeaks, ShannonEntropy]
for each datafile.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
DIFFIMAGES : stemdiff.gvars.DiffImages object
The object describing the diffraction images/patterns.
Returns
-------
df : pandas DataFrame object
Database contains [filename, xc, yc, MaxInt, NumPeaks, S]
for each datafile in the dataset.
"""
# Prepare variables before we run for-cycle to go through datafiles
# (initialize coordinates of the center + rescale/upscale coefficient
xc,yc = (None,None)
R = SDATA.detector.upscale
# Initialize list of datafiles
# (appending to lists is more efficient than appending to np/pd-structures
list_of_datafiles = []
# Pre-calculate additional variables
# (it would be wasteful to calculate them in each for-cycle loop
neighborhood_matrix = np.ones((DIFFIMAGES.peak_dist,DIFFIMAGES.peak_dist))
# Go through datafiles and calculate their entropy
for datafile in SDATA.filenames:
# a) Read datafile
arr = stemdiff.io.Datafiles.read(SDATA,datafile)
# b) Get datafile name
datafile_name = datafile.relative_to(SDATA.data_dir)
# c) Calculate intensity center
# (to get high precision
# (the center must be determined for rescaled/upscaled array
if DIFFIMAGES.ctype == 2:
xc,yc = stemdiff.io.Arrays.find_center(
stemdiff.io.Arrays.rescale(arr, R, order=3), # rescaled array
DIFFIMAGES.csquare*R, DIFFIMAGES.cintensity) # central region
elif (DIFFIMAGES.ctype == 1) and (xc == None):
xc,yc = stemdiff.io.Array.find_center(
stemdiff.io.Arrays.rescale(arr, R, order=3), # rescaled array
DIFFIMAGES.csquare*R, DIFFIMAGES.cintensity) # central region
elif (DIFFIMAGES.ctype == 0) and (xc == None):
geometric_center = round(SDATA.detector.detector_size*R/2)
xc,yc = (geometric_center,geometric_center)
# d) Determine maximum intensity
# NOTE: max_intensity must be saved as float
# Reason: when we later convert to pandas.Dataframe
# ...the datatypes are inferred from the first value in each column
# ...which may not be correct if the first integer is a small number
# ...as the estimated datatype may be int16
# ...and this may not work for larger values in the next datafiles
max_intensity = float(np.max(arr))
# e) Estimate number of peaks (local maxima)
no_of_maxima = stemdiff.io.Arrays.number_of_peaks(arr,
peak_height = DIFFIMAGES.peak_height,
neighborhood_matrix=neighborhood_matrix)
# f) Calculate Shannon entropy of the datafile
entropy = measure.shannon_entropy(arr)
entropy = round(entropy,2)
# d) Append all calculated values to list
list_of_datafiles.append(
[datafile_name, xc, yc, max_intensity, no_of_maxima, entropy])
# Convert list to pandas DataFrame
df = pd.DataFrame(
list_of_datafiles,
columns=['DatafileName','Xcenter','Ycenter', 'MaxInt', 'Peaks','S'])
# Return the dataframe containing names of datafiles + their entropies
return(df)
def save_database(df, output_file):
"""
Save the database of 4D-STEM datafiles;
the dbase is a pandas.DataFrame object saved as pickled object/zip-file.
Parameters
----------
df : pandas.DataFrame object
This object is a database of all 4D-STEM datafiles,
which contains basic characteristics (filename, XY-center, entropy ...)
of each datafile in 4D-STEM dataset.
output_file : str
Filename of the output file (without extension).
The database is saved as a pickled object/zip-file.
Returns
-------
None
The output is the database *df* saved as *output_file* on a disk.
"""
df.to_pickle(output_file)
def read_database(input_database):
"""
Read the calculated database of 4D-STEM datafiles.
Parameters
----------
input_database : str or pandas.DataFrame
* str = filename of the input file that contains the database.
* pandas.Dataframe = dataframe that contains the database.
* Why two possile types of input?
If the database is in memory in the form of pandas.DataFrame
(which is quite common when using STEMDIFF package),
it is useles to re-read it from file.
Returns
-------
df : pandas.DataFrame object
Database that has been read from disk or pandas.Dataframe.
"""
# Read database from [input_file]
# NOTE: input_database is either saved/pickled file or pd.DataFrame
# Reason: sometimes it is more efficient to use pd.DataFrame directly
if type(input_database) == pd.DataFrame:
df = input_database
else:
df = pd.read_pickle(input_database)
return(df)
'''
Module: stemdiff.detectors
--------------------------
Description of detectors that can be used in stemdiff package.
This module is basically a container of classes.
Each class describes a 2D STEM detector, from which we can read datafiles.
The description of the detector not difficult.
All we need to define is the detector name,
a few technical parameters described below,
and how to read/save datafiles in given detector format.
All detector parameters are described below in TimePix detector class.
Therefore, the new classes = new detectors can be added quite easily:
* Copy the class describing TimePix detector.
* Rename the class as needed, for example: My_new_STEM_detector.
* Re-define all properties and methods of the new class as necessary.
* When you are done, the new detector can be used within STEMDIFF package.
'''
import sys
import inspect
import numpy as np
from PIL import Image
def list_of_known_detectors():
'''
Get a list of known detectors = classes defined in stemdiff.detectors.
Returns
-------
detectors : list
List of known detectors = classes defined in stemdiff.detectors module.
'''
# Prepare list of known detectors
detectors = []
# Get names of all classes in current module
# Based on stackoveflow: https://stackoverflow.com/q/1796180
for name, obj in inspect.getmembers(sys.modules[__name__]):
if inspect.isclass(obj):
detectors.append(obj)
# Return list of known detectors
return(detectors)
def print_known_detectors():
'''
Print a list of known detectors = classes defined in stemdiff.detectors.
Returns
-------
Nothing
The list of detectors is just printed on the screen.
'''
detectors = list_of_known_detectors()
print('List of knonw detectors = classes defined in stemdiff.detectors:')
for detector in detectors:
print(detector)
def describe_detector(detector_object):
'''
Global method: the description of the detector on the screen.
Parameters
----------
detector_object : object of any class in stemdiff.detectors
The initialized detector object.
The object contains description of the detector.
It is created by calling init method of any stemdiff.detectors class.
Returns
-------
None
The output is the text on the screen.
Example
-------
>>> # Short example
>>> # (minimalistic example
>>> import stemdiff as sd
>>> my_detector = sd.detectors.TimePix()
>>> my_detector.self_describe() # OO-interface, indirect call
>>> describe_detector(my_detector) # procedural interface, direct call
>>>
>>> # Real usage
>>> # (in STEMDIFF scripts,
>>> # (the detector is usually defined
>>> # (within stemdiff.gvars.SourceData object
>>> import stemdiff as sd
>>> SDATA = sd.gvars.SourceData(
>>> detector = sd.detectors.TimePix(),
>>> data_dir = r'./DATA',
>>> filenames = r'*.dat')
>>> SDATA.detector.self_describe()
'''
print('Detector name :', detector_object.detector_name)
print('Detector size :', detector_object.detector_size)
print('Maximum intensity :', detector_object.max_intensity)
print('Intensity data type :', detector_object.data_type.__name__)
print('Upscale parameter :', detector_object.upscale)
print('-----')
print('* Detector size = size of the (square) detector (in pixels).')
print('* Maximum intensity = max.intensity the detector can measure.')
print('* Intensity data type = the format of the saved intensities.')
print('* Upscale parameter: each datafile/image is upscaled by it.')
class TimePix:
'''
Definition of TimePix detector.
Parameters
----------
detector_name : str, default is 'TimePix'
Name of the detector.
Keep the default unless you have specific reasons.
detector_size : integer, default is 256
Size of the detector in pixels.
Keep the default unless you have specific reasons.
max_intensity : int, default is 11810
Maximum intensity of TimePix detector.
Keep the default unless you have specific reasons.
data_type : numpy data type, optional, default is np.uint16
Type of data, which are saved in the binary file.
TimePix detector saves the data as 16-bit integers.
This corresponds to np.uint16 (more info in NumPy documentation).
upscale : integer, default is 4
Upscaling coefficient.
Final image size = detector_size * upscale.
The upscaling coefficient increases the detector resolution.
Surprisingly enough, the upscaling helps to improve final resolution.
Returns
-------
TimePix detector object
Format of TimePix datafiles
---------------------------
* binary data files, usually with DAT extension
* a 1D arrays of 16-bit intensities = np.uint16 values
'''
def __init__(self, detector_name='TimePix',
detector_size=256, max_intensity=11810,
data_type=np.uint16, upscale=4):
# The initialization of TimePix detector objects.
# The parameters are described above in class definition.
# -----
self.detector_name = detector_name
self.detector_size = detector_size
self.max_intensity = max_intensity
self.data_type = data_type
self.upscale = upscale
def self_describe(self):
'''
Print a simple textual description of the detector on the screen.
Returns
-------
None
The description of the detector is just printed on the screen.
Technical note
--------------
* This is just a wrapper around global function named
stemdiff.detectors.describe_detector.
* Reason: this global function is common to all detector types.
* This simple solution is used instead of (needlessly complex)
inheritance in this case.
'''
describe_detector(self)
def read_datafile(self, filename, arr_size=None):
'''
Read datafile in TimePix detector format.
Parameters
----------
filename : str or path
Name of the datafile to read.
arr_size : int, optional, default is None
Size of the square array to read.
Typically, we read original datafiles,
whose size = detector.size.
Nonetheless, the datafiles might have been saved
with a smaller size = arr_size.
Returns
-------
arr : 2D-numpy array
2D-array containing image from TimePix detector.
Each element of the array = the intensity detected at given pixel.
'''
# Read binary datafile (to 1D-array)
arr = np.fromfile(filename, dtype=self.data_type)
# Determine edge of the file - we work only with square files
edge = int(np.sqrt(arr.size))
# Reshape the array and return
arr = arr.reshape(edge, edge)
return(arr)
def save_datafile(self, arr, filename):
'''
Save 2D-array as a datafile in the TimePix detector format.
Parameters
----------
arr : numpy array
The array to save in the datafile with [filename].
filename : str or path-like object
The filename of the saved array.
Returns
-------
None
The result is the file named *filename*,
containing the *arr* in stemdiff.detectors.TimePix format.
'''
# Slightly modified according to
# https://stackoverflow.com/q/43211616
fh = open(filename,'wb')
arr = arr.flatten()
BlockArray = np.array(arr).astype(np.uint16)
BlockArray.tofile(fh)
fh.close()
class Secom:
'''
Definition of Secom detector.
Parameters
----------
detector_name : str, default is 'Secom'
Name of the detector.
Keep the default unless you have specific reasons.
detector_size : integer, default is 2048
Size of the detector in pixels.
Keep the default unless you have specific reasons.
data_type : numpy data type, optional, default is np.uint16
Type of data, which are saved in the Secom TIFF-files.
Secom detector saves the data as 16-bit TIFF files.
This corresponds to np.uint16 (more info in NumPy documentation).
upscale : integer, default is 1
Upscaling coefficient.
Final image size = detector_size * upscale.
The upscaling coefficient increases the detector resolution.
Surprisingly enough, the upscaling helps to improve final resolution.
Returns
-------
Secom detector object.
Format of SECOM datafiles
-------------------------
* image files, TIFF format
* 16-bit images = images containing np.uint16 values
'''
def __init__(self, detector_name='Secom',
detector_size=2048, max_intensity=65536,
data_type=np.uint16, upscale=1):
# The initialization of Secom detector objects.
# The parameters are described above in class definition.
# -----
self.detector_name = detector_name
self.detector_size = detector_size
self.max_intensity = max_intensity
self.data_type = data_type
self.upscale = upscale
def self_describe(self):
'''
Print a simple textual description of the detector on the screen.
Returns
-------
None
The description of the detector is just printed on the screen.
Technical note
--------------
* This is just a wrapper around global function named
stemdiff.detectors.describe_detector.
* Reason: this global function is common to all detector types.
* This simple solution is used instead of (needlessly complex)
inheritance in this case.
'''
describe_detector(self)
def read_datafile(self, filename):
'''
Read datafile in Secom detector format.
Parameters
----------
filename : str or path
Name of the datafile to read.
Returns
-------
arr : 2D-numpy array
2D-array containing image from Secom detector.
Each element of the array = the intensity detected at given pixel.
'''
arr = np.array(Image.open(filename))
return(arr)
def save_datafile(self, arr, filename):
'''
Save 2D-array as a datafile in the Secom detector format.
Parameters
----------
arr : numpy array
The array to save in the datafile with [filename].
filename : str or path-like object
The filename of the saved array.
Returns
-------
None
The result is the file named *filename*,
containing the *arr* in stemdiff.detectors.Secom format.
'''
im = Image.fromarray(arr.astype(np.uint16))
im.save(filename)
class Arina:
'''
* TODO: Radim
* The same like for Secom, using method copy+paste+modify :-)
'''
pass
'''
Module: stemdiff.gvars
----------------------
Global variables/objects for package stemdiff.
The global variables are used throughout the whole program.
In order to be easily accessible, they are defined as two objects:
* SourceData = an object defining the input datafiles.
* DiffImages = an object desribing basic/common features of all NBD patterns.
The usage of stemdiff.gvars module is quite simple.
In 99% cases, we just add the following code at beginning of STEMDIFF script
(the arguments have to be adjusted to given experiment, of course):
>>> # Define source data
>>> # (datafiles from TimePix detector
>>> # (saved in data directory: D:/DATA.SH/STEMDIFF/SAMPLE_8
>>> # (datafiles are in subdirs: 01,02,... and named as: *.dat
>>> SDATA = stemdiff.gvars.SourceData(
>>> detector = stemdiff.detectors.TimePix(),
>>> data_dir = r'D:/DATA.SH/STEMDIFF/SAMPLE_8',
>>> filenames = r'??/*.dat')
>>>
>>> # Set parameters of diffraction images
>>> # (we consider only central region with imgsize=100
>>> # (size of the region for PSF estimate will be: psfsize=30
>>> # (values of other/all args => documentation of stemdiff.gvars.DiffImages
>>> DIFFIMAGES = stemdiff.gvars.DiffImages(
>>> imgsize=100, psfsize=30,
>>> ctype=2, csquare=20, cintensity=0.8,
>>> peak_height=100, peak_dist=9)
'''
from pathlib import Path
class SourceData:
'''
Define the input datafiles.
Parameters
----------
data_dir : str (which will be passed to Path method)
This parameter is passed to Path method from pathlib library.
It is strongly recommeded to use r-strings.
Example: `data_dir = r'd:/data.sh/stemdiff/tio2'`.
files : str (which will be passed to data_dir.glob method)
This parameter is passed to Path.glob method from pathlib library.
It is strongly recommeded to use r-strings.
Example1: `datafiles = r'*.dat'` = all `*.dat` files in parent_dir;
Example2: `datafiles = r'??/*.dat'` = all `*.dat` in subdirs `01,02...`
Returns
-------
DataFiles object
'''
def __init__(self, detector, data_dir, filenames):
# The initialization of SourceData objects.
# The parameters are described above in class definition.
# -----
self.detector = detector
self.data_dir = Path(data_dir)
self.filenames = self.data_dir.glob(filenames)
class DiffImages:
'''
Set parameters/characteristics of experimental diffraction images.
Parameters
----------
imgsize : integer, smaller than detector_size
Size of array read from the detector is reduced to imgsize.
If given, we sum only the central square with size = imgsize.
Smaller area = higher speed;
outer area = just weak diffractions.
psfize : integer, smaller than detector_size
Size/edge of central square, from which 2D-PSF is determined.
ctype : integer, values = 0,1,2
0 = intensity center not determined, geometrical center is used;
1 = center determined from the first image and kept constant;
2 = center is determined for each individual image.
csquare : integer, interval = 10--DET_SIZE
Size of the central square (in pixels),
within which the center of intensity is searched for.
cintensity : float, interval = 0--1
Intensity fraction, which is used for center determination.
Example: cintensity=0.9 => consider only pixels > 0.9 * max.intensity.
peak_height : float, optional, default is 100
Search for peaks whose intensity > peak_height.
peak_dist : integer, optional, default is 3
Minimal distance between possible neighboring peaks.
Returns
-------
DiffImages object
'''
def __init__(self,
imgsize=100, psfsize=30,
ctype=2, csquare=20, cintensity=0.8,
peak_height=100, peak_dist=3):
# The initialization of DiffImages objects.
# The parameters are described above in class definition.
# -----
self.imgsize = imgsize
self.psfsize = psfsize
self.ctype = ctype
self.csquare = csquare
self.cintensity = cintensity
self.peak_height = peak_height
self.peak_dist = peak_dist
'''
Module: stemdiff.io
-------------------
Input/output functions for package stemdiff.
Three types of stemdiff.io objects
* Datafiles = files on disk, saved directly from a 2D-STEM detector.
* Arrays = the datafiles converted to numpy array objects.
* Images = the datafiles converted to PNG files.
Additional stemdiff.io utilities
* Plots = easy multiplots from selected datafiles/arrays/images.
* set_plot_parameters = a general function available without subclassing.
General strategy for working with stemdiff.io objects
* Datafiles and Images are usually
not used directly, but just converted to np.array objects.
* All data manipulation (showing, scaling, saving ...)
is done within np.array objects.
* Datafiles and Images have (intentionally) just a limited amount of methods,
the most important of which is read - this method simply reads
Datafile/Image to a np.array.
Examples how to use Datafiles, Arrays and Images
>>> # Show a datafile
>>> # (basic operation => there is Datafiles.function for it
>>> stemdiff.io.Datafiles.show(SDATA, filename)
>>> # Read a datafile to array
>>> # (basic operation => there is Datafiles.function for it
>>> arr = stemdiff.io.Datafiles.read(SDATA, filename)
>>> # Describe AND show the datafile
>>> # (more complex operation:
>>> # (1) read datafile to array - using Datafiles.read
>>> # (2) do what you need (here: describe, show) - using Arrays.functions
>>> arr = stemdiff.io.Datafiles.read(SDATA, datafile)
>>> stemdiff.io.Arrays.describe(arr, csquare=20)
>>> stemdiff.io.Arrays.show(arr, icut=1000, cmap='gray')
'''
import os, sys
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from skimage import transform, measure, morphology
def set_plot_parameters(size=(12,9), dpi=75, fontsize=10, my_rcParams=None):
'''
Set global plot parameters (mostly for plotting in Jupyter).
Parameters
----------
size : tuple of two floats, optional, the default is (12,9)
Size of the figure (width, height) in [cm].
dpi : int, optional, the defalut is 75
DPI of the figure.
fontsize : int, optional, the default is 10
Size of the font used in figure labels etc.
my_rcParams : dict, optional, default is None
Dictionary in plt.rcParams format
containing any other allowed global plot parameters.
Returns
-------
None
The result is a modification of the global plt.rcParams variable.
'''
# This function just calls the final function in Plotting module.
# (left in the main namespace of the package as it is frequently used
Plots.set_plot_parameters(size, dpi, fontsize, my_rcParams)
class Datafiles:
'''
Datafiles class = a collection of functions
that work with datafiles from 2D-STEM detector
(assumption: the datafiles were saved as standard files on a disk).
'''
def read(SDATA, filename):
'''
Read a datafile from STEM detector to an array.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
filename : string or pathlib object
Name of datafile to be read into numpy 2D array.
Filename can be given as an absolute path or
a path relative to SDATA.data_dir
(i.e. data directory path, which is saved in SDATA object).
Returns
-------
arr : 2D numpy array
The array converted from datafile with given *filename*.
'''
# The reading of the datafile depends on the filename argument
# (it can be an absolute path or a path relative to SDATA.data_dir
if os.path.isfile(filename):
# Filename given as absolute path:
# Everything Ok => just read and return
arr = SDATA.detector.read_datafile(filename)
return(arr)
else:
# Filename not given as relative path
# Perhaps it is a relative path - test if it exits
complete_filename = SDATA.data_dir.joinpath(filename)
if os.path.isfile(complete_filename):
# Filename was given as relative path and it exists
# Everything Ok => read and return the completed filename
arr = SDATA.detector.read_datafile(complete_filename)
return(arr)
else:
# Filename not found
# (it was not a correct absolute or relative path
print(f'Datafile [{filename}] not found!')
sys.exit()
def show(SDATA, filename,
icut=None, itype='8bit', R=None, cmap='gray',
center=False, csquare=20, cintensity=0.8):
'''
Show datafile/diffractogram with basic characteristics.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
filename : str or Path
Name of datafile to be shown.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300.
itype : string, optional, '8bit' or '16bit', default is None
Type of the image - 8-bit or 16-bit.
If itype equals None or '16-bit' the image is treated as 16-bit.
R : integer, optional, default is None
Rescale coefficient;
the input array is rescaled (usually upscaled) R-times.
For typical 2D-STEM detector with size 256x256 pixels,
the array should be processed with R=4
in order to get sufficiently large image for further processing.
cmap : str - matplotlib.colormap name, optional, the default is 'gray'
Matplotlib colormap for plotting of the array.
Other interesting or high-contrast options:
'viridis', 'plasma', 'magma' ...
The full list of matplotlib colormaps:
`matplotlib.pyplot.colormaps()`
center : bool, optional, default is False
If True, intensity center is drawn in the final image.
csquare : integer, optional, default is 20
Edge of a central square, from which the center will be determined.
Ignored if center == False.
cintensity : float in interval 0--1, optional, default is 0.8
The intensity < maximum_intensity * cintensity is regarded as 0
(a simple temporary background removal in the central square).
Ignored if center == False.
Returns
-------
Nothing
The output is the datafile shown as an image on the screen.
Technical note
--------------
* This function just combines Datafiles.read + Arrays.show functions.
'''
# Read datafile to array
arr = Datafiles.read(SDATA, filename)
# Describe datafile/array
Arrays.show(arr,
icut, itype, R, cmap,
center, csquare, cintensity)
def show_from_disk(SDATA,
interactive=True, max_files=None,
icut=1000, itype=None, R=None, cmap='gray',
center=True, csquare=20, cintensity=0.8,
peak_height=100, peak_distance=9):
'''
Show datafiles (stored in a disk) from 2D-STEM detector.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
interactive: bool, optional, the defailt is True
If True, images are shown interactively,
i.e. any key = show next image, 'q' = quit.
max_files: integer, optional, the default is None
If not(interactive==True) and max_files > 0,
show files non-interactively = in one run, until
number of files is less than max_files limit.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300.
itype : string, optional, None or '8bit' or '16bit'
Type of the image - 8-bit or 16-bit.
If itype equals None or '16-bit' the image is treated as 16-bit.
R : integer, optional, default is None
Rescale coefficient;
the input array is rescaled (usually upscaled) R-times.
For typical 2D-STEM detector with size 256x256 pixels,
the array should be processed with R=4
in order to get sufficiently large image for further processing.
cmap : str - matplotlib.colormap name, optional, the default is 'gray'
Matplotlib colormap for plotting of the array.
Other interesting or high-contrast options:
'viridis', 'plasma', 'magma', ...
The full list of matplotlib colormaps:
`matplotlib.pyplot.colormaps()`
center : bool, optional, default is False
If True, intensity center is drawn in the final image.
csquare : integer, optional, default is 20
Edge of a central square, from which the center will be determined.
Ignored if center == False.
cintensity : float in interval 0--1, optional, default is 0.8
The intensity < maximum_intensity * cintensity is regarded as 0
(a simple temporary background removal in the central square).
Ignored if center == False.
peak_height : int, optional, default is 100
Minimal height of the peak to be detected.
peak_distance : int, optional, default is 5
Minimal distance between two peaks so that they were separated.
Returns
-------
Nothing
The output are the files and their characteristics on the screen.
Technical note
--------------
* This function uses Datafiles.read and than Arrays.functions.
'''
# Initialization
file_counter = 0
# Iterate through the files
for datafile in SDATA.filenames:
# Read datafile from disk to array
arr = Datafiles.read(SDATA, datafile)
# Print datafile name
datafile_name = datafile.relative_to(SDATA.data_dir)
print('Datafile:', datafile_name)
# Describe the datafile/array
Arrays.describe(arr,
csquare, cintensity, peak_height, peak_distance)
# Show the datafile/array
Arrays.show(arr,
icut, itype, R, cmap,
center, csquare, cintensity)
# Decide if we should stop the show
if interactive:
# Wait for keyboard input...
choice = str(input('[Enter] to show next, [q] to quit...\n'))
# Break if 'q' was pressed and continue otherwise...
if choice == 'q': break
elif max_files:
file_counter += 1
if file_counter >= max_files: break
def show_from_database(SDATA, df,
interactive=True, max_files=None,
icut=1000, itype='8bit', cmap='gray'):
'''
Show datafiles (pre-selected in a database) from 2D-STEM detector.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
df : pandas.DataFrame object
Pre-calculated atabase with datafiles to be shown.
Each row of the database contains
[filename, xc, yc, MaxInt, NumPeaks, S].
interactive: bool, optional, the defailt is True
If True, images are shown interactively,
i.e. any key = show next image, 'q' = quit.
max_files: integer, optional, the default is None
If not(interactive==True) and max_files > 0,
show files non-interactively = in one run, until
number of files is less than max_files limit.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300.
itype : string, optional, '8bit' or '16bit', default is '8bit'
Type of the image - 8 or 16 bit grayscale.
cmap : str - matplotlib.colormap name, optional, the default is 'gray'
Matplotlib colormap for plotting of the array.
Other interesting or high-contrast options:
'viridis', 'plasma', 'magma', ...
The full list of matplotlib colormaps:
`matplotlib.pyplot.colormaps()`
Returns
-------
Nothing
The output are the files and their characteristics on the screen.
Technical notes
---------------
* This function uses Datafiles.read function
to read data from database.
* As the function uses database data,
it cannot use standard Arrays functions.
'''
# Initialize file counter
file_counter = 0
# Show the files and their characteristics saved in the database
for index,datafile in df.iterrows():
# Read datafile
datafile_name = SDATA.data_dir.joinpath(datafile.DatafileName)
arr = Datafiles.read(SDATA, datafile_name)
# Print datafile characteristics from the database
print(f'Datafile: {datafile.DatafileName}')
print(f'Center (x,y): \
({datafile.Xcenter:.1f},{datafile.Ycenter:.1f})')
print(f'Maximum intensity: {datafile.MaxInt}')
print(f'Number of peaks: {datafile.Peaks}')
print(f'Shannon entropy: {datafile.S}')
# Show datafile (and draw XY-center from the database data)
arr = np.where(arr>icut, icut, arr)
plt.imshow(arr, cmap=cmap)
# Draw center
# (we read data from database
# (files are shown without any rescaling
# (but database contains Xcenter,Ycenter from upscaled images
# (=> we have to divide Xcenter,Ycenter by rescale coefficient!
R = SDATA.detector.upscale
plt.plot(
datafile.Ycenter/R,
datafile.Xcenter/R,
'r+', markersize=20)
plt.show()
# Increase file counter & stop if max_files limit was reached
file_counter += 1
if file_counter >= max_files: break
class Arrays:
'''
Arrays class = a collection of functions
that work with 2D-arrays, which represent datafiles from 2D-STEM detector.
'''
def show(arr,
icut=None, itype=None, R=None, cmap=None,
center=False, csquare=20, cintensity=0.8,
plt_type='2D', plt_size=None, colorbar=False):
'''
Show 2D-array as an image.
Parameters
----------
arr : 2D numpy array
Array to show.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300.
itype : string, optional, None or '8bit' or '16bit'
Type of the image - 8-bit or 16-bit.
If itype equals None or '16-bit' the image is treated as 16-bit.
R : integer, optional, default is None
Rescale coefficient;
the input array is rescaled (usually upscaled) R-times.
For typical 2D-STEM detector with size 256x256 pixels,
the array should be processed with R=4
in order to get sufficiently large image for further processing.
cmap : str - matplotlib.colormap name, optional, the default is None
Matplotlib colormap for plotting of the array.
Interesting or high-contrast options:
'gray', 'viridis', 'plasma', 'magma', ...
The full list of matplotlib colormaps:
`matplotlib.pyplot.colormaps()`
center : bool, optional, default is False
If True, intensity center is drawn in the final image.
csquare : integer, optional, default is 20
Edge of a central square, from which the center will be determined.
Ignored if center == False.
cintensity : float in interval 0--1, optional, default is 0.8
The intensity < maximum_intensity * cintensity is regarded as 0
(a simple temporary background removal in the central square).
Ignored if center == False.
plt_type : str, '2D' or '3D', optional, default is '2D'
Type of the plot: 2D-dimensional or 3D-dimensional/surface plot.
plt_size : int, optional, default is not
If given, we plot only the central region with size = *plt_size*.
For central region we use a geometric center - see Technical notes.
colorbar : bool, optional, the default is False
If True, a colorbar is added to the plot.
Returns
-------
Nothing
The output is the array shown as an image on the screen.
Technical notes
---------------
* In this function, we *do not* center the image/array.
Center can be drawn to 2D-image, but array *is not centered*.
* Edges can be removed (using plt_size argument),
but only only with respect to the geometrical center,
which means that the function shows a *non-cenered central region*.
* If you need to show *centered central region* of an array,
combine Arrays.find_center + Arrays.remove_edges + Arrays.show
'''
# Prepare array for saving
arr = Arrays.prepare_for_show_or_save(arr, icut, itype, R)
# Remove edges of the plot, if requested
# (just simple removal of edges based on geometrical center!
# (reason: simplicity; for centering/edge removal we have other funcs
if plt_size:
Xsize,Ysize = arr.shape
xc,yc = (int(Xsize/2),int(Ysize/2))
if plt_size:
arr = Arrays.remove_edges(arr,plt_size,xc,yc)
# Plot array as image
# (a) Prepare 2D plot (default)
if plt_type=='2D':
if cmap==None: # if cmap not selected, set default for 2D maps
cmap='viridis'
plt.imshow(arr, cmap=cmap)
if colorbar: # Add colorbar
plt.colorbar()
if center==True: # Mark intensity center in the plot
xc,yc = Arrays.find_center(arr,csquare, cintensity)
plt.plot(yc,xc, 'r+', markersize=20) # switch xc,yc for img!
# (b) Prepare 3D plot (option; if plt_type is not the default '2D')
else:
if cmap==None: # if cmap not selected, set default for 3D maps
cmap='coolwarm'
# Prepare meshgrid for 3D-plotting
Xsize,Ysize = arr.shape
X = np.arange(Xsize)
Y = np.arange(Ysize)
Xm,Ym = np.meshgrid(X,Y)
# Create 3D-plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(
Xm,Ym,arr, cmap=cmap, linewidth=0, antialiased=False)
plt.tight_layout()
# (c) Show the plot
plt.show()
def describe(arr,
csquare=20, cintensity=0.8,
peak_height=100, peak_distance=5):
'''
Describe 2D-array = print XY-center, MaxIntensity, Peaks, Sh-entropy.
Parameters
----------
arr : 2D numpy array
Array to describe.
csquare : integer, optional, default is None
Edge of a central square, from which the center will be determined.
cintensity : float in interval 0--1, optional, default is None
The intensity < maximum_intensity * cintensity is regarded as 0
(a simple temporary background removal in the central square).
peak_height : int, optional, default is 100
Minimal height of the peak to be detected.
peak_distance : int, optional, default is 5
Minimal distance between two peaks so that they were separated.
Returns
-------
Nothing
The array characteristics are just printed.
Technical note
--------------
This function is just a wrapper
around several np.functions and Arrays.functions.
To get the values, use the individual functions instead.
'''
# Determine center (of intensity)
x,y = Arrays.find_center(arr, csquare, cintensity)
print(f'Center (x,y): ({x:.1f},{y:.1f})')
# Determine maximum intensity
max_intensity = np.max(arr)
print(f'Maximum intensity = {max_intensity:d}')
# Estimate number of peaks (local maxima)
no_of_maxima = Arrays.number_of_peaks(arr, peak_height, peak_distance)
print(f'Number of peaks = {no_of_maxima}')
# Calculate Shannon entropy of the datafile
entropy_value = measure.shannon_entropy(arr)
print(f'Shannon entropy = {entropy_value:.2f}')
def find_center(arr, csquare=None, cintensity=None):
'''
Determine center of mass for 2D numpy array.
Array center = mass/intensity center ~ position of central spot.
Warning: In most cases, geometric center is NOT mass/intensity center.
Parameters
----------
arr : numpy 2D array
Numpy 2D array, whose center (of mass ~ intensity) we want to get.
csquare : integer, optional, default is None
Edge of a central square, from which the center will be determined.
cintensity : float in interval 0--1, optional, default is None
The intensity < maximum_intensity * cintensity is regarded as 0
(a simple temporary background removal in the central square).
Returns
-------
xc,yc : integers
Coordinates of the intesity center = position of the primary beam.
'''
# Calculate center of array
if csquare:
# If csquare was given,
# calculate center only for the square in the center,
# in which we set background intensity = 0 to get correct results.
# a) Calculate array corresponding to central square
xsize,ysize = arr.shape
xborder = (xsize - csquare) // 2
yborder = (ysize - csquare) // 2
arr2 = arr[xborder:-xborder,yborder:-yborder].copy()
# b) Set intensity lower than maximum*coeff to 0 (background removal)
coeff = cintensity or 0.8
arr2 = np.where(arr2>np.max(arr2)*coeff, arr2, 0)
# c) Calculate center of intensity (and add borders at the end)
M = measure.moments(arr2,1)
(xc,yc) = (M[1,0]/M[0,0], M[0,1]/M[0,0])
(xc,yc) = (xc+xborder,yc+yborder)
(xc,yc) = np.round([xc,yc],2)
else:
# If csquare was not given,
# calculate center for the whole array.
# => Wrong position of central spot for non-centrosymmetric images!
M = measure.moments(arr,1)
(xc,yc) = (M[1,0]/M[0,0], M[0,1]/M[0,0])
(xc,yc) = np.round([xc,yc],2)
# Return final values
# IMPORTANT: the values works for arrays => switch xc,yc for images!
return(xc,yc)
def number_of_peaks(arr, peak_height,
peak_distance=None, neighborhood_matrix=None):
'''
Estimate number of peaks in given array.
Parameters
----------
arr : 2D numpy array
Array, for which we want to determine the number of peaks.
peak_height : float
Height of peak with respect to background; obligatory argument.
peak_distance : int, optional, default is None
Distance between two neighboring peaks.
If given, distance matrix is calculated out of it.
neighborhood_matrix : numpy 2D-array, optional, default is None
The neighborhood expressed as a 2D-array of 1's and 0's.
The neighborhood matrix can be eigher given directly
(using argument neighborhood_matrix) or indirectly
(calculated from argument peak_distance).
Returns
-------
no_of_peaks : int
Estimated number of peaks (local maxima) in given array.
'''
# If peak distance was given, calculate distance matrix from it.
if peak_distance:
neighborhood_matrix = np.ones(
(peak_distance,peak_distance), dtype=np.uint8)
# Determine number of peaks
no_of_peaks = np.sum(morphology.h_maxima(
arr, h=peak_height, footprint=neighborhood_matrix))
# Return number of peaks
return(no_of_peaks)
def rescale(arr, R, order=None):
'''
Rescale 2D numpy array (which represents an image).
Parameters
----------
arr : 2D numpy array
Numpy array representing DAT-file/image.
R : integer
Rescale parameter: new_size_of the array = original_size * R
Returns
-------
arr : 2D numpy array
The array has `new_size = original_size * R`.
'''
# Keep original value of array maximum.
arr_max = np.max(arr)
# Rescale the array.
arr = transform.rescale(arr, R, order)
# Restore the original value of array maximum.
arr = arr/np.max(arr) * arr_max
# Return the rescaled array.
return(arr)
def remove_edges(arr,rsize,xc,yc):
'''
Cut array to rsize by removing edges; center of new array = (xc,yc).
Parameters
----------
arr : numpy 2D array
The original array, whose size should be reduced.
rsize : integer
The size of reduced array.
xc,yc : integers
The center of original array;
the reduced array is cut to rsize, center of new array is in xc,yc.
Returns
-------
arr : 2D numpy array
The array with reduced size.
'''
halfsize = int(rsize/2)
if (rsize % 2) == 0:
arr = arr[xc-halfsize:xc+halfsize, yc-halfsize:yc+halfsize]
else:
arr = arr[xc-halfsize:xc+halfsize+1, yc-halfsize:yc+halfsize+1]
return(arr)
def save_as_image(arr, output_image, icut=None, itype='8bit', R=None):
'''
Save 2D numpy array as grayscale image.
Parameters
----------
arr : 2D numpy array
Array or image object to save.
output_image : string or pathlib object
Name of the output/saved file.
icut : integer
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300.
itype: string ('8bit' or '16bit')
Type of the image: 8 or 16 bit grayscale.
R: integer
Rescale coefficient;
the input array is rescaled/enlarged R-times.
For typical 2D-STEM detector with size 256x256 pixels,
the array should be saved with R = 2 (or 4)
in order to get sufficiently large image for further processing.
Returns
-------
Nothing
The output is *arr* saved as *output_image* on a disk.
'''
# Prepare array for saving
arr = Arrays.prepare_for_show_or_save(arr, icut, itype, R)
# Prepare image object (8bit or 16bit)
if itype == '8bit':
img = Image.fromarray(arr, 'L')
else:
img = Image.fromarray(arr)
# Save image
img.save(output_image)
def save_as_datafile(SDATA, arr, filename):
'''
Save array as a datafile (in current detector format).
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
arr : 2D numpy array
Array or image object to save.
filename : str or path
Name of the file to save.
Returns
-------
None
The output is *arr* saved as a datafile named *filename*.
The format of the saved datafile corresponds to current detector.
Technical notes
---------------
* This function io.Arrays.save_as_datafile
is just a wrapper that calls *save_datafile* function
for given detector.
* The detector object is accessible in this function
thanks to SDATA argument as *SDATA.detector*.
'''
SDATA.detector.save_datafile(arr, filename)
def prepare_for_show_or_save(arr, icut=None, itype=None, R=None):
'''
Prepare 2D numpy array (which contains a 2D-STEM datafile)
for showing/saving as grayscale image.
Parameters
----------
arr : 2D numpy array
Array or image object to save.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300.
itype: string, optional, '8bit' or '16bit', default is None.
Type of the image - 8-bit or 16-bit grayscale.
If none, then the image is saved as 16-bit.
R: integer, optional, default is None
Rescale coefficient;
the input array is rescaled/enlarged R-times.
For smaller 2D-STEM detectors with size 256x256 pixels,
the array should be saved with R = 2 (or 4)
in order to get sufficiently large image for further processing.
Returns
-------
arr : 2D numpy array
The modified array ready for showing or saving on a disk.
'''
# Cut intensity
if icut:
arr = np.where(arr>icut, icut, arr)
# Rescale
if R:
arr_max = np.max(arr)
arr = transform.rescale(arr, R)
arr = arr/np.max(arr) * arr_max
# Prepare for showing/saving as 8bit or 16 bit
if itype == '8bit':
arr = np.round(arr * (255/np.max(arr))).astype(dtype=np.uint8)
else:
arr = arr.astype('uint16')
# Return the modified array
return(arr)
class Images:
'''
Images class = a collection of functions
that work with images representing datafiles from 2D-STEM detector.
'''
def read(image_name, itype='8bit'):
'''
Read grayscale image into 2D numpy array.
Parameters
----------
image_name : string or pathlib object
Name of image that should read into numpy 2D array.
itype: string ('8bit' or '16bit')
type of the image: 8 or 16 bit grayscale
Returns
-------
arr : 2D numpy array
The array converted from *image_name*.
'''
img = Image.open(image_name)
if itype=='8bit':
arr = np.asarray(img, dtype=np.uint8)
else:
arr = np.asarray(img, dtype=np.uint16)
return(arr)
def show(image_name,
icut=None, itype='8bit', R=None, cmap='gray',
center=False, csquare=20, cintensity=0.8):
'''
Read and display image from disk.
Parameters
----------
image_name : str or path-like object
Name of image to read.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300.
itype : string, optional, '8bit' or '16bit', default is '8bit'
Type of the image - 8 or 16 bit grayscale.
R : integer, optional, default is None
Rescale coefficient;
the input array is rescaled (usually upscaled) R-times.
For typical 2D-STEM detector with size 256x256 pixels,
the array should be processed with R=4
in order to get sufficiently large image for further processing.
cmap : str - matplotlib.colormap name, optional, the default is 'gray'
Matplotlib colormap for plotting of the array.
Other interesting or high-contrast options:
'viridis', 'plasma', 'magma', ...
The full list of matplotlib colormaps:
`matplotlib.pyplot.colormaps()`
center : bool, optional, default is False
If True, intensity center is drawn in the final image.
csquare : integer, optional, default is 20
Edge of a central square, from which the center will be determined.
Ignored if center == False.
cintensity : float in interval 0--1, optional, default is 0.8
The intensity < maximum_intensity * cintensity is regarded as 0
(a simple temporary background removal in the central square).
Ignored if center == False.
Returns
-------
Nothing
The output is *image_name* shown on the screen.
'''
# Read Image to array.
arr = Images.read(image_name, itype=itype)
# Show the array using pre-defined stemdiff.io.Array.show function.
Arrays.show(arr,
icut, itype, R, cmap,
center, csquare, cintensity)
class Plots:
'''
Plots class = a collection of functions
for easy plotting of diffractograms and/or graphs.
'''
# The functions in this sub-package are not TOO general.
# This is intentional - writing too general plotting funcs makes no sense.
# The exception is the 1st func, which sets plot parameters for any plot.
def set_plot_parameters(size=(12,9), dpi=75,
fontsize=10, my_rcParams=None):
'''
Set global plot parameters (mostly for plotting in Jupyter).
Parameters
----------
size : tuple of two floats, optional, the default is (12,9)
Size of the figure (width, height) in [cm].
dpi : int, optional, the defalut is 75
DPI of the figure.
fontsize : int, optional, the default is 10
Size of the font used in figure labels etc.
my_rcParams : dict, optional, default is None
Dictionary in plt.rcParams format
containing any other allowed global plot parameters.
Returns
-------
None; the result is a modification of the global plt.rcParams variable.
'''
# We test all arguments, if they exist.
# (Theoretically, user could have redefined them as None
# (In such a case we would change nothing and leave default values
if size: # Figure size
# Convert size in [cm] to required size in [inch]
size = (size[0]/2.54, size[1]/2.54)
plt.rcParams.update({'figure.figsize' : size})
if dpi: # Figure dpi
plt.rcParams.update({'figure.dpi' : dpi})
if fontsize: # Global font size
plt.rcParams.update({'font.size' : fontsize})
if my_rcParams: # Other possible rcParams in the form of dictionary
plt.rcParams.update(my_rcParams)
def plot_2d_diffractograms(data_to_plot,
icut=None, cmap='viridis',
output_file=None, dpi=300):
'''
Plot a few selected 2D diffraction patterns in a row, one-by-one.
Parameters
----------
data_to_plot : list of lists
This object is a list of lists.
The number of rows = the number of plotted diffractograms.
Each row contains two elements:
(i) data for diffractogram to plot and (ii) title of the plot.
The data (first element of each row) can be:
(i) PNG-file or (ii) 2D-array containing the 2D diffractogram.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300
(this is used just for plotting; the array values are not changed).
cmap : matplotlib.colormap name, optional, the default is 'viridis'
Matplotlib colormap for plotting of the diffractogram.
Other interesting or high-contrast options:
'gray', 'plasma', 'magma', ...
The full list of matplotlib colormaps:
`matplotlib.pyplot.colormaps()`
output_file : str, optional, default is None
If this argument is given,
the plot is also saved in *output_file* image
with *dpi* resolution (dpi is specified by the following argument).
dpi : int, optional, default is 300
The (optional) argument gives resolution of
(the optional) output image.
Returns
-------
None
The output is the plot of the diffraction patterns on the screen.
If argument *ouput_file* is given, the plot is saved as an image.
'''
# Initialize
n = len(data_to_plot)
diffs = data_to_plot
fig,ax = plt.subplots(nrows=1, ncols=n)
# Plot 2D-diffractograms
for i in range(n):
# Read data to plot
data = diffs[i][0]
# Test input data...
if type(data) == str: # ...String - we suppose an image
if data.lower().endswith('.png'): # PNG file, 2D-diffractogram
# we read image as '16bit'
# in this case, it works for 8bit images as well
arr = Images.read(data, itype='16bit')
else: # Other than PNG files are not supported now
print(f'Unsuported format of file {data}!')
sys.exit()
elif type(data) == np.ndarray: # Numpy array
if data.shape[0] == data.shape[1]: # square array, 2D-diff.pat
arr = data
else: # Non-square arrays are not supported now
print('Non-square arrays to plot - not supported.')
sys.exit()
# Read plot parameters
my_title = diffs[i][1]
# Plot i-th datafile/array
ax[i].imshow(arr, vmax=icut, cmap=cmap)
ax[i].set_title(my_title)
# Finalize plot
for i in range(n): ax[i].axis('off')
fig.tight_layout()
if output_file: fig.savefig(output_file, dpi=dpi)
def plot_datafiles_with_NS(SDATA, df, N=None, S=None,
icut=None, rsize=None, cmap='viridis',
output_file=None, dpi=300):
'''
Plot datafiles with selected (N,S)=(NoOfPeaks,Entropy) in a row.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
df : pandas.DataFrame object
Pre-calculated atabase with datafiles to be shown.
Each row of the database contains
[filename, xc, yc, MaxInt, NumPeaks, S].
N : list of integers, arbitrary lenght, but should it not be empty
List of N values, where N = estimated NumberOfPeaks in a datafile.
This function will plot datafiles for given combination of (N,S).
The sister argument S is described below.
The (N,S) lists must have the same length.
S : list of integers, arbitrary lenght, but should it not be empty
List of S values; S = calculated ShannonEntropy of a datafile.
This function will plot datafiles for given combination of (N,S).
The sister argument N is described above.
The (N,S) lists must have the same length.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300
(this is used just for plotting; the array values are not changed).
rsize : int, optional, default is None
Reduced size of an image.
If rsize = 100, size of all datafiles/images is reduced
so that only the central square with size = rsize is plotted.
The center coordinates for given datafile
are taken from the (obligatory) df argument (see above).
cmap : matplotlib.colormap name, optional, the default is 'viridis'
Matplotlib colormap for plotting of the diffractogram.
Other interesting or high-contrast options:
'gray', 'plasma', 'magma', ...
The full list of matplotlib colormaps:
`matplotlib.pyplot.colormaps()`
output_file : str, optional, default is None
If this argument is given,
the plot is also saved in *output_file* image
with *dpi* resolution (dpi is specified by the following argument).
dpi : int, optional, default is 300
The (optional) argument gives resolution of
(the optional) output image.
Returns
-------
None
The output are the datafiles plotted on the screen.
If *ouput_file* is given, the plot is also saved as an image.
Technical note
--------------
The function has no return statement => it returns None.
In Jupyter and Spyder, the plot/figure is drawn on the screen.
In CLI Python, the figure should be saved => argument *output_file*.
'''
# STEP 1: Prepare multiplot.
fig,ax = plt.subplots(nrows=1, ncols=len(N))
# STEP 2: Find datafiles with selected N,S combinations,
# and insert them in the prepared multiplot one-by-one.
#
# The finding of the datafiles:
# In given database (argument df),
# we localize datafiles with selected combination of (N,S)
# a) N = Peaks = NumberOfPeaks
# b) S = index of file sorted according to Shannon entropy
# => S = 1 : the file with given N and the highest S
# => S = 2,3... : like previous, but the 2nd, 3rd highest S ...
# => S = -1 : the file with given N and the lowest S
# => S = -2,-3... : like previous, but the 2nd,3rd lowest S ...
# Go through (N,S) pairs,
# find datafiles and insert them in the muptiplot.
# * We combine enumerate and zip
# (https://stackoverflow.com/q/16326853
# reason for enumerate => index i = plot/axes number = index in ax[i]
# reason for zip => we need (N,S) pairs = two values together
for i,(n,s) in enumerate(zip(N,S)):
# (1) Create sub-database, which ...
# a) contains just entries for given n = N = Peaks = NoOfPeaks
# b) is sorted according to S = ShannonEtropy descendingly
dfN = df[df.Peaks == n].sort_values(
by='S', ascending=False, ignore_index=True)
# (2) Modify value of s according to logic ...
if s >= 1:
# If s >= 1, decrease its value
# Reason: Python indexing start from 0, we start from 1
s = s - 1
elif s > len(dfN):
# If s > lenght_of_database_with_given_N, set it to maximum
# Reason: if s = 10 and len(dfN) is just 8 => s = 8
s = len(dfN)
elif s < - len(dfN):
# If s < -lenght_of_database_with_given_N, set it to maximum
# Reason: if s = -10 and len(dfN) is just 8 => s = -8
s = -len(dfN)
# (3) Get datafile name
# (the datafile name is extracted from df by means of iloc
# (the 1st index = 1 => col-index: col 1 in df/dfN = datafile names
# (the 2nd index = s => row-index: dfN was sorted according to 'S'
datafile = dfN.iloc[s,0]
# (4) Read the datafile to an array
arr = Datafiles.read(SDATA, datafile)
# (5) Reduce size if requested
# (remove edges and keep just central square with
if rsize:
# Find datafile center coordinates (xc,yc)
# (xc,yc are in the database in columns 2 and 3, respectively
R = SDATA.detector.upscale
xc = int(np.round(dfN.iloc[s,1]/R,0))
yc = int(np.round(dfN.iloc[s,2]/R,0))
# Keep just central square with size = rsize
arr = Arrays.remove_edges(arr, rsize, xc, yc)
# (6) Draw datafile in the plot
# (a) Get parameters for plot title
# (the plot title should be something like N=10,S=1,M=1200
# (where N,S,M are NoOfPeaks, ShannonEntropy and MaxIntensity
max_intensity = dfN.iloc[s,3]
peaks = dfN.iloc[s,4]
entropy = dfN.iloc[s,5]
# (b) Prepare title
plot_title = \
f'N={peaks:d} S={entropy:.2f} M={int(max_intensity):d}'
# (c) Create the plot
ax[i].imshow(arr, vmax=icut, cmap=cmap)
ax[i].set_title(plot_title)
# STEP 3: Finalize the figure (and save it, if requested)
# Finalize figure
for i in range(len(N)): ax[i].axis('off')
fig.tight_layout()
# Save if requested
if output_file:
fig.savefig(output_file, dpi=dpi, facecolor='white')
# IMPORTANT TECHNICAL NOTE: Output of this function
# The function has no return statement => it returns None.
# In Jupyter and Spyder, the plot/figure is drawn on the screen.
# This is a property of figs - last command draws them in iPython.
# Here the last command is: fig.tight_layout() => figure is drawn.
# In classical/CLI Python, the fig should be saved => output_file arg.
'''
Module: stemdiff.sum
--------------------
The summation of 4D-STEM datafiles to create one 2D powder diffraction file.
* stemdiff.sum = this module, which runs on a single core (serial processing)
* stemdiff.summ = sister module running on multiple cores (parallel processing)
To perform the summation, we just call function sum_datafiles:
* serial : stemdiff.sum.sum_datafiles(SDATA, DIFFIMAGES, df, deconv, ...)
* parallel : stemdiff.summ.sum_datafiles(SDATA, DIFFIMAGES, df, deconv, ...)
The initial arguments are:
* SDATA = stemdiff.gvars.SourceData object = description of source data
* DIFFIMAGES = stemdiff.gvars.DiffImages object = description of diffractograms
* df = pre-calculated database of datafiles/diffratograms to sum
Key argument is deconv, which determines the processing type:
* deconv=0 = sum *without* deconvolution
* deconv=1 = R-L deconvolution with global PSF from low-diffraction datafiles
* deconv=2 = subtract background + R-L deconvolution with PSF from the center
'''
import numpy as np
import stemdiff.io
import stemdiff.dbase
import idiff
from skimage import restoration
import tqdm
import sys
def sum_datafiles(SDATA, DIFFIMAGES, df, deconv=0, psf=None, iterate=10):
"""
Sum datafiles from a 4D-STEM dataset to get 2D powder diffractogram.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
DIFFIMAGES : stemdiff.gvars.DiffImages object
Object describing the diffraction images/patterns.
df : pandas.DataFrame object
Pre-calculated atabase with datafiles to be summed.
Each row of the database contains
[filename, xc, yc, MaxInt, NumPeaks, S].
deconv : int, optional, default is 0
Deconvolution type:
0 = no deconvolution,
1 = deconvolution based on external PSF,
2 = deconvolution based on PSF from central region,
psf : 2D-numpy array or None, optional, default is None
Array representing 2D-PSF function.
Relevant only for deconv = 1.
iterate : integer, optional, default is 10
Number of iterations during the deconvolution.
Returns
-------
final_arr : 2D numpy array
The array is a sum of datafiles;
if the datafiles are pre-filtered,
we get the sum of filtered datafiles.
Additional arguments determin the (optional) type of deconvolution.
Technical notes
---------------
* This function works as a signpost.
* It reads the summation parameters and calls a more specific summation
functions (which aren NOT called directly by the end-user).
* It employs progress bar, handles possible exceptions,
and returns the final array (= post-processed and normalized array).
"""
# (1) Prepare variables for summation
R = SDATA.detector.upscale
img_size = DIFFIMAGES.imgsize
datafiles = [datafile[1] for datafile in df.iterrows()]
sum_arr = np.zeros((img_size * R, img_size * R), dtype=np.float32)
# (2) Prepare variables for tqdm
# (to create a single progress bar for the entire process
total_tasks = len(datafiles)
sys.stderr = sys.stdout
# (3) Run summations
# (summations will run with tqdm
# (we will use several types of summations
# (each summations uses datafiles prepared in a different way
with tqdm.tqdm(total=total_tasks, desc="Processing ") as pbar:
try:
# Process each image in the database
for index, datafile in df.iterrows():
# Deconv0 => sum datafiles without deconvolution
if deconv == 0:
sum_arr += dfile_without_deconvolution(
SDATA, DIFFIMAGES, datafile)
# Deconv1 => sum datafiles with DeconvType1
elif deconv == 1:
sum_arr += dfile_with_deconvolution_type1(
SDATA, DIFFIMAGES, datafile, psf, iterate)
# Deconv2 => sum datafiles with DeconvType2
elif deconv == 2:
sum_arr += dfile_with_deconvolution_type2(
SDATA, DIFFIMAGES, datafile, iterate)
# Update the progress bar for each processed image
pbar.update(1)
except Exception as e:
print(f"Error processing a task: {str(e)}")
# (4) Move to the next line after the progress bar is complete
print('')
# (5) Post-process the summation and return the result
return sum_postprocess(sum_arr, len(df))
def sum_postprocess(sum_of_arrays, n):
"""
Normalize and convert the summed array to 16-bit unsigned integers.
Parameters
----------
sum_of_arrays : np.array
Sum of the arrays -
usually from stemdiff.sum.sum_datafiles function.
n : int
Number of summed arrays -
usually from stemdiff.sum.sum_datafiles function.
Returns
-------
arr : np.array
Array representing final summation.
The array is normalized and converted to unsigned 16bit integers.
"""
arr = np.round(sum_of_arrays/n).astype(np.uint16)
return(arr)
def dfile_without_deconvolution(SDATA, DIFFIMAGES, datafile):
"""
Prepare datafile for summation without deconvolution (deconv=0).
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describing source data (detector, data_dir, filenames).
DIFFIMAGES : stemdiff.gvars.DiffImages object
The bject describing the diffraction images/patterns.
datafile : one row from the prepared database of datafiles
The database of datafiles is created
in stemdiff.dbase.calc_database function.
Each row of the database contains
[filename, xc, yc, MaxInt, NumPeaks, S].
Returns
-------
arr : 2D numpy array
The datafile in the form of the array,
which is ready for summation (with DeconvType0 => see Notes below).
Notes
-----
* The parameters are transferred from the `sum_datafiles` function.
* DeconvType0 = no deconvolution,
just summation of the prepared datafiles (upscaled, centered...).
"""
# (0) Prepare variables
R = SDATA.detector.upscale
img_size = DIFFIMAGES.imgsize
# (1) Read datafile
datafile_name = SDATA.data_dir.joinpath(datafile.DatafileName)
arr = stemdiff.io.Datafiles.read(SDATA, datafile_name)
# (2) Rescale/upscale datafile and THEN remove border region
# (a) upscale datafile
arr = stemdiff.io.Arrays.rescale(arr, R, order=3)
# (b) get the accurate center of the upscaled datafile
# (the center coordinates for each datafile are saved in the database
# (note: our datafile is one row from the database => we know the coords!
xc,yc = (round(datafile.Xcenter),round(datafile.Ycenter))
# (c) finally, the borders can be removed with respect to the center
arr = stemdiff.io.Arrays.remove_edges(arr,img_size*R,xc,yc)
# (Important technical notes:
# (* This 3-step procedure is necessary to center the images precisely.
# ( The accurate centers from upscaled images are saved in database.
# ( The centers from original/non-upscaled datafiles => wrong results.
# (* Some border region should ALWAYS be cut, for two reasons:
# ( (i) weak/zero diffractions at edges and (ii) detector edge artifacts
# (3) Return the datafile as an array that is ready for summation
return(arr)
def dfile_with_deconvolution_type1(SDATA, DIFFIMAGES, datafile, psf, iterate):
"""
Prepare datafile for summation with deconvolution type1 (deconv=1).
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describing source data (detector, data_dir, filenames).
DIFFIMAGES : stemdiff.gvars.DiffImages object
The bject describing the diffraction images/patterns.
datafile : one row from the prepared database of datafiles
The database of datafiles is created
in stemdiff.dbase.calc_database function.
Each row of the database contains
[filename, xc, yc, MaxInt, NumPeaks, S].
psf : 2D-numpy array
Array representing the 2D-PSF function.
iterate : int
Number of iterations during the deconvolution.
Returns
-------
arr : 2D numpy array
The datafile in the form of the array,
which is ready for summation (with DeconvType1 => see Notes below).
Notes
-----
* The parameters are transferred from the `sum_datafiles` function.
* DeconvType1 = Richardson-Lucy deconvolution using PSFtype1.
* PSFtype1 = 2D-PSF estimated from files with negligible diffractions.
"""
# (0) Prepare variables
R = SDATA.detector.upscale
img_size = DIFFIMAGES.imgsize
# (1) Read datafile
datafile_name = SDATA.data_dir.joinpath(datafile.DatafileName)
arr = stemdiff.io.Datafiles.read(SDATA, datafile_name)
# (2) Rescale/upscale datafile and THEN remove border region
# (a) upscale datafile
arr = stemdiff.io.Arrays.rescale(arr, R, order=3)
# (b) get the accurate center of the upscaled datafile
# (the center coordinates for each datafile are saved in the database
# (note: our datafile is one row from the database => we know the coords!
xc,yc = (round(datafile.Xcenter), round(datafile.Ycenter))
# (c) finally, the borders can be removed with respect to the center
arr = stemdiff.io.Arrays.remove_edges(arr, img_size*R,xc,yc)
# (Important technical notes:
# (* This 3-step procedure is necessary to center the images precisely.
# ( The accurate centers from upscaled images are saved in database.
# ( The centers from original/non-upscaled datafiles => wrong results.
# (* Some border region should ALWAYS be cut, for two reasons:
# ( (i) weak/zero diffractions at edges and (ii) detector edge artifacts
# (3) Deconvolution: Richardson-Lucy using a global PSF
# (a) save np.max, normalize
# (reason: deconvolution algorithm requires normalized arrays...
# (...and we save original max.intensity to re-normalize the result
norm_const = np.max(arr)
arr_norm = arr/np.max(arr)
psf_norm = psf/np.max(psf)
# (b) perform the deconvolution
arr_deconv = restoration.richardson_lucy(
arr_norm, psf_norm, num_iter=iterate)
# (c) restore original range of intensities = re-normalize
arr = arr_deconv * norm_const
# (4) Return the deconvolved datafile
# as an array that is ready for summation
return arr
def dfile_with_deconvolution_type2(SDATA, DIFFIMAGES, datafile, iterate):
"""
Prepare datafile for summation with deconvolution type2 (deconv=2).
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describing source data (detector, data_dir, filenames).
DIFFIMAGES : stemdiff.gvars.DiffImages object
The bject describing the diffraction images/patterns.
datafile : one row from the prepared database of datafiles
The database of datafiles is created
in stemdiff.dbase.calc_database function.
Each row of the database contains
[filename, xc, yc, MaxInt, NumPeaks, S].
iterate : int
Number of iterations during the deconvolution.
Returns
-------
arr : 2D numpy array
The datafile in the form of the array,
which is ready for summation (with DeconvType1 => see Notes below).
Notes
-----
* The parameters are transferred from the `sum_datafiles` function.
* DeconvType2 = Richardson-Lucy deconvolution
using PSFtype2 + simple background subtraction.
* PSFtype2 = 2D-PSF estimated from central region of the datafile
AFTER background subtraction.
"""
# (0) Prepare variables
R = SDATA.detector.upscale
img_size = DIFFIMAGES.imgsize
psf_size = DIFFIMAGES.psfsize
# (1) Read datafile
datafile_name = SDATA.data_dir.joinpath(datafile.DatafileName)
arr = stemdiff.io.Datafiles.read(SDATA, datafile_name)
# (2) Rescale/upscale datafile and THEN remove border region
# (a) upscale datafile
arr = stemdiff.io.Arrays.rescale(arr, R, order=3)
# (b) get the accurate center of the upscaled datafile
# (the center coordinates for each datafile are saved in the database
# (note: our datafile is one row from the database => we know the coords!
xc,yc = (round(datafile.Xcenter),round(datafile.Ycenter))
# (c) finally, the borders can be removed with respect to the center
arr = stemdiff.io.Arrays.remove_edges(arr,img_size*R,xc,yc)
# (Important technical notes:
# (* This 3-step procedure is necessary to center the images precisely.
# ( The accurate centers from upscaled images are saved in database.
# ( The centers from original/non-upscaled datafiles => wrong results.
# (* Some border region should ALWAYS be cut, for two reasons:
# ( (i) weak/zero diffractions at edges and (ii) detector edge artifacts
# (3) Remove background
arr = idiff.bcorr.rolling_ball(arr, radius=20)
# (4) Prepare PSF from the center of given array
# (recommended parameters:
# (psf_size => to be specified in the calling script ~ 30
# (circular => always True - square PSF causes certain artifacts
psf = idiff.psf.PSFtype2.get_psf(arr, psf_size, circular=True)
# (5) Deconvolution
# (a) save np.max, normalize
# (reason: deconvolution algorithm requires normalized arrays...
# (...and we save original max.intensity to re-normalize the result
norm_const = np.max(arr)
arr_norm = arr/np.max(arr)
psf_norm = psf/np.max(psf)
# (b) perform the deconvolution
arr_deconv = restoration.richardson_lucy(
arr_norm, psf_norm, num_iter=iterate)
# (c) restore original range of intensities = re-normalize
arr = arr_deconv * norm_const
# (6) Return the deconvolved datafile
# as an array that is ready for summation
return(arr)
'''
Module: stemdiff.summ
---------------------
The summation of 4D-STEM datafiles to create one 2D powder diffraction file.
* The summation runs on all available cores (parallel processing).
* This module takes functions from semdiff.sum, but runs them parallelly.
The key function of the module (for a user) = stemdiff.summ.sum_datafiles:
* The function takes the same arguments as stemdiff.sum.sum_datafiles.
* The only difference consists in that the summation runs on multiple cores.
How it works:
* This module contains just two functions:
- `summ.sum_datafiles` - wrapper for the next function
- `summ.multicore_sum` - runs the summation on multiple cores
* The rest is done with the functions of sister module *stemdiff.sum*.
- i.e. the summ.multicore_sum calls functions from stemdiff.sum
- but the functions run within this module, using multiple cores
* Summary:
- `sum.sum_datafiles` - runs on a single core (docs in stemdiff.sum)
- `summ.sum_datafiles` - runs on multiple cores, aguments are identical
'''
import os
import sys
import tqdm
import stemdiff.sum
import concurrent.futures as future
def sum_datafiles(
SDATA, DIFFIMAGES,
df, deconv=0, psf=None, iterate=10):
'''
Sum datafiles from a 4D-STEM dataset to get 2D powder diffractogram.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
DIFFIMAGES : stemdiff.gvars.DiffImages object
Object describing the diffraction images/patterns.
df : pandas.DataFrame object
Database with datafile names and characteristics.
deconv : int, optional, default is 0
Deconvolution type:
0 = no deconvolution,
1 = deconvolution based on external PSF,
2 = deconvolution based on PSF from central region,
psf : 2D-numpy array or None, optional, default is None
Array representing 2D-PSF function.
Relevant only for deconv = 1.
iterate : integer, optional, default is 10
Number of iterations during the deconvolution.
Returns
-------
final_arr : 2D numpy array
The array is a sum of datafiles;
if the datafiles are pre-filtered, we get the sum of filtered datafiles,
if PSF is given, we get the sum of datafiles with PSF deconvolution.
Technical notes
---------------
* This function is a wrapper.
* It calls stemdiff.summ.multicore_sum with correct arguments:
- all relevant original arguments
- one additional argument: the *function for summation*
- the *function for summation* depends on the deconvolution type
'''
if deconv == 0:
arr = multicore_sum(
SDATA, DIFFIMAGES, df, psf, iterate,
func = stemdiff.sum.dfile_without_deconvolution)
elif deconv == 1:
arr = multicore_sum(
SDATA, DIFFIMAGES, df, psf, iterate,
func = stemdiff.sum.dfile_with_deconvolution_type1)
elif deconv == 2:
arr = multicore_sum(
SDATA, DIFFIMAGES, df, psf, iterate,
func = stemdiff.sum.dfile_with_deconvolution_type2)
else:
print(f'Unknown deconvolution type: deconv={deconv}')
print('Nothing to do.')
return None
return arr
def multicore_sum(SDATA, DIFFIMAGES, df, psf, iterate, func):
'''
Execute concurrent data processing using a thread pool.
This function processes a list of datafiles using a thread pool
for parallel execution. The number of concurrent workers is determined
by subtracting 1 from the available CPU cores.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
DIFFIMAGES : stemdiff.gvars.DiffImages object
Object describing the diffraction images/patterns.
df : pandas.DataFrame object
Database with datafile names and characteristics.
psf : 2D-numpy array or None, optional, default is None
Array representing 2D-PSF function.
Relevant only for deconv = 1.
iterate : integer, optional, default is 10
Number of iterations during the deconvolution.
func : a function from stemdiff.sum module to be used for summation
A function from sister module stemdiff.sum,
which will be used for summation on multiple cores.
This argument is (almost always) passed from the calling function
stemdiff.summ.sum_datafiles so that it corresponded to
the user-selected deconvolution type.
Returns
-------
final_arr : 2D numpy array
The array is a sum of datafiles;
if the datafiles are pre-filtered, we get sum of filtered datafiles,
if PSF is given, we get sum of datafiles with PSF deconvolution.
Technical notes
---------------
* This function is NOT to be called directly.
* It is called by wrapper function stemdiff.summ.sum_datafiles.
* The two functions work as follows:
- calling function = stemdiff.summ.sum_datafiles
- passes all relevant arguments including function for summation
- this function = stemdiff.summ.multicore_sum
- runs the summation on multiple cores and returns the result
'''
# (0) Initialize
num_workers = os.cpu_count() # Number of concurrent workers
datafiles = [datafile[1] for datafile in df.iterrows()]
# (1) Use ThreadPool to perform multicore summation
with future.ThreadPoolExecutor(max_workers=num_workers) as executor:
# (a) Prepare variables
futures = []
total_tasks = len(datafiles)
# (b) Submit tasks to the executor
for i, file in enumerate(datafiles):
try:
if func == stemdiff.sum.dfile_without_deconvolution:
future_obj = executor.submit(
func, SDATA, DIFFIMAGES, file)
elif func == stemdiff.sum.dfile_with_deconvolution_type1:
future_obj = executor.submit(
func, SDATA, DIFFIMAGES, file, psf, iterate)
elif func == stemdiff.sum.dfile_with_deconvolution_type2:
future_obj = executor.submit(
func, SDATA, DIFFIMAGES, file, iterate)
else:
raise Exception("Uknown deconvolution function!")
futures.append(future_obj)
except Exception as e:
print(f"Error processing file {file}: {str(e)}")
# (c) Use tqdm to create a progress bar
stderr_original = sys.stderr
sys.stderr = sys.stdout
with tqdm.tqdm(total=total_tasks,
desc="Processing ") as pbar:
# ...wait for all tasks to complete
for future_obj in future.as_completed(futures):
try:
future_obj.result()
except Exception as e:
print(f"Error processing a task: {str(e)}")
pbar.update(1)
sys.stderr = stderr_original
# (2) Summation done, collect the results
# (a) Print a new line to complete the progress bar
print()
# (b) Collect results
deconvolved_data = [f.result() for f in futures]
# (3) Results collected, perform post-processing
# (a) Sum results = the processed/deconvolved files from previous steps
sum_arr = sum(deconvolved_data)
# (b) Run post-processing routine = normalization,
final_arr = stemdiff.sum.sum_postprocess(sum_arr,len(deconvolved_data))
# (4) Return final array = sum of datafiles with (optional) deconvolution
return(final_arr)
+21
-12
Metadata-Version: 2.1
Name: stemdiff
Version: 5.2.6
Version: 5.2.7
Summary: Convert 4D-STEM data to 2D-powder diffraction pattern.

@@ -34,3 +34,3 @@ Home-page: https://github.com/mirekslouf/stemdiff/

[https://doi.org/10.3390/ma14247550](https://doi.org/10.3390/ma14247550)
* If you use STEMDIFF package, **please cite** the 2nd publication (or both :-).
* If you use STEMDIFF package, **please cite** the 2nd publication (or both :-)

@@ -44,3 +44,3 @@ Principle

------------
* Requirement: Python with sci-modules: numpy, matplotlib, scipy, pandas
* Requirement: Python with sci-modules = numpy, matplotlib, scipy, pandas
* `pip install scikit-image` = 3rd party package for advanced image processing

@@ -54,8 +54,8 @@ * `pip install tqdm` = to show progress meter during long summations

* See how it works:
- Look at [worked example](https://www.dropbox.com/scl/fi/l5eskdgxo7976ea9x35fp/01_sdiff_au.nb.pdf?rlkey=7yd5tqtcm3zxr1uc0m0aisenl&dl=0)
in Jupyter.
* Try it yourself:
- Download [complete examples with data](https://www.dropbox.com/scl/fo/ccb6hs28er9dc1xufshh4/h?rlkey=omk5bqoe17jmedhj407ng9xr0&dl=0).
- After downloading, unzip and follow the instructions in *readme* file.
* Look at the
[worked example](https://www.dropbox.com/scl/fi/l5eskdgxo7976ea9x35fp/01_sdiff_au.nb.pdf?rlkey=7yd5tqtcm3zxr1uc0m0aisenl&dl=0)
to see STEMDIFF in action.
* Download
[complete examples with data](https://www.dropbox.com/scl/fo/ccb6hs28er9dc1xufshh4/h?rlkey=omk5bqoe17jmedhj407ng9xr0&dl=0)
and try STEMDIFF yourself.

@@ -65,6 +65,9 @@ Documentation, help and examples

* [PyPI](https://pypi.org/project/stemdiff) repository.
* [GitHub](https://github.com/mirekslouf/stemdiff) repository.
* [PyPI](https://pypi.org/project/stemdiff) repository -
the stable version to install.
* [GitHub](https://github.com/mirekslouf/stemdiff) repository -
the current version under development.
* [GitHub Pages](https://mirekslouf.github.io/stemdiff)
with [documentation](https://mirekslouf.github.io/stemdiff/docs).
with [help](https://mirekslouf.github.io/stemdiff/docs)
and [complete package documentation](https://mirekslouf.github.io/stemdiff/docs/pdoc.html/stemdiff.html).

@@ -96,1 +99,7 @@ ## Versions of STEMDIFF

* Version 5.2 = (beta) improvement of diff.patterns in sister package idiff
Acknowledgement
---------------
The development was co-funded by TACR, program NCK,
project [TN02000020](https://www.isibrno.cz/en/centre-advanced-electron-and-photonic-optics).
+20
-11

@@ -13,3 +13,3 @@ STEMDIFF :: 4D-STEM dataset to 2D-diffractogram

[https://doi.org/10.3390/ma14247550](https://doi.org/10.3390/ma14247550)
* If you use STEMDIFF package, **please cite** the 2nd publication (or both :-).
* If you use STEMDIFF package, **please cite** the 2nd publication (or both :-)

@@ -23,3 +23,3 @@ Principle

------------
* Requirement: Python with sci-modules: numpy, matplotlib, scipy, pandas
* Requirement: Python with sci-modules = numpy, matplotlib, scipy, pandas
* `pip install scikit-image` = 3rd party package for advanced image processing

@@ -33,8 +33,8 @@ * `pip install tqdm` = to show progress meter during long summations

* See how it works:
- Look at [worked example](https://www.dropbox.com/scl/fi/l5eskdgxo7976ea9x35fp/01_sdiff_au.nb.pdf?rlkey=7yd5tqtcm3zxr1uc0m0aisenl&dl=0)
in Jupyter.
* Try it yourself:
- Download [complete examples with data](https://www.dropbox.com/scl/fo/ccb6hs28er9dc1xufshh4/h?rlkey=omk5bqoe17jmedhj407ng9xr0&dl=0).
- After downloading, unzip and follow the instructions in *readme* file.
* Look at the
[worked example](https://www.dropbox.com/scl/fi/l5eskdgxo7976ea9x35fp/01_sdiff_au.nb.pdf?rlkey=7yd5tqtcm3zxr1uc0m0aisenl&dl=0)
to see STEMDIFF in action.
* Download
[complete examples with data](https://www.dropbox.com/scl/fo/ccb6hs28er9dc1xufshh4/h?rlkey=omk5bqoe17jmedhj407ng9xr0&dl=0)
and try STEMDIFF yourself.

@@ -44,6 +44,9 @@ Documentation, help and examples

* [PyPI](https://pypi.org/project/stemdiff) repository.
* [GitHub](https://github.com/mirekslouf/stemdiff) repository.
* [PyPI](https://pypi.org/project/stemdiff) repository -
the stable version to install.
* [GitHub](https://github.com/mirekslouf/stemdiff) repository -
the current version under development.
* [GitHub Pages](https://mirekslouf.github.io/stemdiff)
with [documentation](https://mirekslouf.github.io/stemdiff/docs).
with [help](https://mirekslouf.github.io/stemdiff/docs)
and [complete package documentation](https://mirekslouf.github.io/stemdiff/docs/pdoc.html/stemdiff.html).

@@ -75,1 +78,7 @@ ## Versions of STEMDIFF

* Version 5.2 = (beta) improvement of diff.patterns in sister package idiff
Acknowledgement
---------------
The development was co-funded by TACR, program NCK,
project [TN02000020](https://www.isibrno.cz/en/centre-advanced-electron-and-photonic-optics).
Metadata-Version: 2.1
Name: stemdiff
Version: 5.2.6
Version: 5.2.7
Summary: Convert 4D-STEM data to 2D-powder diffraction pattern.

@@ -34,3 +34,3 @@ Home-page: https://github.com/mirekslouf/stemdiff/

[https://doi.org/10.3390/ma14247550](https://doi.org/10.3390/ma14247550)
* If you use STEMDIFF package, **please cite** the 2nd publication (or both :-).
* If you use STEMDIFF package, **please cite** the 2nd publication (or both :-)

@@ -44,3 +44,3 @@ Principle

------------
* Requirement: Python with sci-modules: numpy, matplotlib, scipy, pandas
* Requirement: Python with sci-modules = numpy, matplotlib, scipy, pandas
* `pip install scikit-image` = 3rd party package for advanced image processing

@@ -54,8 +54,8 @@ * `pip install tqdm` = to show progress meter during long summations

* See how it works:
- Look at [worked example](https://www.dropbox.com/scl/fi/l5eskdgxo7976ea9x35fp/01_sdiff_au.nb.pdf?rlkey=7yd5tqtcm3zxr1uc0m0aisenl&dl=0)
in Jupyter.
* Try it yourself:
- Download [complete examples with data](https://www.dropbox.com/scl/fo/ccb6hs28er9dc1xufshh4/h?rlkey=omk5bqoe17jmedhj407ng9xr0&dl=0).
- After downloading, unzip and follow the instructions in *readme* file.
* Look at the
[worked example](https://www.dropbox.com/scl/fi/l5eskdgxo7976ea9x35fp/01_sdiff_au.nb.pdf?rlkey=7yd5tqtcm3zxr1uc0m0aisenl&dl=0)
to see STEMDIFF in action.
* Download
[complete examples with data](https://www.dropbox.com/scl/fo/ccb6hs28er9dc1xufshh4/h?rlkey=omk5bqoe17jmedhj407ng9xr0&dl=0)
and try STEMDIFF yourself.

@@ -65,6 +65,9 @@ Documentation, help and examples

* [PyPI](https://pypi.org/project/stemdiff) repository.
* [GitHub](https://github.com/mirekslouf/stemdiff) repository.
* [PyPI](https://pypi.org/project/stemdiff) repository -
the stable version to install.
* [GitHub](https://github.com/mirekslouf/stemdiff) repository -
the current version under development.
* [GitHub Pages](https://mirekslouf.github.io/stemdiff)
with [documentation](https://mirekslouf.github.io/stemdiff/docs).
with [help](https://mirekslouf.github.io/stemdiff/docs)
and [complete package documentation](https://mirekslouf.github.io/stemdiff/docs/pdoc.html/stemdiff.html).

@@ -96,1 +99,7 @@ ## Versions of STEMDIFF

* Version 5.2 = (beta) improvement of diff.patterns in sister package idiff
Acknowledgement
---------------
The development was co-funded by TACR, program NCK,
project [TN02000020](https://www.isibrno.cz/en/centre-advanced-electron-and-photonic-optics).

@@ -7,3 +7,11 @@ CITATION

setup.py
src/STEMDIFF/__init__.py
src/STEMDIFF/dbase.py
src/STEMDIFF/detectors.py
src/STEMDIFF/gvars.py
src/STEMDIFF/io.py
src/STEMDIFF/sum.py
src/STEMDIFF/summ.py
src/stemdiff/__init__.py
src/stemdiff/__main__.py
src/stemdiff/dbase.py

@@ -10,0 +18,0 @@ src/stemdiff/detectors.py

'''
Package: STEMDIFF
-----------------
Conversion of a 4D-STEM dataset to a 2D-powder diffration pattern.
* Input = 4D-STEM dataset = 2D-array of 2D-NBD patterns.
- 2D-array = usually one (or more) two-dimensional scanning array(s).
- 2D-NBD = monocrystalline-like nanobeam diffraction pattern.
* Output = 2D-powder diffraction pattern
- The final 2D-diffractogram is a specific summation of 2D-NBD patterns.
- In other words, a 4D-STEM dataset is reduced to a 2D-diffraction pattern.
- The whole method (and final pattern) is called 4D-STEM/PNBD (powder NBD).
STEMDIFF modules:
* stemdiff.dbase = calculate database of 4D-stem datafiles
* stemdiff.detectors = objects describing detectors and format of the datafiles
* stemdiff.gvars = objects describing *source data* and *diffraction images*
* stemdiff.io = input/output for datafiles and corresponding arrays + images
* stemdiff.sum = summation of datafiles (standard, serial processing)
* stemdiff.summ = summation of datafiles (parallel, multicore processing)
STEMDIFF auxiliary package IDIFF and its modules:
* IDIFF contains functions for the improvement of diffraction patterns
* IDIFF is imported below so that it could be used as: sd.idiff.some_module
* IDIFF modules are:
- idiff.bcorr = background correction/subtraction
- idiff.deconv = advanced deconvolution methods
- idiff.ncore = noise correction/reduction
- idiff.psf = estimate of PSF function
'''
__version__ = "5.2.6"
# Import modules of stemdiff package
# this enables us to use modules as follows:
# >>> import stemdiff as sd
# >>> sd.io.Datafiles.read(SDATA, '000_001.dat')
import stemdiff.dbase
import stemdiff.detectors
import stemdiff.gvars
import stemdiff.io
import stemdiff.sum
import stemdiff.summ
# Import supplementary package idiff
# this enables us to use the modules as follows:
# >>> import stemdiff as sd
# >>> sd.idiff.bcorr.rolling_ball(arr)
import idiff
'''
Module: stemdiff.dbase
----------------------
Functions to calculate, save and re-read the database of 4D-STEM datafiles.
* The database is saved as a pandas.DataFrame object,
which contains the following data for each datafile:
filename, XY-center coordinates, MaximumIntensity, NoOfPeaks, ShannonEntropy.
* The database enables fast filtering of datafiles
and fast access to datafile features,
which do not have to be re-calculated repeatedly.
'''
import numpy as np
import pandas as pd
import stemdiff.io
from skimage import measure
def calc_database(SDATA, DIFFIMAGES):
"""
Read 4D-STEM datafiles and calculate database of all files, which contains
[filename, XY-center coordinates, MaxIntensity, NoOfPeaks, ShannonEntropy]
for each datafile.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
DIFFIMAGES : stemdiff.gvars.DiffImages object
The object describing the diffraction images/patterns.
Returns
-------
df : pandas DataFrame object
Database contains [filename, xc, yc, MaxInt, NumPeaks, S]
for each datafile in the dataset.
"""
# Prepare variables before we run for-cycle to go through datafiles
# (initialize coordinates of the center + rescale/upscale coefficient
xc,yc = (None,None)
R = SDATA.detector.upscale
# Initialize list of datafiles
# (appending to lists is more efficient than appending to np/pd-structures
list_of_datafiles = []
# Pre-calculate additional variables
# (it would be wasteful to calculate them in each for-cycle loop
neighborhood_matrix = np.ones((DIFFIMAGES.peak_dist,DIFFIMAGES.peak_dist))
# Go through datafiles and calculate their entropy
for datafile in SDATA.filenames:
# a) Read datafile
arr = stemdiff.io.Datafiles.read(SDATA,datafile)
# b) Get datafile name
datafile_name = datafile.relative_to(SDATA.data_dir)
# c) Calculate intensity center
# (to get high precision
# (the center must be determined for rescaled/upscaled array
if DIFFIMAGES.ctype == 2:
xc,yc = stemdiff.io.Arrays.find_center(
stemdiff.io.Arrays.rescale(arr, R, order=3), # rescaled array
DIFFIMAGES.csquare*R, DIFFIMAGES.cintensity) # central region
elif (DIFFIMAGES.ctype == 1) and (xc == None):
xc,yc = stemdiff.io.Array.find_center(
stemdiff.io.Arrays.rescale(arr, R, order=3), # rescaled array
DIFFIMAGES.csquare*R, DIFFIMAGES.cintensity) # central region
elif (DIFFIMAGES.ctype == 0) and (xc == None):
geometric_center = round(SDATA.detector.detector_size*R/2)
xc,yc = (geometric_center,geometric_center)
# d) Determine maximum intensity
# NOTE: max_intensity must be saved as float
# Reason: when we later convert to pandas.Dataframe
# ...the datatypes are inferred from the first value in each column
# ...which may not be correct if the first integer is a small number
# ...as the estimated datatype may be int16
# ...and this may not work for larger values in the next datafiles
max_intensity = float(np.max(arr))
# e) Estimate number of peaks (local maxima)
no_of_maxima = stemdiff.io.Arrays.number_of_peaks(arr,
peak_height = DIFFIMAGES.peak_height,
neighborhood_matrix=neighborhood_matrix)
# f) Calculate Shannon entropy of the datafile
entropy = measure.shannon_entropy(arr)
entropy = round(entropy,2)
# d) Append all calculated values to list
list_of_datafiles.append(
[datafile_name, xc, yc, max_intensity, no_of_maxima, entropy])
# Convert list to pandas DataFrame
df = pd.DataFrame(
list_of_datafiles,
columns=['DatafileName','Xcenter','Ycenter', 'MaxInt', 'Peaks','S'])
# Return the dataframe containing names of datafiles + their entropies
return(df)
def save_database(df, output_file):
"""
Save the database of 4D-STEM datafiles;
the dbase is a pandas.DataFrame object saved as pickled object/zip-file.
Parameters
----------
df : pandas.DataFrame object
This object is a database of all 4D-STEM datafiles,
which contains basic characteristics (filename, XY-center, entropy ...)
of each datafile in 4D-STEM dataset.
output_file : str
Filename of the output file (without extension).
The database is saved as a pickled object/zip-file.
Returns
-------
None
The output is the database *df* saved as *output_file* on a disk.
"""
df.to_pickle(output_file)
def read_database(input_database):
"""
Read the calculated database of 4D-STEM datafiles.
Parameters
----------
input_database : str or pandas.DataFrame
* str = filename of the input file that contains the database.
* pandas.Dataframe = dataframe that contains the database.
* Why two possile types of input?
If the database is in memory in the form of pandas.DataFrame
(which is quite common when using STEMDIFF package),
it is useles to re-read it from file.
Returns
-------
df : pandas.DataFrame object
Database that has been read from disk or pandas.Dataframe.
"""
# Read database from [input_file]
# NOTE: input_database is either saved/pickled file or pd.DataFrame
# Reason: sometimes it is more efficient to use pd.DataFrame directly
if type(input_database) == pd.DataFrame:
df = input_database
else:
df = pd.read_pickle(input_database)
return(df)
'''
Module: stemdiff.detectors
--------------------------
Description of detectors that can be used in stemdiff package.
This module is basically a container of classes.
Each class describes a 2D STEM detector, from which we can read datafiles.
The description of the detector not difficult.
All we need to define is the detector name,
a few technical parameters described below,
and how to read/save datafiles in given detector format.
All detector parameters are described below in TimePix detector class.
Therefore, the new classes = new detectors can be added quite easily:
* Copy the class describing TimePix detector.
* Rename the class as needed, for example: My_new_STEM_detector.
* Re-define all properties and methods of the new class as necessary.
* When you are done, the new detector can be used within STEMDIFF package.
'''
import sys
import inspect
import numpy as np
from PIL import Image
def list_of_known_detectors():
'''
Get a list of known detectors = classes defined in stemdiff.detectors.
Returns
-------
detectors : list
List of known detectors = classes defined in stemdiff.detectors module.
'''
# Prepare list of known detectors
detectors = []
# Get names of all classes in current module
# Based on stackoveflow: https://stackoverflow.com/q/1796180
for name, obj in inspect.getmembers(sys.modules[__name__]):
if inspect.isclass(obj):
detectors.append(obj)
# Return list of known detectors
return(detectors)
def print_known_detectors():
'''
Print a list of known detectors = classes defined in stemdiff.detectors.
Returns
-------
Nothing
The list of detectors is just printed on the screen.
'''
detectors = list_of_known_detectors()
print('List of knonw detectors = classes defined in stemdiff.detectors:')
for detector in detectors:
print(detector)
def describe_detector(detector_object):
'''
Global method: the description of the detector on the screen.
Parameters
----------
detector_object : object of any class in stemdiff.detectors
The initialized detector object.
The object contains description of the detector.
It is created by calling init method of any stemdiff.detectors class.
Returns
-------
None
The output is the text on the screen.
Example
-------
>>> # Short example
>>> # (minimalistic example
>>> import stemdiff as sd
>>> my_detector = sd.detectors.TimePix()
>>> my_detector.self_describe() # OO-interface, indirect call
>>> describe_detector(my_detector) # procedural interface, direct call
>>>
>>> # Real usage
>>> # (in STEMDIFF scripts,
>>> # (the detector is usually defined
>>> # (within stemdiff.gvars.SourceData object
>>> import stemdiff as sd
>>> SDATA = sd.gvars.SourceData(
>>> detector = sd.detectors.TimePix(),
>>> data_dir = r'./DATA',
>>> filenames = r'*.dat')
>>> SDATA.detector.self_describe()
'''
print('Detector name :', detector_object.detector_name)
print('Detector size :', detector_object.detector_size)
print('Maximum intensity :', detector_object.max_intensity)
print('Intensity data type :', detector_object.data_type.__name__)
print('Upscale parameter :', detector_object.upscale)
print('-----')
print('* Detector size = size of the (square) detector (in pixels).')
print('* Maximum intensity = max.intensity the detector can measure.')
print('* Intensity data type = the format of the saved intensities.')
print('* Upscale parameter: each datafile/image is upscaled by it.')
class TimePix:
'''
Definition of TimePix detector.
Parameters
----------
detector_name : str, default is 'TimePix'
Name of the detector.
Keep the default unless you have specific reasons.
detector_size : integer, default is 256
Size of the detector in pixels.
Keep the default unless you have specific reasons.
max_intensity : int, default is 11810
Maximum intensity of TimePix detector.
Keep the default unless you have specific reasons.
data_type : numpy data type, optional, default is np.uint16
Type of data, which are saved in the binary file.
TimePix detector saves the data as 16-bit integers.
This corresponds to np.uint16 (more info in NumPy documentation).
upscale : integer, default is 4
Upscaling coefficient.
Final image size = detector_size * upscale.
The upscaling coefficient increases the detector resolution.
Surprisingly enough, the upscaling helps to improve final resolution.
Returns
-------
TimePix detector object
Format of TimePix datafiles
---------------------------
* binary data files, usually with DAT extension
* a 1D arrays of 16-bit intensities = np.uint16 values
'''
def __init__(self, detector_name='TimePix',
detector_size=256, max_intensity=11810,
data_type=np.uint16, upscale=4):
# The initialization of TimePix detector objects.
# The parameters are described above in class definition.
# -----
self.detector_name = detector_name
self.detector_size = detector_size
self.max_intensity = max_intensity
self.data_type = data_type
self.upscale = upscale
def self_describe(self):
'''
Print a simple textual description of the detector on the screen.
Returns
-------
None
The description of the detector is just printed on the screen.
Technical note
--------------
* This is just a wrapper around global function named
stemdiff.detectors.describe_detector.
* Reason: this global function is common to all detector types.
* This simple solution is used instead of (needlessly complex)
inheritance in this case.
'''
describe_detector(self)
def read_datafile(self, filename, arr_size=None):
'''
Read datafile in TimePix detector format.
Parameters
----------
filename : str or path
Name of the datafile to read.
arr_size : int, optional, default is None
Size of the square array to read.
Typically, we read original datafiles,
whose size = detector.size.
Nonetheless, the datafiles might have been saved
with a smaller size = arr_size.
Returns
-------
arr : 2D-numpy array
2D-array containing image from TimePix detector.
Each element of the array = the intensity detected at given pixel.
'''
# Read binary datafile (to 1D-array)
arr = np.fromfile(filename, dtype=self.data_type)
# Determine edge of the file - we work only with square files
edge = int(np.sqrt(arr.size))
# Reshape the array and return
arr = arr.reshape(edge, edge)
return(arr)
def save_datafile(self, arr, filename):
'''
Save 2D-array as a datafile in the TimePix detector format.
Parameters
----------
arr : numpy array
The array to save in the datafile with [filename].
filename : str or path-like object
The filename of the saved array.
Returns
-------
None
The result is the file named *filename*,
containing the *arr* in stemdiff.detectors.TimePix format.
'''
# Slightly modified according to
# https://stackoverflow.com/q/43211616
fh = open(filename,'wb')
arr = arr.flatten()
BlockArray = np.array(arr).astype(np.uint16)
BlockArray.tofile(fh)
fh.close()
class Secom:
'''
Definition of Secom detector.
Parameters
----------
detector_name : str, default is 'Secom'
Name of the detector.
Keep the default unless you have specific reasons.
detector_size : integer, default is 2048
Size of the detector in pixels.
Keep the default unless you have specific reasons.
data_type : numpy data type, optional, default is np.uint16
Type of data, which are saved in the Secom TIFF-files.
Secom detector saves the data as 16-bit TIFF files.
This corresponds to np.uint16 (more info in NumPy documentation).
upscale : integer, default is 1
Upscaling coefficient.
Final image size = detector_size * upscale.
The upscaling coefficient increases the detector resolution.
Surprisingly enough, the upscaling helps to improve final resolution.
Returns
-------
Secom detector object.
Format of SECOM datafiles
-------------------------
* image files, TIFF format
* 16-bit images = images containing np.uint16 values
'''
def __init__(self, detector_name='Secom',
detector_size=2048, max_intensity=65536,
data_type=np.uint16, upscale=1):
# The initialization of Secom detector objects.
# The parameters are described above in class definition.
# -----
self.detector_name = detector_name
self.detector_size = detector_size
self.max_intensity = max_intensity
self.data_type = data_type
self.upscale = upscale
def self_describe(self):
'''
Print a simple textual description of the detector on the screen.
Returns
-------
None
The description of the detector is just printed on the screen.
Technical note
--------------
* This is just a wrapper around global function named
stemdiff.detectors.describe_detector.
* Reason: this global function is common to all detector types.
* This simple solution is used instead of (needlessly complex)
inheritance in this case.
'''
describe_detector(self)
def read_datafile(self, filename):
'''
Read datafile in Secom detector format.
Parameters
----------
filename : str or path
Name of the datafile to read.
Returns
-------
arr : 2D-numpy array
2D-array containing image from Secom detector.
Each element of the array = the intensity detected at given pixel.
'''
arr = np.array(Image.open(filename))
return(arr)
def save_datafile(self, arr, filename):
'''
Save 2D-array as a datafile in the Secom detector format.
Parameters
----------
arr : numpy array
The array to save in the datafile with [filename].
filename : str or path-like object
The filename of the saved array.
Returns
-------
None
The result is the file named *filename*,
containing the *arr* in stemdiff.detectors.Secom format.
'''
im = Image.fromarray(arr.astype(np.uint16))
im.save(filename)
class Arina:
'''
* TODO: Radim
* The same like for Secom, using method copy+paste+modify :-)
'''
pass
'''
Module: stemdiff.gvars
----------------------
Global variables/objects for package stemdiff.
The global variables are used throughout the whole program.
In order to be easily accessible, they are defined as two objects:
* SourceData = an object defining the input datafiles.
* DiffImages = an object desribing basic/common features of all NBD patterns.
The usage of stemdiff.gvars module is quite simple.
In 99% cases, we just add the following code at beginning of STEMDIFF script
(the arguments have to be adjusted to given experiment, of course):
>>> # Define source data
>>> # (datafiles from TimePix detector
>>> # (saved in data directory: D:/DATA.SH/STEMDIFF/SAMPLE_8
>>> # (datafiles are in subdirs: 01,02,... and named as: *.dat
>>> SDATA = stemdiff.gvars.SourceData(
>>> detector = stemdiff.detectors.TimePix(),
>>> data_dir = r'D:/DATA.SH/STEMDIFF/SAMPLE_8',
>>> filenames = r'??/*.dat')
>>>
>>> # Set parameters of diffraction images
>>> # (we consider only central region with imgsize=100
>>> # (size of the region for PSF estimate will be: psfsize=30
>>> # (values of other/all args => documentation of stemdiff.gvars.DiffImages
>>> DIFFIMAGES = stemdiff.gvars.DiffImages(
>>> imgsize=100, psfsize=30,
>>> ctype=2, csquare=20, cintensity=0.8,
>>> peak_height=100, peak_dist=9)
'''
from pathlib import Path
class SourceData:
'''
Define the input datafiles.
Parameters
----------
data_dir : str (which will be passed to Path method)
This parameter is passed to Path method from pathlib library.
It is strongly recommeded to use r-strings.
Example: `data_dir = r'd:/data.sh/stemdiff/tio2'`.
files : str (which will be passed to data_dir.glob method)
This parameter is passed to Path.glob method from pathlib library.
It is strongly recommeded to use r-strings.
Example1: `datafiles = r'*.dat'` = all `*.dat` files in parent_dir;
Example2: `datafiles = r'??/*.dat'` = all `*.dat` in subdirs `01,02...`
Returns
-------
DataFiles object
'''
def __init__(self, detector, data_dir, filenames):
# The initialization of SourceData objects.
# The parameters are described above in class definition.
# -----
self.detector = detector
self.data_dir = Path(data_dir)
self.filenames = self.data_dir.glob(filenames)
class DiffImages:
'''
Set parameters/characteristics of experimental diffraction images.
Parameters
----------
imgsize : integer, smaller than detector_size
Size of array read from the detector is reduced to imgsize.
If given, we sum only the central square with size = imgsize.
Smaller area = higher speed;
outer area = just weak diffractions.
psfize : integer, smaller than detector_size
Size/edge of central square, from which 2D-PSF is determined.
ctype : integer, values = 0,1,2
0 = intensity center not determined, geometrical center is used;
1 = center determined from the first image and kept constant;
2 = center is determined for each individual image.
csquare : integer, interval = 10--DET_SIZE
Size of the central square (in pixels),
within which the center of intensity is searched for.
cintensity : float, interval = 0--1
Intensity fraction, which is used for center determination.
Example: cintensity=0.9 => consider only pixels > 0.9 * max.intensity.
peak_height : float, optional, default is 100
Search for peaks whose intensity > peak_height.
peak_dist : integer, optional, default is 3
Minimal distance between possible neighboring peaks.
Returns
-------
DiffImages object
'''
def __init__(self,
imgsize=100, psfsize=30,
ctype=2, csquare=20, cintensity=0.8,
peak_height=100, peak_dist=3):
# The initialization of DiffImages objects.
# The parameters are described above in class definition.
# -----
self.imgsize = imgsize
self.psfsize = psfsize
self.ctype = ctype
self.csquare = csquare
self.cintensity = cintensity
self.peak_height = peak_height
self.peak_dist = peak_dist
'''
Module: stemdiff.io
-------------------
Input/output functions for package stemdiff.
Three types of stemdiff.io objects
* Datafiles = files on disk, saved directly from a 2D-STEM detector.
* Arrays = the datafiles converted to numpy array objects.
* Images = the datafiles converted to PNG files.
Additional stemdiff.io utilities
* Plots = easy multiplots from selected datafiles/arrays/images.
* set_plot_parameters = a general function available without subclassing.
General strategy for working with stemdiff.io objects
* Datafiles and Images are usually
not used directly, but just converted to np.array objects.
* All data manipulation (showing, scaling, saving ...)
is done within np.array objects.
* Datafiles and Images have (intentionally) just a limited amount of methods,
the most important of which is read - this method simply reads
Datafile/Image to a np.array.
Examples how to use Datafiles, Arrays and Images
>>> # Show a datafile
>>> # (basic operation => there is Datafiles.function for it
>>> stemdiff.io.Datafiles.show(SDATA, filename)
>>> # Read a datafile to array
>>> # (basic operation => there is Datafiles.function for it
>>> arr = stemdiff.io.Datafiles.read(SDATA, filename)
>>> # Describe AND show the datafile
>>> # (more complex operation:
>>> # (1) read datafile to array - using Datafiles.read
>>> # (2) do what you need (here: describe, show) - using Arrays.functions
>>> arr = stemdiff.io.Datafiles.read(SDATA, datafile)
>>> stemdiff.io.Arrays.describe(arr, csquare=20)
>>> stemdiff.io.Arrays.show(arr, icut=1000, cmap='gray')
'''
import os, sys
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from skimage import transform, measure, morphology
def set_plot_parameters(size=(12,9), dpi=75, fontsize=10, my_rcParams=None):
'''
Set global plot parameters (mostly for plotting in Jupyter).
Parameters
----------
size : tuple of two floats, optional, the default is (12,9)
Size of the figure (width, height) in [cm].
dpi : int, optional, the defalut is 75
DPI of the figure.
fontsize : int, optional, the default is 10
Size of the font used in figure labels etc.
my_rcParams : dict, optional, default is None
Dictionary in plt.rcParams format
containing any other allowed global plot parameters.
Returns
-------
None
The result is a modification of the global plt.rcParams variable.
'''
# This function just calls the final function in Plotting module.
# (left in the main namespace of the package as it is frequently used
Plots.set_plot_parameters(size, dpi, fontsize, my_rcParams)
class Datafiles:
'''
Datafiles class = a collection of functions
that work with datafiles from 2D-STEM detector
(assumption: the datafiles were saved as standard files on a disk).
'''
def read(SDATA, filename):
'''
Read a datafile from STEM detector to an array.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
filename : string or pathlib object
Name of datafile to be read into numpy 2D array.
Filename can be given as an absolute path or
a path relative to SDATA.data_dir
(i.e. data directory path, which is saved in SDATA object).
Returns
-------
arr : 2D numpy array
The array converted from datafile with given *filename*.
'''
# The reading of the datafile depends on the filename argument
# (it can be an absolute path or a path relative to SDATA.data_dir
if os.path.isfile(filename):
# Filename given as absolute path:
# Everything Ok => just read and return
arr = SDATA.detector.read_datafile(filename)
return(arr)
else:
# Filename not given as relative path
# Perhaps it is a relative path - test if it exits
complete_filename = SDATA.data_dir.joinpath(filename)
if os.path.isfile(complete_filename):
# Filename was given as relative path and it exists
# Everything Ok => read and return the completed filename
arr = SDATA.detector.read_datafile(complete_filename)
return(arr)
else:
# Filename not found
# (it was not a correct absolute or relative path
print(f'Datafile [{filename}] not found!')
sys.exit()
def show(SDATA, filename,
icut=None, itype='8bit', R=None, cmap='gray',
center=False, csquare=20, cintensity=0.8):
'''
Show datafile/diffractogram with basic characteristics.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
filename : str or Path
Name of datafile to be shown.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300.
itype : string, optional, '8bit' or '16bit', default is None
Type of the image - 8-bit or 16-bit.
If itype equals None or '16-bit' the image is treated as 16-bit.
R : integer, optional, default is None
Rescale coefficient;
the input array is rescaled (usually upscaled) R-times.
For typical 2D-STEM detector with size 256x256 pixels,
the array should be processed with R=4
in order to get sufficiently large image for further processing.
cmap : str - matplotlib.colormap name, optional, the default is 'gray'
Matplotlib colormap for plotting of the array.
Other interesting or high-contrast options:
'viridis', 'plasma', 'magma' ...
The full list of matplotlib colormaps:
`matplotlib.pyplot.colormaps()`
center : bool, optional, default is False
If True, intensity center is drawn in the final image.
csquare : integer, optional, default is 20
Edge of a central square, from which the center will be determined.
Ignored if center == False.
cintensity : float in interval 0--1, optional, default is 0.8
The intensity < maximum_intensity * cintensity is regarded as 0
(a simple temporary background removal in the central square).
Ignored if center == False.
Returns
-------
Nothing
The output is the datafile shown as an image on the screen.
Technical note
--------------
* This function just combines Datafiles.read + Arrays.show functions.
'''
# Read datafile to array
arr = Datafiles.read(SDATA, filename)
# Describe datafile/array
Arrays.show(arr,
icut, itype, R, cmap,
center, csquare, cintensity)
def show_from_disk(SDATA,
interactive=True, max_files=None,
icut=1000, itype=None, R=None, cmap='gray',
center=True, csquare=20, cintensity=0.8,
peak_height=100, peak_distance=9):
'''
Show datafiles (stored in a disk) from 2D-STEM detector.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
interactive: bool, optional, the defailt is True
If True, images are shown interactively,
i.e. any key = show next image, 'q' = quit.
max_files: integer, optional, the default is None
If not(interactive==True) and max_files > 0,
show files non-interactively = in one run, until
number of files is less than max_files limit.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300.
itype : string, optional, None or '8bit' or '16bit'
Type of the image - 8-bit or 16-bit.
If itype equals None or '16-bit' the image is treated as 16-bit.
R : integer, optional, default is None
Rescale coefficient;
the input array is rescaled (usually upscaled) R-times.
For typical 2D-STEM detector with size 256x256 pixels,
the array should be processed with R=4
in order to get sufficiently large image for further processing.
cmap : str - matplotlib.colormap name, optional, the default is 'gray'
Matplotlib colormap for plotting of the array.
Other interesting or high-contrast options:
'viridis', 'plasma', 'magma', ...
The full list of matplotlib colormaps:
`matplotlib.pyplot.colormaps()`
center : bool, optional, default is False
If True, intensity center is drawn in the final image.
csquare : integer, optional, default is 20
Edge of a central square, from which the center will be determined.
Ignored if center == False.
cintensity : float in interval 0--1, optional, default is 0.8
The intensity < maximum_intensity * cintensity is regarded as 0
(a simple temporary background removal in the central square).
Ignored if center == False.
peak_height : int, optional, default is 100
Minimal height of the peak to be detected.
peak_distance : int, optional, default is 5
Minimal distance between two peaks so that they were separated.
Returns
-------
Nothing
The output are the files and their characteristics on the screen.
Technical note
--------------
* This function uses Datafiles.read and than Arrays.functions.
'''
# Initialization
file_counter = 0
# Iterate through the files
for datafile in SDATA.filenames:
# Read datafile from disk to array
arr = Datafiles.read(SDATA, datafile)
# Print datafile name
datafile_name = datafile.relative_to(SDATA.data_dir)
print('Datafile:', datafile_name)
# Describe the datafile/array
Arrays.describe(arr,
csquare, cintensity, peak_height, peak_distance)
# Show the datafile/array
Arrays.show(arr,
icut, itype, R, cmap,
center, csquare, cintensity)
# Decide if we should stop the show
if interactive:
# Wait for keyboard input...
choice = str(input('[Enter] to show next, [q] to quit...\n'))
# Break if 'q' was pressed and continue otherwise...
if choice == 'q': break
elif max_files:
file_counter += 1
if file_counter >= max_files: break
def show_from_database(SDATA, df,
interactive=True, max_files=None,
icut=1000, itype='8bit', cmap='gray'):
'''
Show datafiles (pre-selected in a database) from 2D-STEM detector.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
df : pandas.DataFrame object
Pre-calculated atabase with datafiles to be shown.
Each row of the database contains
[filename, xc, yc, MaxInt, NumPeaks, S].
interactive: bool, optional, the defailt is True
If True, images are shown interactively,
i.e. any key = show next image, 'q' = quit.
max_files: integer, optional, the default is None
If not(interactive==True) and max_files > 0,
show files non-interactively = in one run, until
number of files is less than max_files limit.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300.
itype : string, optional, '8bit' or '16bit', default is '8bit'
Type of the image - 8 or 16 bit grayscale.
cmap : str - matplotlib.colormap name, optional, the default is 'gray'
Matplotlib colormap for plotting of the array.
Other interesting or high-contrast options:
'viridis', 'plasma', 'magma', ...
The full list of matplotlib colormaps:
`matplotlib.pyplot.colormaps()`
Returns
-------
Nothing
The output are the files and their characteristics on the screen.
Technical notes
---------------
* This function uses Datafiles.read function
to read data from database.
* As the function uses database data,
it cannot use standard Arrays functions.
'''
# Initialize file counter
file_counter = 0
# Show the files and their characteristics saved in the database
for index,datafile in df.iterrows():
# Read datafile
datafile_name = SDATA.data_dir.joinpath(datafile.DatafileName)
arr = Datafiles.read(SDATA, datafile_name)
# Print datafile characteristics from the database
print(f'Datafile: {datafile.DatafileName}')
print(f'Center (x,y): \
({datafile.Xcenter:.1f},{datafile.Ycenter:.1f})')
print(f'Maximum intensity: {datafile.MaxInt}')
print(f'Number of peaks: {datafile.Peaks}')
print(f'Shannon entropy: {datafile.S}')
# Show datafile (and draw XY-center from the database data)
arr = np.where(arr>icut, icut, arr)
plt.imshow(arr, cmap=cmap)
# Draw center
# (we read data from database
# (files are shown without any rescaling
# (but database contains Xcenter,Ycenter from upscaled images
# (=> we have to divide Xcenter,Ycenter by rescale coefficient!
R = SDATA.detector.upscale
plt.plot(
datafile.Ycenter/R,
datafile.Xcenter/R,
'r+', markersize=20)
plt.show()
# Increase file counter & stop if max_files limit was reached
file_counter += 1
if file_counter >= max_files: break
class Arrays:
'''
Arrays class = a collection of functions
that work with 2D-arrays, which represent datafiles from 2D-STEM detector.
'''
def show(arr,
icut=None, itype=None, R=None, cmap=None,
center=False, csquare=20, cintensity=0.8,
plt_type='2D', plt_size=None, colorbar=False):
'''
Show 2D-array as an image.
Parameters
----------
arr : 2D numpy array
Array to show.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300.
itype : string, optional, None or '8bit' or '16bit'
Type of the image - 8-bit or 16-bit.
If itype equals None or '16-bit' the image is treated as 16-bit.
R : integer, optional, default is None
Rescale coefficient;
the input array is rescaled (usually upscaled) R-times.
For typical 2D-STEM detector with size 256x256 pixels,
the array should be processed with R=4
in order to get sufficiently large image for further processing.
cmap : str - matplotlib.colormap name, optional, the default is None
Matplotlib colormap for plotting of the array.
Interesting or high-contrast options:
'gray', 'viridis', 'plasma', 'magma', ...
The full list of matplotlib colormaps:
`matplotlib.pyplot.colormaps()`
center : bool, optional, default is False
If True, intensity center is drawn in the final image.
csquare : integer, optional, default is 20
Edge of a central square, from which the center will be determined.
Ignored if center == False.
cintensity : float in interval 0--1, optional, default is 0.8
The intensity < maximum_intensity * cintensity is regarded as 0
(a simple temporary background removal in the central square).
Ignored if center == False.
plt_type : str, '2D' or '3D', optional, default is '2D'
Type of the plot: 2D-dimensional or 3D-dimensional/surface plot.
plt_size : int, optional, default is not
If given, we plot only the central region with size = *plt_size*.
For central region we use a geometric center - see Technical notes.
colorbar : bool, optional, the default is False
If True, a colorbar is added to the plot.
Returns
-------
Nothing
The output is the array shown as an image on the screen.
Technical notes
---------------
* In this function, we *do not* center the image/array.
Center can be drawn to 2D-image, but array *is not centered*.
* Edges can be removed (using plt_size argument),
but only only with respect to the geometrical center,
which means that the function shows a *non-cenered central region*.
* If you need to show *centered central region* of an array,
combine Arrays.find_center + Arrays.remove_edges + Arrays.show
'''
# Prepare array for saving
arr = Arrays.prepare_for_show_or_save(arr, icut, itype, R)
# Remove edges of the plot, if requested
# (just simple removal of edges based on geometrical center!
# (reason: simplicity; for centering/edge removal we have other funcs
if plt_size:
Xsize,Ysize = arr.shape
xc,yc = (int(Xsize/2),int(Ysize/2))
if plt_size:
arr = Arrays.remove_edges(arr,plt_size,xc,yc)
# Plot array as image
# (a) Prepare 2D plot (default)
if plt_type=='2D':
if cmap==None: # if cmap not selected, set default for 2D maps
cmap='viridis'
plt.imshow(arr, cmap=cmap)
if colorbar: # Add colorbar
plt.colorbar()
if center==True: # Mark intensity center in the plot
xc,yc = Arrays.find_center(arr,csquare, cintensity)
plt.plot(yc,xc, 'r+', markersize=20) # switch xc,yc for img!
# (b) Prepare 3D plot (option; if plt_type is not the default '2D')
else:
if cmap==None: # if cmap not selected, set default for 3D maps
cmap='coolwarm'
# Prepare meshgrid for 3D-plotting
Xsize,Ysize = arr.shape
X = np.arange(Xsize)
Y = np.arange(Ysize)
Xm,Ym = np.meshgrid(X,Y)
# Create 3D-plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(
Xm,Ym,arr, cmap=cmap, linewidth=0, antialiased=False)
plt.tight_layout()
# (c) Show the plot
plt.show()
def describe(arr,
csquare=20, cintensity=0.8,
peak_height=100, peak_distance=5):
'''
Describe 2D-array = print XY-center, MaxIntensity, Peaks, Sh-entropy.
Parameters
----------
arr : 2D numpy array
Array to describe.
csquare : integer, optional, default is None
Edge of a central square, from which the center will be determined.
cintensity : float in interval 0--1, optional, default is None
The intensity < maximum_intensity * cintensity is regarded as 0
(a simple temporary background removal in the central square).
peak_height : int, optional, default is 100
Minimal height of the peak to be detected.
peak_distance : int, optional, default is 5
Minimal distance between two peaks so that they were separated.
Returns
-------
Nothing
The array characteristics are just printed.
Technical note
--------------
This function is just a wrapper
around several np.functions and Arrays.functions.
To get the values, use the individual functions instead.
'''
# Determine center (of intensity)
x,y = Arrays.find_center(arr, csquare, cintensity)
print(f'Center (x,y): ({x:.1f},{y:.1f})')
# Determine maximum intensity
max_intensity = np.max(arr)
print(f'Maximum intensity = {max_intensity:d}')
# Estimate number of peaks (local maxima)
no_of_maxima = Arrays.number_of_peaks(arr, peak_height, peak_distance)
print(f'Number of peaks = {no_of_maxima}')
# Calculate Shannon entropy of the datafile
entropy_value = measure.shannon_entropy(arr)
print(f'Shannon entropy = {entropy_value:.2f}')
def find_center(arr, csquare=None, cintensity=None):
'''
Determine center of mass for 2D numpy array.
Array center = mass/intensity center ~ position of central spot.
Warning: In most cases, geometric center is NOT mass/intensity center.
Parameters
----------
arr : numpy 2D array
Numpy 2D array, whose center (of mass ~ intensity) we want to get.
csquare : integer, optional, default is None
Edge of a central square, from which the center will be determined.
cintensity : float in interval 0--1, optional, default is None
The intensity < maximum_intensity * cintensity is regarded as 0
(a simple temporary background removal in the central square).
Returns
-------
xc,yc : integers
Coordinates of the intesity center = position of the primary beam.
'''
# Calculate center of array
if csquare:
# If csquare was given,
# calculate center only for the square in the center,
# in which we set background intensity = 0 to get correct results.
# a) Calculate array corresponding to central square
xsize,ysize = arr.shape
xborder = (xsize - csquare) // 2
yborder = (ysize - csquare) // 2
arr2 = arr[xborder:-xborder,yborder:-yborder].copy()
# b) Set intensity lower than maximum*coeff to 0 (background removal)
coeff = cintensity or 0.8
arr2 = np.where(arr2>np.max(arr2)*coeff, arr2, 0)
# c) Calculate center of intensity (and add borders at the end)
M = measure.moments(arr2,1)
(xc,yc) = (M[1,0]/M[0,0], M[0,1]/M[0,0])
(xc,yc) = (xc+xborder,yc+yborder)
(xc,yc) = np.round([xc,yc],2)
else:
# If csquare was not given,
# calculate center for the whole array.
# => Wrong position of central spot for non-centrosymmetric images!
M = measure.moments(arr,1)
(xc,yc) = (M[1,0]/M[0,0], M[0,1]/M[0,0])
(xc,yc) = np.round([xc,yc],2)
# Return final values
# IMPORTANT: the values works for arrays => switch xc,yc for images!
return(xc,yc)
def number_of_peaks(arr, peak_height,
peak_distance=None, neighborhood_matrix=None):
'''
Estimate number of peaks in given array.
Parameters
----------
arr : 2D numpy array
Array, for which we want to determine the number of peaks.
peak_height : float
Height of peak with respect to background; obligatory argument.
peak_distance : int, optional, default is None
Distance between two neighboring peaks.
If given, distance matrix is calculated out of it.
neighborhood_matrix : numpy 2D-array, optional, default is None
The neighborhood expressed as a 2D-array of 1's and 0's.
The neighborhood matrix can be eigher given directly
(using argument neighborhood_matrix) or indirectly
(calculated from argument peak_distance).
Returns
-------
no_of_peaks : int
Estimated number of peaks (local maxima) in given array.
'''
# If peak distance was given, calculate distance matrix from it.
if peak_distance:
neighborhood_matrix = np.ones(
(peak_distance,peak_distance), dtype=np.uint8)
# Determine number of peaks
no_of_peaks = np.sum(morphology.h_maxima(
arr, h=peak_height, footprint=neighborhood_matrix))
# Return number of peaks
return(no_of_peaks)
def rescale(arr, R, order=None):
'''
Rescale 2D numpy array (which represents an image).
Parameters
----------
arr : 2D numpy array
Numpy array representing DAT-file/image.
R : integer
Rescale parameter: new_size_of the array = original_size * R
Returns
-------
arr : 2D numpy array
The array has `new_size = original_size * R`.
'''
# Keep original value of array maximum.
arr_max = np.max(arr)
# Rescale the array.
arr = transform.rescale(arr, R, order)
# Restore the original value of array maximum.
arr = arr/np.max(arr) * arr_max
# Return the rescaled array.
return(arr)
def remove_edges(arr,rsize,xc,yc):
'''
Cut array to rsize by removing edges; center of new array = (xc,yc).
Parameters
----------
arr : numpy 2D array
The original array, whose size should be reduced.
rsize : integer
The size of reduced array.
xc,yc : integers
The center of original array;
the reduced array is cut to rsize, center of new array is in xc,yc.
Returns
-------
arr : 2D numpy array
The array with reduced size.
'''
halfsize = int(rsize/2)
if (rsize % 2) == 0:
arr = arr[xc-halfsize:xc+halfsize, yc-halfsize:yc+halfsize]
else:
arr = arr[xc-halfsize:xc+halfsize+1, yc-halfsize:yc+halfsize+1]
return(arr)
def save_as_image(arr, output_image, icut=None, itype='8bit', R=None):
'''
Save 2D numpy array as grayscale image.
Parameters
----------
arr : 2D numpy array
Array or image object to save.
output_image : string or pathlib object
Name of the output/saved file.
icut : integer
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300.
itype: string ('8bit' or '16bit')
Type of the image: 8 or 16 bit grayscale.
R: integer
Rescale coefficient;
the input array is rescaled/enlarged R-times.
For typical 2D-STEM detector with size 256x256 pixels,
the array should be saved with R = 2 (or 4)
in order to get sufficiently large image for further processing.
Returns
-------
Nothing
The output is *arr* saved as *output_image* on a disk.
'''
# Prepare array for saving
arr = Arrays.prepare_for_show_or_save(arr, icut, itype, R)
# Prepare image object (8bit or 16bit)
if itype == '8bit':
img = Image.fromarray(arr, 'L')
else:
img = Image.fromarray(arr)
# Save image
img.save(output_image)
def save_as_datafile(SDATA, arr, filename):
'''
Save array as a datafile (in current detector format).
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
arr : 2D numpy array
Array or image object to save.
filename : str or path
Name of the file to save.
Returns
-------
None
The output is *arr* saved as a datafile named *filename*.
The format of the saved datafile corresponds to current detector.
Technical notes
---------------
* This function io.Arrays.save_as_datafile
is just a wrapper that calls *save_datafile* function
for given detector.
* The detector object is accessible in this function
thanks to SDATA argument as *SDATA.detector*.
'''
SDATA.detector.save_datafile(arr, filename)
def prepare_for_show_or_save(arr, icut=None, itype=None, R=None):
'''
Prepare 2D numpy array (which contains a 2D-STEM datafile)
for showing/saving as grayscale image.
Parameters
----------
arr : 2D numpy array
Array or image object to save.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300.
itype: string, optional, '8bit' or '16bit', default is None.
Type of the image - 8-bit or 16-bit grayscale.
If none, then the image is saved as 16-bit.
R: integer, optional, default is None
Rescale coefficient;
the input array is rescaled/enlarged R-times.
For smaller 2D-STEM detectors with size 256x256 pixels,
the array should be saved with R = 2 (or 4)
in order to get sufficiently large image for further processing.
Returns
-------
arr : 2D numpy array
The modified array ready for showing or saving on a disk.
'''
# Cut intensity
if icut:
arr = np.where(arr>icut, icut, arr)
# Rescale
if R:
arr_max = np.max(arr)
arr = transform.rescale(arr, R)
arr = arr/np.max(arr) * arr_max
# Prepare for showing/saving as 8bit or 16 bit
if itype == '8bit':
arr = np.round(arr * (255/np.max(arr))).astype(dtype=np.uint8)
else:
arr = arr.astype('uint16')
# Return the modified array
return(arr)
class Images:
'''
Images class = a collection of functions
that work with images representing datafiles from 2D-STEM detector.
'''
def read(image_name, itype='8bit'):
'''
Read grayscale image into 2D numpy array.
Parameters
----------
image_name : string or pathlib object
Name of image that should read into numpy 2D array.
itype: string ('8bit' or '16bit')
type of the image: 8 or 16 bit grayscale
Returns
-------
arr : 2D numpy array
The array converted from *image_name*.
'''
img = Image.open(image_name)
if itype=='8bit':
arr = np.asarray(img, dtype=np.uint8)
else:
arr = np.asarray(img, dtype=np.uint16)
return(arr)
def show(image_name,
icut=None, itype='8bit', R=None, cmap='gray',
center=False, csquare=20, cintensity=0.8):
'''
Read and display image from disk.
Parameters
----------
image_name : str or path-like object
Name of image to read.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300.
itype : string, optional, '8bit' or '16bit', default is '8bit'
Type of the image - 8 or 16 bit grayscale.
R : integer, optional, default is None
Rescale coefficient;
the input array is rescaled (usually upscaled) R-times.
For typical 2D-STEM detector with size 256x256 pixels,
the array should be processed with R=4
in order to get sufficiently large image for further processing.
cmap : str - matplotlib.colormap name, optional, the default is 'gray'
Matplotlib colormap for plotting of the array.
Other interesting or high-contrast options:
'viridis', 'plasma', 'magma', ...
The full list of matplotlib colormaps:
`matplotlib.pyplot.colormaps()`
center : bool, optional, default is False
If True, intensity center is drawn in the final image.
csquare : integer, optional, default is 20
Edge of a central square, from which the center will be determined.
Ignored if center == False.
cintensity : float in interval 0--1, optional, default is 0.8
The intensity < maximum_intensity * cintensity is regarded as 0
(a simple temporary background removal in the central square).
Ignored if center == False.
Returns
-------
Nothing
The output is *image_name* shown on the screen.
'''
# Read Image to array.
arr = Images.read(image_name, itype=itype)
# Show the array using pre-defined stemdiff.io.Array.show function.
Arrays.show(arr,
icut, itype, R, cmap,
center, csquare, cintensity)
class Plots:
'''
Plots class = a collection of functions
for easy plotting of diffractograms and/or graphs.
'''
# The functions in this sub-package are not TOO general.
# This is intentional - writing too general plotting funcs makes no sense.
# The exception is the 1st func, which sets plot parameters for any plot.
def set_plot_parameters(size=(12,9), dpi=75,
fontsize=10, my_rcParams=None):
'''
Set global plot parameters (mostly for plotting in Jupyter).
Parameters
----------
size : tuple of two floats, optional, the default is (12,9)
Size of the figure (width, height) in [cm].
dpi : int, optional, the defalut is 75
DPI of the figure.
fontsize : int, optional, the default is 10
Size of the font used in figure labels etc.
my_rcParams : dict, optional, default is None
Dictionary in plt.rcParams format
containing any other allowed global plot parameters.
Returns
-------
None; the result is a modification of the global plt.rcParams variable.
'''
# We test all arguments, if they exist.
# (Theoretically, user could have redefined them as None
# (In such a case we would change nothing and leave default values
if size: # Figure size
# Convert size in [cm] to required size in [inch]
size = (size[0]/2.54, size[1]/2.54)
plt.rcParams.update({'figure.figsize' : size})
if dpi: # Figure dpi
plt.rcParams.update({'figure.dpi' : dpi})
if fontsize: # Global font size
plt.rcParams.update({'font.size' : fontsize})
if my_rcParams: # Other possible rcParams in the form of dictionary
plt.rcParams.update(my_rcParams)
def plot_2d_diffractograms(data_to_plot,
icut=None, cmap='viridis',
output_file=None, dpi=300):
'''
Plot a few selected 2D diffraction patterns in a row, one-by-one.
Parameters
----------
data_to_plot : list of lists
This object is a list of lists.
The number of rows = the number of plotted diffractograms.
Each row contains two elements:
(i) data for diffractogram to plot and (ii) title of the plot.
The data (first element of each row) can be:
(i) PNG-file or (ii) 2D-array containing the 2D diffractogram.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300
(this is used just for plotting; the array values are not changed).
cmap : matplotlib.colormap name, optional, the default is 'viridis'
Matplotlib colormap for plotting of the diffractogram.
Other interesting or high-contrast options:
'gray', 'plasma', 'magma', ...
The full list of matplotlib colormaps:
`matplotlib.pyplot.colormaps()`
output_file : str, optional, default is None
If this argument is given,
the plot is also saved in *output_file* image
with *dpi* resolution (dpi is specified by the following argument).
dpi : int, optional, default is 300
The (optional) argument gives resolution of
(the optional) output image.
Returns
-------
None
The output is the plot of the diffraction patterns on the screen.
If argument *ouput_file* is given, the plot is saved as an image.
'''
# Initialize
n = len(data_to_plot)
diffs = data_to_plot
fig,ax = plt.subplots(nrows=1, ncols=n)
# Plot 2D-diffractograms
for i in range(n):
# Read data to plot
data = diffs[i][0]
# Test input data...
if type(data) == str: # ...String - we suppose an image
if data.lower().endswith('.png'): # PNG file, 2D-diffractogram
# we read image as '16bit'
# in this case, it works for 8bit images as well
arr = Images.read(data, itype='16bit')
else: # Other than PNG files are not supported now
print(f'Unsuported format of file {data}!')
sys.exit()
elif type(data) == np.ndarray: # Numpy array
if data.shape[0] == data.shape[1]: # square array, 2D-diff.pat
arr = data
else: # Non-square arrays are not supported now
print('Non-square arrays to plot - not supported.')
sys.exit()
# Read plot parameters
my_title = diffs[i][1]
# Plot i-th datafile/array
ax[i].imshow(arr, vmax=icut, cmap=cmap)
ax[i].set_title(my_title)
# Finalize plot
for i in range(n): ax[i].axis('off')
fig.tight_layout()
if output_file: fig.savefig(output_file, dpi=dpi)
def plot_datafiles_with_NS(SDATA, df, N=None, S=None,
icut=None, rsize=None, cmap='viridis',
output_file=None, dpi=300):
'''
Plot datafiles with selected (N,S)=(NoOfPeaks,Entropy) in a row.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
df : pandas.DataFrame object
Pre-calculated atabase with datafiles to be shown.
Each row of the database contains
[filename, xc, yc, MaxInt, NumPeaks, S].
N : list of integers, arbitrary lenght, but should it not be empty
List of N values, where N = estimated NumberOfPeaks in a datafile.
This function will plot datafiles for given combination of (N,S).
The sister argument S is described below.
The (N,S) lists must have the same length.
S : list of integers, arbitrary lenght, but should it not be empty
List of S values; S = calculated ShannonEntropy of a datafile.
This function will plot datafiles for given combination of (N,S).
The sister argument N is described above.
The (N,S) lists must have the same length.
icut : integer, optional, default is None
Cut of intensity;
if icut = 300, all image intensities > 300 will be equal to 300
(this is used just for plotting; the array values are not changed).
rsize : int, optional, default is None
Reduced size of an image.
If rsize = 100, size of all datafiles/images is reduced
so that only the central square with size = rsize is plotted.
The center coordinates for given datafile
are taken from the (obligatory) df argument (see above).
cmap : matplotlib.colormap name, optional, the default is 'viridis'
Matplotlib colormap for plotting of the diffractogram.
Other interesting or high-contrast options:
'gray', 'plasma', 'magma', ...
The full list of matplotlib colormaps:
`matplotlib.pyplot.colormaps()`
output_file : str, optional, default is None
If this argument is given,
the plot is also saved in *output_file* image
with *dpi* resolution (dpi is specified by the following argument).
dpi : int, optional, default is 300
The (optional) argument gives resolution of
(the optional) output image.
Returns
-------
None
The output are the datafiles plotted on the screen.
If *ouput_file* is given, the plot is also saved as an image.
Technical note
--------------
The function has no return statement => it returns None.
In Jupyter and Spyder, the plot/figure is drawn on the screen.
In CLI Python, the figure should be saved => argument *output_file*.
'''
# STEP 1: Prepare multiplot.
fig,ax = plt.subplots(nrows=1, ncols=len(N))
# STEP 2: Find datafiles with selected N,S combinations,
# and insert them in the prepared multiplot one-by-one.
#
# The finding of the datafiles:
# In given database (argument df),
# we localize datafiles with selected combination of (N,S)
# a) N = Peaks = NumberOfPeaks
# b) S = index of file sorted according to Shannon entropy
# => S = 1 : the file with given N and the highest S
# => S = 2,3... : like previous, but the 2nd, 3rd highest S ...
# => S = -1 : the file with given N and the lowest S
# => S = -2,-3... : like previous, but the 2nd,3rd lowest S ...
# Go through (N,S) pairs,
# find datafiles and insert them in the muptiplot.
# * We combine enumerate and zip
# (https://stackoverflow.com/q/16326853
# reason for enumerate => index i = plot/axes number = index in ax[i]
# reason for zip => we need (N,S) pairs = two values together
for i,(n,s) in enumerate(zip(N,S)):
# (1) Create sub-database, which ...
# a) contains just entries for given n = N = Peaks = NoOfPeaks
# b) is sorted according to S = ShannonEtropy descendingly
dfN = df[df.Peaks == n].sort_values(
by='S', ascending=False, ignore_index=True)
# (2) Modify value of s according to logic ...
if s >= 1:
# If s >= 1, decrease its value
# Reason: Python indexing start from 0, we start from 1
s = s - 1
elif s > len(dfN):
# If s > lenght_of_database_with_given_N, set it to maximum
# Reason: if s = 10 and len(dfN) is just 8 => s = 8
s = len(dfN)
elif s < - len(dfN):
# If s < -lenght_of_database_with_given_N, set it to maximum
# Reason: if s = -10 and len(dfN) is just 8 => s = -8
s = -len(dfN)
# (3) Get datafile name
# (the datafile name is extracted from df by means of iloc
# (the 1st index = 1 => col-index: col 1 in df/dfN = datafile names
# (the 2nd index = s => row-index: dfN was sorted according to 'S'
datafile = dfN.iloc[s,0]
# (4) Read the datafile to an array
arr = Datafiles.read(SDATA, datafile)
# (5) Reduce size if requested
# (remove edges and keep just central square with
if rsize:
# Find datafile center coordinates (xc,yc)
# (xc,yc are in the database in columns 2 and 3, respectively
R = SDATA.detector.upscale
xc = int(np.round(dfN.iloc[s,1]/R,0))
yc = int(np.round(dfN.iloc[s,2]/R,0))
# Keep just central square with size = rsize
arr = Arrays.remove_edges(arr, rsize, xc, yc)
# (6) Draw datafile in the plot
# (a) Get parameters for plot title
# (the plot title should be something like N=10,S=1,M=1200
# (where N,S,M are NoOfPeaks, ShannonEntropy and MaxIntensity
max_intensity = dfN.iloc[s,3]
peaks = dfN.iloc[s,4]
entropy = dfN.iloc[s,5]
# (b) Prepare title
plot_title = \
f'N={peaks:d} S={entropy:.2f} M={int(max_intensity):d}'
# (c) Create the plot
ax[i].imshow(arr, vmax=icut, cmap=cmap)
ax[i].set_title(plot_title)
# STEP 3: Finalize the figure (and save it, if requested)
# Finalize figure
for i in range(len(N)): ax[i].axis('off')
fig.tight_layout()
# Save if requested
if output_file:
fig.savefig(output_file, dpi=dpi, facecolor='white')
# IMPORTANT TECHNICAL NOTE: Output of this function
# The function has no return statement => it returns None.
# In Jupyter and Spyder, the plot/figure is drawn on the screen.
# This is a property of figs - last command draws them in iPython.
# Here the last command is: fig.tight_layout() => figure is drawn.
# In classical/CLI Python, the fig should be saved => output_file arg.
'''
Module: stemdiff.sum
--------------------
The summation of 4D-STEM datafiles to create one 2D powder diffraction file.
* stemdiff.sum = this module, which runs on a single core (serial processing)
* stemdiff.summ = sister module running on multiple cores (parallel processing)
To perform the summation, we just call function sum_datafiles:
* serial : stemdiff.sum.sum_datafiles(SDATA, DIFFIMAGES, df, deconv, ...)
* parallel : stemdiff.summ.sum_datafiles(SDATA, DIFFIMAGES, df, deconv, ...)
The initial arguments are:
* SDATA = stemdiff.gvars.SourceData object = description of source data
* DIFFIMAGES = stemdiff.gvars.DiffImages object = description of diffractograms
* df = pre-calculated database of datafiles/diffratograms to sum
Key argument is deconv, which determines the processing type:
* deconv=0 = sum *without* deconvolution
* deconv=1 = R-L deconvolution with global PSF from low-diffraction datafiles
* deconv=2 = subtract background + R-L deconvolution with PSF from the center
'''
import numpy as np
import stemdiff.io
import stemdiff.dbase
import idiff
from skimage import restoration
import tqdm
import sys
def sum_datafiles(SDATA, DIFFIMAGES, df, deconv=0, psf=None, iterate=10):
"""
Sum datafiles from a 4D-STEM dataset to get 2D powder diffractogram.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
DIFFIMAGES : stemdiff.gvars.DiffImages object
Object describing the diffraction images/patterns.
df : pandas.DataFrame object
Pre-calculated atabase with datafiles to be summed.
Each row of the database contains
[filename, xc, yc, MaxInt, NumPeaks, S].
deconv : int, optional, default is 0
Deconvolution type:
0 = no deconvolution,
1 = deconvolution based on external PSF,
2 = deconvolution based on PSF from central region,
psf : 2D-numpy array or None, optional, default is None
Array representing 2D-PSF function.
Relevant only for deconv = 1.
iterate : integer, optional, default is 10
Number of iterations during the deconvolution.
Returns
-------
final_arr : 2D numpy array
The array is a sum of datafiles;
if the datafiles are pre-filtered,
we get the sum of filtered datafiles.
Additional arguments determin the (optional) type of deconvolution.
Technical notes
---------------
* This function works as a signpost.
* It reads the summation parameters and calls a more specific summation
functions (which aren NOT called directly by the end-user).
* It employs progress bar, handles possible exceptions,
and returns the final array (= post-processed and normalized array).
"""
# (1) Prepare variables for summation
R = SDATA.detector.upscale
img_size = DIFFIMAGES.imgsize
datafiles = [datafile[1] for datafile in df.iterrows()]
sum_arr = np.zeros((img_size * R, img_size * R), dtype=np.float32)
# (2) Prepare variables for tqdm
# (to create a single progress bar for the entire process
total_tasks = len(datafiles)
sys.stderr = sys.stdout
# (3) Run summations
# (summations will run with tqdm
# (we will use several types of summations
# (each summations uses datafiles prepared in a different way
with tqdm.tqdm(total=total_tasks, desc="Processing ") as pbar:
try:
# Process each image in the database
for index, datafile in df.iterrows():
# Deconv0 => sum datafiles without deconvolution
if deconv == 0:
sum_arr += dfile_without_deconvolution(
SDATA, DIFFIMAGES, datafile)
# Deconv1 => sum datafiles with DeconvType1
elif deconv == 1:
sum_arr += dfile_with_deconvolution_type1(
SDATA, DIFFIMAGES, datafile, psf, iterate)
# Deconv2 => sum datafiles with DeconvType2
elif deconv == 2:
sum_arr += dfile_with_deconvolution_type2(
SDATA, DIFFIMAGES, datafile, iterate)
# Update the progress bar for each processed image
pbar.update(1)
except Exception as e:
print(f"Error processing a task: {str(e)}")
# (4) Move to the next line after the progress bar is complete
print('')
# (5) Post-process the summation and return the result
return sum_postprocess(sum_arr, len(df))
def sum_postprocess(sum_of_arrays, n):
"""
Normalize and convert the summed array to 16-bit unsigned integers.
Parameters
----------
sum_of_arrays : np.array
Sum of the arrays -
usually from stemdiff.sum.sum_datafiles function.
n : int
Number of summed arrays -
usually from stemdiff.sum.sum_datafiles function.
Returns
-------
arr : np.array
Array representing final summation.
The array is normalized and converted to unsigned 16bit integers.
"""
arr = np.round(sum_of_arrays/n).astype(np.uint16)
return(arr)
def dfile_without_deconvolution(SDATA, DIFFIMAGES, datafile):
"""
Prepare datafile for summation without deconvolution (deconv=0).
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describing source data (detector, data_dir, filenames).
DIFFIMAGES : stemdiff.gvars.DiffImages object
The bject describing the diffraction images/patterns.
datafile : one row from the prepared database of datafiles
The database of datafiles is created
in stemdiff.dbase.calc_database function.
Each row of the database contains
[filename, xc, yc, MaxInt, NumPeaks, S].
Returns
-------
arr : 2D numpy array
The datafile in the form of the array,
which is ready for summation (with DeconvType0 => see Notes below).
Notes
-----
* The parameters are transferred from the `sum_datafiles` function.
* DeconvType0 = no deconvolution,
just summation of the prepared datafiles (upscaled, centered...).
"""
# (0) Prepare variables
R = SDATA.detector.upscale
img_size = DIFFIMAGES.imgsize
# (1) Read datafile
datafile_name = SDATA.data_dir.joinpath(datafile.DatafileName)
arr = stemdiff.io.Datafiles.read(SDATA, datafile_name)
# (2) Rescale/upscale datafile and THEN remove border region
# (a) upscale datafile
arr = stemdiff.io.Arrays.rescale(arr, R, order=3)
# (b) get the accurate center of the upscaled datafile
# (the center coordinates for each datafile are saved in the database
# (note: our datafile is one row from the database => we know the coords!
xc,yc = (round(datafile.Xcenter),round(datafile.Ycenter))
# (c) finally, the borders can be removed with respect to the center
arr = stemdiff.io.Arrays.remove_edges(arr,img_size*R,xc,yc)
# (Important technical notes:
# (* This 3-step procedure is necessary to center the images precisely.
# ( The accurate centers from upscaled images are saved in database.
# ( The centers from original/non-upscaled datafiles => wrong results.
# (* Some border region should ALWAYS be cut, for two reasons:
# ( (i) weak/zero diffractions at edges and (ii) detector edge artifacts
# (3) Return the datafile as an array that is ready for summation
return(arr)
def dfile_with_deconvolution_type1(SDATA, DIFFIMAGES, datafile, psf, iterate):
"""
Prepare datafile for summation with deconvolution type1 (deconv=1).
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describing source data (detector, data_dir, filenames).
DIFFIMAGES : stemdiff.gvars.DiffImages object
The bject describing the diffraction images/patterns.
datafile : one row from the prepared database of datafiles
The database of datafiles is created
in stemdiff.dbase.calc_database function.
Each row of the database contains
[filename, xc, yc, MaxInt, NumPeaks, S].
psf : 2D-numpy array
Array representing the 2D-PSF function.
iterate : int
Number of iterations during the deconvolution.
Returns
-------
arr : 2D numpy array
The datafile in the form of the array,
which is ready for summation (with DeconvType1 => see Notes below).
Notes
-----
* The parameters are transferred from the `sum_datafiles` function.
* DeconvType1 = Richardson-Lucy deconvolution using PSFtype1.
* PSFtype1 = 2D-PSF estimated from files with negligible diffractions.
"""
# (0) Prepare variables
R = SDATA.detector.upscale
img_size = DIFFIMAGES.imgsize
# (1) Read datafile
datafile_name = SDATA.data_dir.joinpath(datafile.DatafileName)
arr = stemdiff.io.Datafiles.read(SDATA, datafile_name)
# (2) Rescale/upscale datafile and THEN remove border region
# (a) upscale datafile
arr = stemdiff.io.Arrays.rescale(arr, R, order=3)
# (b) get the accurate center of the upscaled datafile
# (the center coordinates for each datafile are saved in the database
# (note: our datafile is one row from the database => we know the coords!
xc,yc = (round(datafile.Xcenter), round(datafile.Ycenter))
# (c) finally, the borders can be removed with respect to the center
arr = stemdiff.io.Arrays.remove_edges(arr, img_size*R,xc,yc)
# (Important technical notes:
# (* This 3-step procedure is necessary to center the images precisely.
# ( The accurate centers from upscaled images are saved in database.
# ( The centers from original/non-upscaled datafiles => wrong results.
# (* Some border region should ALWAYS be cut, for two reasons:
# ( (i) weak/zero diffractions at edges and (ii) detector edge artifacts
# (3) Deconvolution: Richardson-Lucy using a global PSF
# (a) save np.max, normalize
# (reason: deconvolution algorithm requires normalized arrays...
# (...and we save original max.intensity to re-normalize the result
norm_const = np.max(arr)
arr_norm = arr/np.max(arr)
psf_norm = psf/np.max(psf)
# (b) perform the deconvolution
arr_deconv = restoration.richardson_lucy(
arr_norm, psf_norm, num_iter=iterate)
# (c) restore original range of intensities = re-normalize
arr = arr_deconv * norm_const
# (4) Return the deconvolved datafile
# as an array that is ready for summation
return arr
def dfile_with_deconvolution_type2(SDATA, DIFFIMAGES, datafile, iterate):
"""
Prepare datafile for summation with deconvolution type2 (deconv=2).
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describing source data (detector, data_dir, filenames).
DIFFIMAGES : stemdiff.gvars.DiffImages object
The bject describing the diffraction images/patterns.
datafile : one row from the prepared database of datafiles
The database of datafiles is created
in stemdiff.dbase.calc_database function.
Each row of the database contains
[filename, xc, yc, MaxInt, NumPeaks, S].
iterate : int
Number of iterations during the deconvolution.
Returns
-------
arr : 2D numpy array
The datafile in the form of the array,
which is ready for summation (with DeconvType1 => see Notes below).
Notes
-----
* The parameters are transferred from the `sum_datafiles` function.
* DeconvType2 = Richardson-Lucy deconvolution
using PSFtype2 + simple background subtraction.
* PSFtype2 = 2D-PSF estimated from central region of the datafile
AFTER background subtraction.
"""
# (0) Prepare variables
R = SDATA.detector.upscale
img_size = DIFFIMAGES.imgsize
psf_size = DIFFIMAGES.psfsize
# (1) Read datafile
datafile_name = SDATA.data_dir.joinpath(datafile.DatafileName)
arr = stemdiff.io.Datafiles.read(SDATA, datafile_name)
# (2) Rescale/upscale datafile and THEN remove border region
# (a) upscale datafile
arr = stemdiff.io.Arrays.rescale(arr, R, order=3)
# (b) get the accurate center of the upscaled datafile
# (the center coordinates for each datafile are saved in the database
# (note: our datafile is one row from the database => we know the coords!
xc,yc = (round(datafile.Xcenter),round(datafile.Ycenter))
# (c) finally, the borders can be removed with respect to the center
arr = stemdiff.io.Arrays.remove_edges(arr,img_size*R,xc,yc)
# (Important technical notes:
# (* This 3-step procedure is necessary to center the images precisely.
# ( The accurate centers from upscaled images are saved in database.
# ( The centers from original/non-upscaled datafiles => wrong results.
# (* Some border region should ALWAYS be cut, for two reasons:
# ( (i) weak/zero diffractions at edges and (ii) detector edge artifacts
# (3) Remove background
arr = idiff.bcorr.rolling_ball(arr, radius=20)
# (4) Prepare PSF from the center of given array
# (recommended parameters:
# (psf_size => to be specified in the calling script ~ 30
# (circular => always True - square PSF causes certain artifacts
psf = idiff.psf.PSFtype2.get_psf(arr, psf_size, circular=True)
# (5) Deconvolution
# (a) save np.max, normalize
# (reason: deconvolution algorithm requires normalized arrays...
# (...and we save original max.intensity to re-normalize the result
norm_const = np.max(arr)
arr_norm = arr/np.max(arr)
psf_norm = psf/np.max(psf)
# (b) perform the deconvolution
arr_deconv = restoration.richardson_lucy(
arr_norm, psf_norm, num_iter=iterate)
# (c) restore original range of intensities = re-normalize
arr = arr_deconv * norm_const
# (6) Return the deconvolved datafile
# as an array that is ready for summation
return(arr)
'''
Module: stemdiff.summ
---------------------
The summation of 4D-STEM datafiles to create one 2D powder diffraction file.
* The summation runs on all available cores (parallel processing).
* This module takes functions from semdiff.sum, but runs them parallelly.
The key function of the module (for a user) = stemdiff.summ.sum_datafiles:
* The function takes the same arguments as stemdiff.sum.sum_datafiles.
* The only difference consists in that the summation runs on multiple cores.
How it works:
* This module contains just two functions:
- `summ.sum_datafiles` - wrapper for the next function
- `summ.multicore_sum` - runs the summation on multiple cores
* The rest is done with the functions of sister module *stemdiff.sum*.
- i.e. the summ.multicore_sum calls functions from stemdiff.sum
- but the functions run within this module, using multiple cores
* Summary:
- `sum.sum_datafiles` - runs on a single core (docs in stemdiff.sum)
- `summ.sum_datafiles` - runs on multiple cores, aguments are identical
'''
import os
import sys
import tqdm
import stemdiff.sum
import concurrent.futures as future
def sum_datafiles(
SDATA, DIFFIMAGES,
df, deconv=0, psf=None, iterate=10):
'''
Sum datafiles from a 4D-STEM dataset to get 2D powder diffractogram.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
DIFFIMAGES : stemdiff.gvars.DiffImages object
Object describing the diffraction images/patterns.
df : pandas.DataFrame object
Database with datafile names and characteristics.
deconv : int, optional, default is 0
Deconvolution type:
0 = no deconvolution,
1 = deconvolution based on external PSF,
2 = deconvolution based on PSF from central region,
psf : 2D-numpy array or None, optional, default is None
Array representing 2D-PSF function.
Relevant only for deconv = 1.
iterate : integer, optional, default is 10
Number of iterations during the deconvolution.
Returns
-------
final_arr : 2D numpy array
The array is a sum of datafiles;
if the datafiles are pre-filtered, we get the sum of filtered datafiles,
if PSF is given, we get the sum of datafiles with PSF deconvolution.
Technical notes
---------------
* This function is a wrapper.
* It calls stemdiff.summ.multicore_sum with correct arguments:
- all relevant original arguments
- one additional argument: the *function for summation*
- the *function for summation* depends on the deconvolution type
'''
if deconv == 0:
arr = multicore_sum(
SDATA, DIFFIMAGES, df, psf, iterate,
func = stemdiff.sum.dfile_without_deconvolution)
elif deconv == 1:
arr = multicore_sum(
SDATA, DIFFIMAGES, df, psf, iterate,
func = stemdiff.sum.dfile_with_deconvolution_type1)
elif deconv == 2:
arr = multicore_sum(
SDATA, DIFFIMAGES, df, psf, iterate,
func = stemdiff.sum.dfile_with_deconvolution_type2)
else:
print(f'Unknown deconvolution type: deconv={deconv}')
print('Nothing to do.')
return None
return arr
def multicore_sum(SDATA, DIFFIMAGES, df, psf, iterate, func):
'''
Execute concurrent data processing using a thread pool.
This function processes a list of datafiles using a thread pool
for parallel execution. The number of concurrent workers is determined
by subtracting 1 from the available CPU cores.
Parameters
----------
SDATA : stemdiff.gvars.SourceData object
The object describes source data (detector, data_dir, filenames).
DIFFIMAGES : stemdiff.gvars.DiffImages object
Object describing the diffraction images/patterns.
df : pandas.DataFrame object
Database with datafile names and characteristics.
psf : 2D-numpy array or None, optional, default is None
Array representing 2D-PSF function.
Relevant only for deconv = 1.
iterate : integer, optional, default is 10
Number of iterations during the deconvolution.
func : a function from stemdiff.sum module to be used for summation
A function from sister module stemdiff.sum,
which will be used for summation on multiple cores.
This argument is (almost always) passed from the calling function
stemdiff.summ.sum_datafiles so that it corresponded to
the user-selected deconvolution type.
Returns
-------
final_arr : 2D numpy array
The array is a sum of datafiles;
if the datafiles are pre-filtered, we get sum of filtered datafiles,
if PSF is given, we get sum of datafiles with PSF deconvolution.
Technical notes
---------------
* This function is NOT to be called directly.
* It is called by wrapper function stemdiff.summ.sum_datafiles.
* The two functions work as follows:
- calling function = stemdiff.summ.sum_datafiles
- passes all relevant arguments including function for summation
- this function = stemdiff.summ.multicore_sum
- runs the summation on multiple cores and returns the result
'''
# (0) Initialize
num_workers = os.cpu_count() # Number of concurrent workers
datafiles = [datafile[1] for datafile in df.iterrows()]
# (1) Use ThreadPool to perform multicore summation
with future.ThreadPoolExecutor(max_workers=num_workers) as executor:
# (a) Prepare variables
futures = []
total_tasks = len(datafiles)
# (b) Submit tasks to the executor
for i, file in enumerate(datafiles):
try:
if func == stemdiff.sum.dfile_without_deconvolution:
future_obj = executor.submit(
func, SDATA, DIFFIMAGES, file)
elif func == stemdiff.sum.dfile_with_deconvolution_type1:
future_obj = executor.submit(
func, SDATA, DIFFIMAGES, file, psf, iterate)
elif func == stemdiff.sum.dfile_with_deconvolution_type2:
future_obj = executor.submit(
func, SDATA, DIFFIMAGES, file, iterate)
else:
raise Exception("Uknown deconvolution function!")
futures.append(future_obj)
except Exception as e:
print(f"Error processing file {file}: {str(e)}")
# (c) Use tqdm to create a progress bar
stderr_original = sys.stderr
sys.stderr = sys.stdout
with tqdm.tqdm(total=total_tasks,
desc="Processing ") as pbar:
# ...wait for all tasks to complete
for future_obj in future.as_completed(futures):
try:
future_obj.result()
except Exception as e:
print(f"Error processing a task: {str(e)}")
pbar.update(1)
sys.stderr = stderr_original
# (2) Summation done, collect the results
# (a) Print a new line to complete the progress bar
print()
# (b) Collect results
deconvolved_data = [f.result() for f in futures]
# (3) Results collected, perform post-processing
# (a) Sum results = the processed/deconvolved files from previous steps
sum_arr = sum(deconvolved_data)
# (b) Run post-processing routine = normalization,
final_arr = stemdiff.sum.sum_postprocess(sum_arr,len(deconvolved_data))
# (4) Return final array = sum of datafiles with (optional) deconvolution
return(final_arr)