Requirements:

- Python ~=3.10
- pytest ~=7.4.3
- setuptools ~=68.2.2
- pandas ~=2.2.0

This project requires Python 3.7 or later. Compatibility issues have been identified with the use of dataclasses in Python 3.6 and earlier versions.
Installation:

```sh
pip install dynamic-loader
```

To install the dependencies from source:

```sh
pip install -r requirements.txt
```
The DataLoader project is a comprehensive utility that facilitates the efficient loading and processing of data from specified directories. It is designed to be user-friendly and easy to integrate into your projects.

The DataMetrics class focuses on processing data paths and gathering statistics related to the file system and specified paths. It also allows exporting all statistics to a JSON file.

The Extensions class is a utility that provides a set of default file extensions for the DataLoader class. It is the backbone for mapping each file extension to its respective loading method.
DataLoader
The DataLoader class is specifically designed for loading and processing data from directories. It provides the following key features:

- Parallel loading via the `total_workers` parameter to enhance performance.
- The `verbose` parameter will display the loading process for each file.
- Combined with a logger, the `verbose` parameter will write the loading process for each file to a log file.
- Files without a dedicated loader method are opened and returned as a `TextIOWrapper`.
- Future updates will include the ability to specify which loader method to use for specific files.
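The package's internals are not shown in this README, but the parallel loading described above can be sketched with the standard library. `load_directory` and its behavior are illustrative stand-ins under stated assumptions, not the package's actual API:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def load_directory(path, total_workers=4, default_extensions=None):
    """Illustrative sketch: read matching files in parallel, keyed by file name."""
    files = [p for p in Path(path).iterdir()
             if p.is_file()
             and (default_extensions is None
                  or p.suffix.lstrip(".") in default_extensions)]

    def read(p):
        return p.name, p.read_text()

    # total_workers bounds the thread pool, analogous to DataLoader's parameter
    with ThreadPoolExecutor(max_workers=total_workers) as pool:
        return dict(pool.map(read, files))
```

Filtering by `default_extensions` before dispatching to the pool mirrors the parameter of the same name documented below.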
Parameters:

- `path` (str or Path): The path of the directory from which to load files.
- `directories` (Iterable): An iterable of directories from which to load all files.
- `default_extensions` (Iterable): Default file extensions to be processed.
- `full_posix` (bool): Indicates whether to display full POSIX paths.
- `no_method` (bool): Indicates whether to skip loading-method matching.
- `verbose` (bool): Indicates whether to display verbose output.
- `generator` (bool): Indicates whether to return the loaded files as a generator; otherwise, returns a dictionary.
- `total_workers` (int): Number of workers for parallel execution.
- `log` (Logger): A configured logger instance for logging messages. (Refer to the GetLogger class for more information on how to create a logger instance.)
- `ext_loaders` (dict): Dictionary mapping extensions to specified loaders and their keyword arguments. (Refer to the Extensions class for more information.)

Methods and properties:

- `load_file` (class method): Load a specific file.
- `get_files` (class method): Retrieve files from a directory based on default extensions, filtering out unwanted files.
- `dir_files` (property): Loaded files from the specified directories.
- `files` (property): Loaded files from a single directory.
- `all_exts` (property): Retrieve all supported file extensions with their respective loader methods.
- `EXTENSIONS` (Extensions class instance): Retrieve all default supported file extensions with their respective loader methods.

DataMetrics
The DataMetrics
class focuses on processing data paths and gathering statistics related to the file system. Key features include:
- `paths` (Iterable): Paths for which to gather statistics.
- `file_name` (str): The file name to be used when exporting the metadata stats of all files.
- `full_posix` (bool): Indicates whether to display full POSIX paths.
- `all_stats`: Retrieve statistics for all paths.
- `total_size`: Calculate the total size of all paths.
- `total_files`: Calculate the total number of files in all paths.
- `export_stats()`: Export all statistics to a JSON file.
- `os_stats_results`: OS statistics results for each path.
- `st_fsize`: Full file size statistics.
- `st_vsize`: Full volume size statistics.

Extensions
The Extensions class is a utility that provides a set of default file extensions for the DataLoader class. It is the backbone for mapping each file extension to its respective loading method. All extensions are stored in a dictionary (no period included), and the Extensions class provides the following key features:

- The default loader method for extensions is `open`.
- The ability to customize the Extensions class with new file extensions and their respective loader methods.
- Retrieval of the loader methods implemented in the Extensions class.

Methods and attributes:

- `Extensions()`: Initializes the Extensions class with all implemented file extensions and their respective loader methods.
- `ALL_EXTS`: Retrieve all supported file extensions with their respective loader methods.
- `get_loader`: Retrieve the loader method for a specific file extension.
- `has_loader`: Checks whether a specific file extension has a loader method implemented that is not `open`.
- `is_supported`: Checks whether a specific file extension is supported.
- `customize`: Customize the Extensions class with new file extensions and their respective loader methods.

Extensions without a dedicated loader method are opened and returned as a `TextIOWrapper`.

```python
# Structure: {extension: {loader_method: {kwargs}}}
ext_loaders = {"csv": {pd.read_csv: {"header": 10}}}
```
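The `{extension: {loader_method: {kwargs}}}` structure above can be dispatched with plain Python. This sketch uses stdlib loaders instead of pandas so it is self-contained; `load_file` here is a hypothetical stand-in, not the package's class method:

```python
import json
from pathlib import Path

# Hypothetical mapping in the {extension: {loader_method: {kwargs}}} shape
ext_loaders = {
    "json": {json.loads: {}},
    "txt": {str.strip: {}},
}

def load_file(path, loaders=ext_loaders):
    """Dispatch on the file's extension; fall back to the raw text."""
    path = Path(path)
    text = path.read_text()
    ext = path.suffix.lstrip(".")  # extensions are stored without the period
    if ext in loaders:
        (loader, kwargs), = loaders[ext].items()
        return loader(text, **kwargs)  # per-extension kwargs are forwarded
    return text
```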
GetLogger
The GetLogger
class is a utility that provides a method to get a configured logger instance for logging messages. It is designed to be user-friendly and easy to integrate into your projects.
- `name` (str, optional): The name of the logger. Defaults to the name of the calling module.
- `level` (int, optional): The logging level. Defaults to `logging.DEBUG`.
- `formatter_kwgs` (dict, optional): Additional keyword arguments for the log formatter.
- `handler_kwgs` (dict, optional): Additional keyword arguments for the log handler.
- `mode` (str, optional): The file mode for opening the log file. Defaults to "a" (append).
- `refresher` (callable): A method to refresh the log file.
- `set_verbose` (callable): A method to set the verbosity of the logger.

If `verbose` is True, log messages will be printed to the console instead of being written to a file.

DataLoader
Usage Examples

```python
from data_loader import DataLoader

# Load all files with a specified path (directory) as a Generator
dl_gen = DataLoader(path="path/to/directory")
dl_files_gen = dl_gen.files
print(dl_files_gen)
# Output:
# <generator object DataLoader.files.<key-value> at 0x1163f4ba0>
```
```python
from data_loader import DataLoader

# Load all files with a specified path (directory) as a Dictionary (Custom-Repr)
# Disabling 'generator' and 'full_posix' for display purposes.
dl_dict = DataLoader(path="path/to/directory", generator=False, full_posix=False)
dl_files_dict = dl_dict.files
print(dl_files_dict)
# Output:
# DataLoader((LICENSE.md, <TextIOWrapper>),
#            (requirements.txt, <Str>),
#            (Makefile, <Str>),
#            ...
#            (space_4.txt, <Str>))
```
```python
from data_loader import DataLoader

# Load all files from multiple directories
# Disabling 'generator' and 'full_posix' for display purposes.
dl = DataLoader(directories=["path/to/dir1", "path/to/dir2"], generator=False, full_posix=False)
dl_dir_files = dl.dir_files
print(dl_dir_files)
# Output:
# DataLoader((file1.txt, <Str>),
#            (file2.txt, <Str>),
#            (file3.txt, <Str>),
#            ...
#            (fileN.txt, <Str>))
```
```python
from data_loader import DataLoader

# Load all files with default extensions
dl_default = DataLoader(path="path/to/directory", default_extensions=["csv"], generator=False, full_posix=False)
dl_default_files = dl_default.files
print(dl_default_files)
# Output:
# DataLoader((file1.csv, <DataFrame>),
#            (file2.csv, <DataFrame>),
#            ...
#            (fileN.csv, <DataFrame>))
```
```python
from data_loader import DataLoader

# Retrieve data for a specific file
dl_files = DataLoader(path="path/to/directory", generator=False, full_posix=False).files
dl_specific_file_data = dl_files["file1.csv"]
# Output:
# <DataFrame>
```
```python
from data_loader import DataLoader
import pandas as pd

# Specify your own custom loader methods
dl_custom = DataLoader(path="path/to/directory",
                       ext_loaders={"csv": {pd.read_csv: {"nrows": 10}}},
                       generator=False, full_posix=False)
dl_custom_files = dl_custom.files
print(dl_custom_files)
# Output:
# DataLoader((file1.csv, <DataFrame>),
#            (file2.csv, <DataFrame>),
#            ...
#            (fileN.csv, <DataFrame>))
# Note: 'nrows' will be dynamically passed to the 'pd.read_csv' method for each file.
```
```python
from data_loader import DataLoader
import logging

# Specify your own custom logger
custom_logger = logging.getLogger("DataLoader")
dl_with_logger = DataLoader(path="path/to/directory", log=custom_logger)
dl_logger_files = dl_with_logger.files
print(dl_logger_files)
# Output:
# <generator object DataLoader.files.<key-value> at 0x1163f4ba0>
# Note: The logger will be used to log or stream messages.
```
DataMetrics
Usage

```python
from data_loader import DataMetrics

dm = DataMetrics(files=["path/to/directory1", "path/to/directory2"])
print(dm.all_stats)    # Retrieve statistics for all paths
print(dm.total_size)   # Calculate the total size of all paths
print(dm.total_files)  # Calculate the total number of files in all paths
dm.export_stats()      # Export all statistics to a JSON file
```
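The statistics described above can be approximated with the standard library alone. This sketch is illustrative, not the package's implementation; `path_stats` and `export_stats` are hypothetical names:

```python
import json
import shutil
from pathlib import Path

def path_stats(path):
    """Gather per-path statistics similar in spirit to DataMetrics."""
    files = [p for p in Path(path).rglob("*") if p.is_file()]
    usage = shutil.disk_usage(path)  # volume-level numbers (st_vsize analogue)
    return {
        "total_files": len(files),
        "total_size": sum(p.stat().st_size for p in files),  # st_fsize analogue
        "st_vsize": {"total": usage.total, "used": usage.used, "free": usage.free},
    }

def export_stats(stats, file_name="metadata_stats.json"):
    """Write the gathered statistics to a JSON file."""
    Path(file_name).write_text(json.dumps(stats, indent=2))
```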
Extensions
Usage

```python
from data_loader import Extensions

ALL_EXTS = Extensions()  # Initializes the Extensions class; or use the default instance Extensions().ALL_EXTS
print("csv" in ALL_EXTS)               # True
print(ALL_EXTS.get_loader("csv"))      # <function read_csv at 0x7f8e3e3e3d30>
# or
print(ALL_EXTS.get_loader(".pickle"))  # loader method for the "pickle" extension
print(ALL_EXTS.has_loader("docx"))     # False
print(ALL_EXTS.is_supported("docx"))   # True

# Customize the Extensions class with new file extensions and their loader methods
ALL_EXTS.customize({"docx": {open: {"mode": "rb"}},
                    "png": {PIL.Image.open: {}}})
print(ALL_EXTS.get_loader("docx"))     # <function <lambda> at 0x7f8e3e3e3d30>
```
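Note that `get_loader` appears to accept extensions both with and without a leading period. A registry that normalizes keys the same way might look like this sketch (`ExtRegistry` is a hypothetical class, not the package's implementation):

```python
import json

class ExtRegistry:
    """Minimal sketch: keys stored without the period, lookups normalized."""

    def __init__(self, loaders):
        self._loaders = {ext.lstrip(".").lower(): fn for ext, fn in loaders.items()}

    def _norm(self, ext):
        return ext.lstrip(".").lower()

    def __contains__(self, ext):
        return self._norm(ext) in self._loaders

    def get_loader(self, ext):
        # open serves as the default loader for unmatched extensions
        return self._loaders.get(self._norm(ext), open)

    def has_loader(self, ext):
        # True only when a loader other than the default open is registered
        return self.get_loader(ext) is not open
```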
GetLogger
Usage:

```python
import logging
from data_loader import GetLogger

# Create a logger with default settings
logger = GetLogger().logger
logger.info("This is an info message")  # Writes to the log file

# Create a logger with custom settings
logger = GetLogger(name="custom_logger", level=logging.INFO, verbose=True).logger
logger.info("This is an info message")  # Prints to the console

# Enable verbosity
logger = GetLogger().logger
logger.set_verbose(True)
CustomException("Error Message")  # Prints to the console

# Disable verbosity
logger.set_verbose(False)
CustomException("Error Message")  # Writes to the log file
```
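GetLogger's source is not shown here, but the verbose behavior it describes (console output when verbose, log file otherwise) can be reproduced with the stdlib `logging` module. `make_logger` is an illustrative stand-in:

```python
import logging
import sys

def make_logger(name="app", level=logging.DEBUG, verbose=False,
                log_file="app.log", mode="a"):
    """Sketch: attach a console handler when verbose, a file handler otherwise."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = (logging.StreamHandler(sys.stdout) if verbose
               else logging.FileHandler(log_file, mode=mode))
    handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
    logger.addHandler(handler)
    return logger
```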
DataMetrics
Usage Examples

```python
from data_loader import DataMetrics

# Create a DataMetrics instance with paths and corresponding metadata
dm = DataMetrics(("path/to/directory1", <Dict>),
                 ("path/to/directory2", <Dict>))

# Access metadata for a specific path
metadata_directory1 = dm["path/to/directory1"]
print(metadata_directory1)
# Output:
# {'os_stats_results': <os_stats_results>,
#  'st_fsize': Stats(symbolic='6.20 KB', calculated_size=6.19921875, bytes_size=6348),
#  'st_vsize': {'total': Stats(symbolic='465.63 GB (Gigabytes)', calculated_size=465.62699127197266, bytes_size=499963174912),
#               'used': Stats(symbolic='131.60 GB (Gigabytes)', calculated_size=131.59552001953125, bytes_size=141299613696),
#               'free': Stats(symbolic='334.03 GB (Gigabytes)', calculated_size=334.0314712524414, bytes_size=358663561216)}}

# Export all statistics to a JSON file
dm.export_stats(file_path="all_metadata_stats.json")

# Calculate the total size of all paths
total_size = dm.total_size
print(total_size)
# Output:
# Stats(symbolic='471.76 GB (Gigabytes)', calculated_size=471.75720977783203, bytes_size=507012679260)

# Calculate the total number of files in all paths
total_files = dm.total_files
print(total_files)
# Output:
# 215
```
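The `Stats(symbolic='6.20 KB', ...)` values above imply a bytes-to-human-readable conversion. A minimal version of that formatting (an illustrative sketch, not the package's `Stats` class) reproduces the symbolic strings shown:

```python
def symbolic_size(num_bytes):
    """Format a byte count the way the Stats output above displays it."""
    # Divide by 1024 until the value fits the unit, then format to two decimals
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if num_bytes < 1024 or unit == "TB":
            return f"{num_bytes:.2f} {unit}"
        num_bytes /= 1024
```

For example, `symbolic_size(6348)` yields "6.20 KB", matching the `st_fsize` line in the output above.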
Custom loader methods can be supplied through the `ext_loaders` parameter, and individual files can be loaded with the `load_file` class method.
Feedback is crucial for the improvement of the DataLoader
project. If you encounter any issues, have suggestions, or want to share your experience, please consider the following channels:
GitHub Issues: Open an issue on the GitHub repository to report bugs or suggest enhancements.
Contact: Reach out to the project maintainer via the following:
Your feedback and contributions play a significant role in making the
DataLoader
project more robust and valuable for the community. Thank you for being part of this endeavor!