AECData is a Python library for working with the 2050-materials platform API (see documentation here). By abstracting authentication and data retrieval, it lets developers focus on analyzing and using environmental data about construction materials.
The library guides the user through four steps, from authentication to data analysis:
Initialization: Start by creating a User instance with your developer token (see how to get a token here). This instance will manage your authentication and serve as your gateway to fetching data.
Fetching Data: Utilize the User class to retrieve data from the API. You can apply various filters to narrow down the data according to your needs.
Data Manipulation: Once you have fetched the data, you can construct a ProductData object. This object allows for extensive manipulation and transformation of the data, enabling you to prepare it for analysis or visualization in a format that suits your requirements.
Statistical Analysis and Visualization: For advanced data analysis, the ProductStatistics class extends ProductData to provide statistical insights. It enables outlier removal, distribution analysis, and more, coupled with visualization capabilities to help interpret the data effectively.
The AECData library can be installed directly from PyPI to ensure you have the latest version. Use the following pip command:
pip install aecdata
For source code, issues, and contributions, the project is hosted on GitHub at:
git clone https://github.com/2050-Materials/aecdata.git
Authenticator Class
The Authenticator class handles the authentication process for accessing the API. It is used internally to manage API tokens and to ensure continued access.
__init__(self, developer_token, base_api_url='https://app.2050-materials.com/'): Initializes the Authenticator instance.
    developer_token: The developer token provided for API access.
    base_api_url (optional): The base URL for the API endpoints.
get_token(self): Retrieves the API token using the developer token. This method is ideal for operations that only require a single API token without the need for a refresh token.
get_api_and_refresh_token(self): Retrieves both the API token and a refresh token. This is useful for long-running applications that may need to refresh the API token.
refresh_api_token(self): Refreshes the API token using the refresh token. This method should be used when the API token has expired and a new one is needed without re-authenticating using the developer token.
User Class
The User class provides a high-level interface for interacting with the 2050-materials API, handling authentication, and performing API requests to fetch and manage data related to construction materials.
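The authentication handling described for Authenticator and User comes down to obtaining an API token together with a refresh token, then swapping in a fresh API token once the old one expires. The sketch below illustrates that lifecycle with in-memory stand-ins only. FakeAuthenticator, FakeUser, TokenExpired, request_with_refresh and the token strings are all hypothetical names invented for this illustration, and the retry-once-on-expiry policy is an assumption, since the text does not say whether aecdata retries automatically.

```python
class TokenExpired(Exception):
    """Raised by the fake API below when it is given a stale API token."""


class FakeAuthenticator:
    """Hypothetical in-memory stand-in; it does not call the real API."""

    def __init__(self, developer_token):
        self.developer_token = developer_token
        self.refresh_token = None
        self._issued = 0

    def _new_api_token(self):
        self._issued += 1
        return f"api-token-{self._issued}"

    def get_api_and_refresh_token(self):
        # Mirrors the documented method name: returns both tokens at once.
        self.refresh_token = "refresh-token"
        return self._new_api_token(), self.refresh_token

    def refresh_api_token(self):
        # A new API token is issued from the refresh token alone.
        if self.refresh_token is None:
            raise RuntimeError("no refresh token; authenticate first")
        return self._new_api_token()


class FakeUser:
    """Hypothetical stand-in that keeps the current api_token up to date."""

    def __init__(self, developer_token):
        self.authenticator = FakeAuthenticator(developer_token)
        self.api_token, self.refresh_token = (
            self.authenticator.get_api_and_refresh_token()
        )

    def refresh_api_token(self):
        self.api_token = self.authenticator.refresh_api_token()


def request_with_refresh(user, send):
    """Assumed policy: try once, refresh the token on expiry, try once more."""
    try:
        return send(user.api_token)
    except TokenExpired:
        user.refresh_api_token()
        return send(user.api_token)


# Fake API that only accepts the second token ever issued.
def fake_send(token):
    if token != "api-token-2":
        raise TokenExpired(token)
    return {"status": "ok", "token_used": token}


user = FakeUser("dev-token")
response = request_with_refresh(user, fake_send)
print(response["token_used"])  # api-token-2
```

Keeping the refresh logic behind one method on the user object means calling code never handles the refresh token directly, which matches the way refresh_api_token is described as updating api_token in place.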
base_api_url: The base URL for the API endpoints. It defaults to the 2050-materials API but can be customized if necessary.
authenticator: An instance of the Authenticator class responsible for managing authentication tokens.
api_token: The current API token obtained through the Authenticator. It's used for authenticating API requests.
refresh_token: A token that can be used to refresh the api_token when it expires, ensuring continuous access without re-authentication.
_filters: A private attribute that caches filters available for querying the API.
_field_description: A private attribute that stores detailed descriptions of available fields for products, used for filtering and data retrieval.
filters: A property that lazily fetches and returns available filters for querying the API, caching the result.
field_description: A property that lazily fetches and returns detailed descriptions of available product fields, caching the result.
__init__(self, developer_token, base_api_url="https://app.2050-materials.com/"): Initializes a new User instance with a given developer token and optionally a custom API base URL.
refresh_api_token(self): Uses the Authenticator to refresh the API token and updates the User instance's api_token with the new value.
get_filters(self): Fetches and caches filter options available for querying products from the API.
get_field_description(self): Fetches and caches the descriptions of available fields for products, aiding in data manipulation and query customization.
get_filters_template(self): Retrieves a template of filters available for product queries.
get_input_lca_fields_description(self): Gets descriptions for input fields related to Life Cycle Assessment (LCA) data.
get_output_lca_fields_description(self): Retrieves descriptions for output fields of LCA data.
get_impact_lca_fields_description(self): Fetches descriptions of impact fields within LCA data.
get_material_facts_fields_description(self): Provides descriptions of material facts fields, aiding in understanding data structure.
get_physical_properties_fields_description(self): Details descriptions of fields related to the physical properties of materials.
get_technical_parameters_fields_description(self): Retrieves descriptions of technical parameters fields.
get_product_fields_description(self): Gets descriptions of product-related fields.
get_unit_categories(self): Returns the categories of units used within the data.
get_primary_units(self): Provides a list of primary units for data measurement.
get_mf_num_fields(self): Lists numerical fields related to material facts.
get_mf_perc_fields(self): Lists percentage fields within material facts data.
get_physical_properties_fields(self): Returns a list of fields detailing the physical properties of materials.
get_filters_mapping(self): Creates and returns a mapping of filter options to simplify query construction.
get_products_page(self, page=1, openapi=False, **filters): Fetches a specific page of product data, optionally applying filters.
get_products(self, openapi=False, **filters): Fetches all products, optionally applying filters, and handles pagination automatically. Use openapi=True to use the free Open API.

from aecdata import User  # import path assumed from the package name

developer_token = "YOUR_DEVELOPER_TOKEN"  # placeholder: replace with your own token

# Initialize the API user with your developer token
user = User(developer_token=developer_token)
# Accessing the property triggers the lazy fetch of the available filters
user.filters
# Refresh the API token
user.refresh_api_token()
# Fetch all filter mappings
all_mappings = user.get_filters_mapping()
# Fetch products whose material type family is Ceramic
filters = {
    'material_type_family': all_mappings['material_types_family']['Ceramic'],
}
filtered_products = user.get_products(openapi=False, **filters)  # Set openapi=True for the free version
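The description of get_products says it handles pagination automatically, while get_products_page fetches a single page. A minimal sketch of how such a loop might look is given below. It does not use aecdata: fetch_all_pages, fake_get_products_page and FAKE_PAGES are hypothetical names, and the payload shape (a "results" list plus a "next" marker that is None on the last page) is an assumption about the response format, not something the text confirms.

```python
from typing import Callable, Dict, List


def fetch_all_pages(fetch_page: Callable[[int], Dict]) -> List[dict]:
    """Call fetch_page(1), fetch_page(2), ... until a page reports no successor."""
    products: List[dict] = []
    page = 1
    while True:
        payload = fetch_page(page)
        products.extend(payload["results"])
        if payload.get("next") is None:  # assumed end-of-data marker
            break
        page += 1
    return products


# Hypothetical in-memory "API" with three pages, standing in for get_products_page.
FAKE_PAGES = {
    1: {"results": [{"name": "tile A"}, {"name": "tile B"}], "next": 2},
    2: {"results": [{"name": "brick C"}, {"name": "brick D"}], "next": 3},
    3: {"results": [{"name": "clay E"}], "next": None},
}


def fake_get_products_page(page: int) -> Dict:
    return FAKE_PAGES[page]


all_products = fetch_all_pages(fake_get_products_page)
print(len(all_products))  # 5
```

Passing the page fetcher in as a function keeps the loop independent of how a page is retrieved, so the same loop works with a real HTTP call or a test double.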
ProductData Class
The ProductData class is designed to manage and manipulate the data fetched from the 2050-materials API. It offers functionalities for converting data between different formats, scaling product data based on units and amounts, and generating plots for product contributions.
_data: A private attribute that holds the original data passed to the instance, either as a list of dictionaries or a pandas DataFrame.
_dataframe: A private attribute that stores the data in a pandas DataFrame format, allowing for easy manipulation and analysis.
__init__(self, data): Accepts data either as a list of product dictionaries or a pandas DataFrame and initializes the ProductData instance. The constructor ensures that _data and _dataframe are synchronized.
data: Provides access to the raw data. Setting this property updates both the _data attribute and the corresponding DataFrame in _dataframe.
dataframe: Allows access to the data in a pandas DataFrame format. Setting this property updates _dataframe and the raw data in _data.
to_dataframe(self, data): Converts the list of dictionaries (or the raw data) into a pandas DataFrame with ordered columns, facilitating data analysis.
to_csv(self, file_path): Exports the data to a CSV file, saving it to the specified path.
to_json(self, file_path): Exports the data to a JSON file at the given path.
to_json_string(self): Converts the data into a JSON-formatted string, useful for serialization or sending data over a network.
get_available_units(self): Extracts and returns a set of available units for scaling based on the data's 'material_facts.scaling_factors' entries.
convert_df_to_unit(self, df, unit='declared_unit', amount=1): Scales the DataFrame's numerical fields to the specified unit and amount.
scale_products_by_unit_and_amount(self, products_info): Scales product data based on a dictionary mapping product UUIDs to units and amounts, facilitating comparisons and aggregations.
get_product_contributions(self, products_info, field_name): Calculates the contribution of each product to the specified field (e.g., CO2 emissions); the result can be plotted as a pie chart.
to_epdx(self): Converts the data into the EPDx format, suitable for environmental product declarations.

import matplotlib.pyplot as plt
from aecdata import ProductData  # import path assumed from the package name

# Create ProductData object from API data
product_data = ProductData(filtered_products)
# Access the dataframe
df = product_data.dataframe
# Create a custom project with specified products, units and amounts
products_info = {
    '6aa6d32c-f8cf-11ed-9c01-0242ac120004': {'amount': 0.2},
    'c0800dc0-f8cc-11ed-9c01-0242ac120004': {'amount': 0.30},
    '6ab16fb2-f8cf-11ed-9c01-0242ac120004': {'unit': 'm2'},
    '6aab7152-f8cf-11ed-9c01-0242ac120004': {'unit': 'm2', 'amount': 1},
}
# Scale the selected products to the requested units and amounts
scale_df = product_data.scale_products_by_unit_and_amount(products_info)
# Compute each product's contribution to the chosen field
field_name = 'material_facts.manufacturing'
contributions = product_data.get_product_contributions(products_info, field_name)
print(contributions)
# Plotting pie chart
plt.figure(figsize=(10, 8))
contributions.plot(kind='pie', autopct='%1.1f%%', startangle=90, counterclock=False, labels=contributions.index)
plt.title(f'Contribution of Each Product by {field_name}')
plt.ylabel('')  # Hide y-axis label for clarity
plt.show()
# Convert to EPDx format
epdx_products = product_data.to_epdx()
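To make the scaling and contribution steps above more concrete, here is a small pure-Python sketch of the arithmetic they suggest. It is not the library's implementation. The helper names (scale_value, percentage_contributions), the short product IDs and the sample numbers are hypothetical, and it assumes a scaling factor is a plain multiplier from the declared unit to the target unit, which the text does not specify. The products_info shape and the 'declared_unit' and amount=1 defaults follow the descriptions above.

```python
def scale_value(value, scaling_factors, unit="declared_unit", amount=1):
    """Scale a per-declared-unit value to `amount` of `unit`.

    Assumption: scaling_factors[unit] is a multiplier from the declared unit
    to the requested unit; the declared unit itself has factor 1.
    """
    factor = 1.0 if unit == "declared_unit" else scaling_factors[unit]
    return value * factor * amount


def percentage_contributions(products, products_info, field_name):
    """Return each product's share (in percent) of the summed scaled field."""
    scaled = {}
    for product_id, info in products_info.items():
        product = products[product_id]
        scaled[product_id] = scale_value(
            product[field_name],
            product["scaling_factors"],
            unit=info.get("unit", "declared_unit"),
            amount=info.get("amount", 1),
        )
    total = sum(scaled.values())
    return {pid: 100 * value / total for pid, value in scaled.items()}


# Hypothetical sample data: two products with made-up values and factors.
products = {
    "prod-a": {"material_facts.manufacturing": 10.0, "scaling_factors": {"m2": 2.5}},
    "prod-b": {"material_facts.manufacturing": 150.0, "scaling_factors": {}},
}
products_info = {
    "prod-a": {"unit": "m2", "amount": 1},  # 10.0 * 2.5 * 1 = 25.0
    "prod-b": {"amount": 0.5},              # 150.0 * 1.0 * 0.5 = 75.0
}

shares = percentage_contributions(products, products_info, "material_facts.manufacturing")
print(shares)  # {'prod-a': 25.0, 'prod-b': 75.0}
```

Scaling every product to a chosen unit and amount before summing is what makes the shares comparable; without it, values declared per kilogram and per square metre would be added together directly.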
ProductStatistics Class
The ProductStatistics class extends the ProductData class, offering advanced statistical analysis capabilities for the product data obtained from the 2050-materials API. It enables users to perform detailed statistical analyses, including filtering based on various criteria, removing outliers, and plotting distributions.
Inherits all attributes from the ProductData class.
__init__(self, data, unit='declared_unit'): Initializes a new instance of the ProductStatistics class. It accepts a list of products, a pandas DataFrame, or an instance of ProductData. The unit parameter specifies the measurement unit for statistical analysis.
get_lca_fields(self, fields='all', modules='all'): Returns a DataFrame filtered to include only the specified LCA fields and modules, allowing for targeted analysis.
get_available_groupings(self): Identifies possible groupings for the data based on its characteristics, aiding in segmented analysis.
get_available_fields_dict(self): Lists all available fields within the data that can be used for statistical analysis.
get_available_fields(self): Returns a list of all fields available for analysis, consolidating the information from get_available_fields_dict.
filter_df_by_dict(self, df, filter_dict): Applies a dictionary of filters to the DataFrame, allowing for refined data selection.
get_group_by_combinations(self, df, group_by, min_count): Determines valid combinations for grouping the data, based on specified criteria and a minimum count threshold for inclusion.
get_group_by_dict(self, df, group_by): Generates a dictionary representing potential groupings for the data, based on the specified group_by criteria.
get_statistics(self, group_by=None, fields=None, statistical_metrics=None, include_estimated_values=False, remove_outliers=True, method='IQR', sqrt_tranf=True, min_count=4): Computes statistical metrics for the specified fields and groupings, offering options to include estimated values, remove outliers, and adjust for small sample sizes.
remove_outliers_from_df(self, filtered_df, field, method, sqrt_tranf): Removes outliers from the DataFrame based on the specified field and method, optionally applying a square root transformation for variance stabilization.
filter_dataframe_based_on_field(self, field, filters=None, include_estimated_values=False, remove_outliers=True, method='IQR', sqrt_tranf=True): Filters the DataFrame based on specified criteria, including the treatment of estimated values and outliers.
get_product_contributions(self, products_info, field_name): Takes a dictionary that maps product IDs to a dictionary of unit and amount and, for the given LCA field (e.g., 'material_facts.manufacturing'), calculates each product's percentage contribution to carbon emissions.
get_field_distribution(self, field, filters=None, include_estimated_values=False, remove_outliers=True, method='IQR', sqrt_tranf=True): Filters the DataFrame and returns the distribution of the specified field, offering insights into the data's structure.
get_field_distribution_boxplot(self, field, group_by_field, filters=None, include_estimated_values=False, remove_outliers=True, method='IQR', sqrt_tranf=True): Filters the DataFrame and generates a boxplot for the specified field, providing a visual representation of statistical distributions.

import matplotlib.pyplot as plt
from aecdata import ProductStatistics  # import path assumed from the package name

# stats_obj is not defined in the original example; per the constructor described
# above, it can be built from a product list, a DataFrame, or a ProductData instance
stats_obj = ProductStatistics(product_data)

def plot_histogram(df, field):
    """Plot a histogram of df[field] with mean and median markers."""
    bin_count = min(len(df[field].unique()), 50)  # Limit the number of bins to a maximum of 50
    plt.figure(figsize=(10, 6))
    n, bins, patches = plt.hist(df[field], bins=bin_count, color='#2ab0ff', alpha=0.7, rwidth=0.85)
    plt.grid(axis='y', alpha=0.75)
    plt.xlabel('Value', fontsize=15)
    plt.ylabel('Frequency', fontsize=15)
    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)
    plt.title(f'Distribution of {field}', fontsize=15)
    plt.axvline(x=df[field].mean(), color='r', linestyle='-', label=f'Mean: {df[field].mean():.4f}')
    plt.axvline(x=df[field].median(), color='m', linestyle='-', label=f'Median: {df[field].median():.4f}')
    plt.legend(loc='upper right')
    plt.show()

filters = {
    'material_type': 'Ceramic',
    # 'elements_nrm_1': '3.1 - Wall finishes'
}
# Without removing outliers
field = 'material_facts.manufacturing'
df = stats_obj.get_field_distribution(field=field, filters=filters, include_estimated_values=True, remove_outliers=False, method='IQR', sqrt_tranf=True)
plot_histogram(df, field)
# With removing outliers
field = 'material_facts.manufacturing'
df = stats_obj.get_field_distribution(field=field, filters=filters, include_estimated_values=True, remove_outliers=True, method='IQR', sqrt_tranf=True)
plot_histogram(df, field)
# Field distribution boxplot
field = 'material_facts.manufacturing'
group_by_field = 'product_type'
# Get a dict with keys product types and values dataframe series
grouped_data_dict = stats_obj.get_field_distribution_boxplot(field=field, group_by_field=group_by_field, filters=None, include_estimated_values=True, remove_outliers=True, method='IQR', sqrt_tranf=True)
# Box-plotting (on Matplotlib 3.9+, the labels= argument is deprecated in favor of tick_labels=)
plt.figure(figsize=(10, 6))
plt.boxplot(list(grouped_data_dict.values()), patch_artist=True, labels=list(grouped_data_dict.keys()))
plt.grid(axis='y', alpha=0.75)
plt.xlabel(group_by_field, fontsize=15)
plt.ylabel('Value', fontsize=15)
plt.xticks(fontsize=12, rotation=45)  # Rotate labels if there are many groups
plt.yticks(fontsize=12)
plt.title(f'Distribution of {field} by {group_by_field}', fontsize=15)
plt.show()
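The examples above toggle remove_outliers with method='IQR' and sqrt_tranf=True. As a rough illustration of what IQR-based outlier removal with a square-root transform can look like, here is a standard-library sketch. It is not taken from aecdata: the function name remove_outliers_iqr, the conventional 1.5 x IQR fences, the quantile method and the sample values are all assumptions, and the library's exact rule may differ.

```python
import math
import statistics


def remove_outliers_iqr(values, sqrt_transform=True, k=1.5):
    """Drop values whose (optionally sqrt-transformed) score lies outside
    [Q1 - k*IQR, Q3 + k*IQR]. Requires non-negative values when
    sqrt_transform is True.
    """
    scores = [math.sqrt(v) for v in values] if sqrt_transform else list(values)
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Fences are computed on the transformed scale, but the original
    # (untransformed) values are what gets returned.
    return [v for v, s in zip(values, scores) if low <= s <= high]


# Hypothetical sample: six similar values and one extreme value.
sample = [1.0, 1.2, 1.1, 0.9, 1.3, 1.0, 25.0]
cleaned = remove_outliers_iqr(sample)
print(cleaned)  # [1.0, 1.2, 1.1, 0.9, 1.3, 1.0]
```

Taking the square root first compresses the upper tail of right-skewed data, so the fences are less likely to flag moderately large but legitimate values while still catching extreme ones.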
FAQs
A Python library designed to simplify access to the 2050-materials API
We found that aecdata demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.