This project simplifies gathering and processing of Uruguayan economic statistics. Data is retrieved from (mostly) government sources, processed into a familiar tabular format, tagged with useful metadata and can be transformed in several ways (converting to dollars, calculating rolling averages, resampling to other frequencies, etc.).
If this screenshot gives you anxiety, this package should be of interest.
A webapp with a limited but interactive version of econuy is available at econ.uy. Check out the repo as well.
The most basic econuy workflow goes like this:
from econuy.core import Pipeline
p = Pipeline()
p.get("labor_rates")
Installation
From PyPI:
pip install econuy
From source:
git clone https://github.com/rxavier/econuy.git
cd econuy
python setup.py install
Full API documentation available at RTD
The Pipeline() class
This is the recommended entry point for the package. It allows setting up the common behavior for downloads, and holds the current working dataset.
from econuy.core import Pipeline
p = Pipeline(location="your_directory")
The Pipeline.get() method
Retrieves datasets (generally downloads them, unless the download attribute is False and the requested dataset exists at the location) and loads them into the dataset attribute as a Pandas DataFrame.
The Pipeline.available_datasets() method returns a dict with the available options.
from econuy.core import Pipeline
from sqlalchemy import create_engine
eng = create_engine("dialect+driver://user:pwd@host:port/database")
p = Pipeline(location=eng)
p.get("industrial_production")
This example also shows that econuy accepts SQLAlchemy Engine or Connection objects as location.
Note that every time a dataset is retrieved, Pipeline will:
1. Check whether a previous version exists at location. If it does, it will read it and combine it with the new data (unless download=False, in which case only existing data will be retrieved).
2. Save the dataset to location, unless the always_save attribute is set to False or no new data is available.
Data can be written to and read from CSV or Excel files (controlled by the read_fmt and save_fmt attributes) or SQL (automatically determined from location).
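The retrieval flow described above can be illustrated with a small plain-Python sketch. It is not econuy's implementation: the function name update_dataset, the dict-based datasets and the rule that freshly downloaded observations override stored ones are all assumptions made for illustration.

```python
from datetime import date


def update_dataset(existing, new, download=True, always_save=True):
    """Hypothetical helper mirroring the described flow.

    existing: dict of date -> value already stored at location, or None.
    new: dict of date -> value that was just downloaded.
    Returns (dataset, should_save).
    """
    if not download and existing is not None:
        # Only stored data is returned, and nothing needs to be written.
        return dict(existing), False
    combined = dict(existing or {})
    # Assumption: new observations take precedence over stored ones.
    combined.update(new)
    has_new_data = combined != (existing or {})
    return dict(sorted(combined.items())), always_save and has_new_data


stored = {date(2023, 1, 31): 100.0, date(2023, 2, 28): 101.5}
fresh = {date(2023, 2, 28): 101.7, date(2023, 3, 31): 102.3}
dataset, should_save = update_dataset(stored, fresh)
```

Here dataset holds three observations, with the February value taken from the fresh download, and should_save is True because the combined data differs from what was stored.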
Metadata for each dataset is held in Pandas MultiIndexes with the following:
When writing, metadata can be included as dataset headers (Pandas MultiIndex columns), placed on another sheet if writing to Excel, or dropped. This is controlled by read_header and save_header.
Pipeline objects with a valid dataset can access 6 transformation methods that modify the held dataset.
- resample() - resample data to a different frequency, taking into account whether data is of stock or flow type.
- chg_diff() - calculate percent changes or differences for same period last year, last period or at annual rate.
- decompose() - seasonally decompose series into trend or seasonally adjusted components.
- convert() - convert to US dollars, constant prices or percent of GDP.
- rebase() - set a period or window as 100, scale rest accordingly.
- rolling() - calculate rolling windows, either average or sum.
from econuy.core import Pipeline
p = Pipeline()
p.get("balance_nfps")
p.convert(flavor="usd")
p.resample(rule="A-DEC", operation="sum")
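Why the stock or flow distinction matters when resampling can be shown with a small plain-Python sketch. It follows a common convention (flows are summed over the period, stocks take the last observation of the period), which is an assumption here and not necessarily how econuy handles every case. The function resample_to_annual and the sample figures are hypothetical.

```python
from collections import defaultdict


def resample_to_annual(monthly, series_type):
    """Aggregate a dict of (year, month) -> value to annual frequency.

    Assumed convention: flows are summed, stocks keep the last value.
    """
    by_year = defaultdict(list)
    for (year, _month), value in sorted(monthly.items()):
        by_year[year].append(value)
    if series_type == "flow":
        return {year: sum(values) for year, values in by_year.items()}
    if series_type == "stock":
        return {year: values[-1] for year, values in by_year.items()}
    raise ValueError("series_type must be 'flow' or 'stock'")


# Made-up data: a flow (monthly exports) and a stock (reserves level).
exports = {(2023, month): 10 for month in range(1, 13)}
reserves = {(2023, month): 500 + month for month in range(1, 13)}

annual_exports = resample_to_annual(exports, "flow")
annual_reserves = resample_to_annual(reserves, "stock")
```

Summing the reserves series would add up twelve month-end levels and overstate the stock roughly twelvefold, which is the kind of mistake a type-aware resample avoids.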
While Pipeline.get() will generally save the retrieved dataset to location, transformation methods won't automatically write data.
However, Pipeline.save() can be used, which will overwrite the file on disk (or SQL table) with the contents in dataset.
The Session() class
Like a Pipeline, except it can hold several datasets.
The datasets attribute is a dict of name-DataFrame pairs. Additionally, Session.get() accepts a sequence of strings representing several datasets.
Transformation and saving methods support a select parameter that determines which held datasets are considered.
from econuy.session import Session
s = Session(location="your/directory")
s.get(["cpi", "nxr_monthly"])
s.get("commodity_index")
s.rolling(window=12, operation="mean", select=["nxr_monthly", "commodity_index"])
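The idea behind select can be sketched in plain Python: a transformation is applied only to the named datasets, and the rest are left untouched. This is an illustration under stated assumptions, not econuy's code. The helpers rolling_mean and apply_selected are hypothetical, datasets are plain lists instead of DataFrames, and treating select="all" as "every dataset" is an assumption.

```python
def rolling_mean(values, window):
    """Trailing mean; None until a full window is available."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


def apply_selected(datasets, func, select="all"):
    """Apply func to the selected datasets only (hypothetical helper)."""
    names = set(datasets) if select == "all" else set(select)
    return {
        name: func(values) if name in names else values
        for name, values in datasets.items()
    }


# Made-up numbers, keyed by the dataset names used in the example above.
datasets = {
    "cpi": [1, 2, 3, 4],
    "nxr_monthly": [2, 4, 6, 8],
    "commodity_index": [10, 20, 30, 40],
}
smoothed = apply_selected(
    datasets,
    lambda values: rolling_mean(values, window=2),
    select=["nxr_monthly", "commodity_index"],
)
```

In this sketch smoothed["cpi"] is returned unchanged, while the two selected series are replaced by their 2-period trailing means.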
Session.get_bulk() makes it easy to get several datasets in one line.
from econuy.session import Session
s = Session()
s.get_bulk("all")
from econuy.session import Session
s = Session()
s.get_bulk("fiscal_accounts")
Session.concat() combines selected datasets into a single DataFrame with a common frequency, and adds it as a new key-value pair in datasets.
The patool package is used to access data provided in .rar format. This package requires the unrar binaries to be present on your system, which in most cases you should already have. You can get them from here if you don't.
Some retrieval functions need Selenium to be configured in order to scrape data. These functions include a driver parameter in which a Selenium Webdriver can be passed, or they will attempt to configure a Chrome webdriver, even downloading the chromedriver binary if needed. This still requires an existing Chrome installation.
This project is heavily based on getting data from online sources that could change without notice, causing methods that download data to fail. While I try to stay on my toes and fix these quickly, it helps if you create an issue when you find one of these (or even submit a fix!).
Session.get_bulk() mostly covers this.