Toolbox for easy and effective data exploration in Python. It is designed primarily for Jupyter notebooks, but it can also be used in any Python module.
You can install EasyExplore on any operating system via pip install easyexplore.
EasyExplore is designed as a wrapper that helps data scientists explore data more conveniently and efficiently.
You can easily import data sets from several file types and databases into a Pandas or dask DataFrame.
You can easily export data sets from a Pandas DataFrame or other data objects to several file types or databases.
Explore your data set quickly and efficiently using the DataExplorer:
-- Data Typing:
Check whether the data types represented in Pandas match the real data types occurring in the data
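To illustrate the idea behind this check, here is a minimal, dependency-free sketch and not EasyExplore's implementation or API. It assumes every column holds string values (as when a numeric column has been loaded as text) and guesses the narrowest type that fits all non-missing values. The function name infer_type and the single date format are hypothetical choices made for the example.

```python
from datetime import datetime


def infer_type(values):
    """Guess the narrowest type that fits every non-missing string value."""
    present = [v for v in values if v not in (None, "")]
    if not present:
        return "empty"
    # Candidate parsers, ordered from most to least restrictive.
    candidates = (
        ("int", int),
        ("float", float),
        ("date", lambda s: datetime.strptime(s, "%Y-%m-%d")),  # assumed format
    )
    for name, parser in candidates:
        try:
            for v in present:
                parser(v)
            return name
        except ValueError:
            continue
    return "text"


# Hypothetical columns that were all loaded as strings.
columns = {
    "age": ["31", "45", "", "27"],
    "price": ["9.99", "12", "7.5"],
    "signup": ["2021-03-01", "2021-04-15"],
    "city": ["Berlin", "Paris"],
}
inferred = {name: infer_type(vals) for name, vals in columns.items()}
print(inferred)
```

A column whose inferred type differs from its stored type (here, text) is a candidate for conversion.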
-- Data Health Check:
Check the health of the data set by detecting, describing and visualizing ...
... the amount of missing or invalid data vs. valid observations
... the amount of duplicated data
... the amount of invariant data
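The three quantities listed above can be sketched in plain Python. This is an illustrative example under simple assumptions (rows as dictionaries; None, empty strings, and NaN count as missing), not the way EasyExplore computes them. The names is_missing and health_check are hypothetical.

```python
import math


def is_missing(value):
    """Treat None, empty strings, and float NaN as missing."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def health_check(rows, columns):
    """Summarize missing values, duplicated rows, and invariant columns."""
    missing = {c: sum(is_missing(r[c]) for r in rows) for c in columns}

    # A row is a duplicate if an identical row has been seen before.
    seen, duplicates = set(), 0
    for r in rows:
        key = tuple(r[c] for c in columns)
        if key in seen:
            duplicates += 1
        seen.add(key)

    # A column is invariant if it has at most one distinct non-missing value.
    invariant = [
        c for c in columns
        if len({r[c] for r in rows if not is_missing(r[c])}) <= 1
    ]
    return {"missing": missing, "duplicates": duplicates, "invariant": invariant}


# Hypothetical data set with one repeated row and one constant column.
rows = [
    {"id": 1, "country": "DE", "score": 0.5},
    {"id": 2, "country": "DE", "score": None},
    {"id": 2, "country": "DE", "score": None},
    {"id": 3, "country": "DE", "score": 0.9},
]
report = health_check(rows, ["id", "country", "score"])
print(report)
```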
-- Data Distribution:
Describing and visualizing statistical distribution of ...
... categorical features
... continuous features
... date features
-- Outlier Detection:
Analyze outliers or anomalies of continuous features using univariate and multivariate methods:
a) Univariate: Examines outlier values for each feature separately using the interquartile range (IQR)
b) Multivariate: Examines outliers for each possible feature pair combined, using several different machine learning algorithms. For further information, see the documentation of the PyOD package, which is used under the hood.
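The univariate IQR rule can be sketched with the standard library alone. This is a generic illustration, not EasyExplore's code: the fence multiplier k=1.5 is the conventional default and is an assumption here, as is the choice of the "inclusive" quantile method.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical continuous feature with one extreme observation.
data = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(data)
print(outliers)
```

Applied per feature, this flags extreme values independently of all other columns, which is why the multivariate methods are needed to catch unusual combinations of otherwise ordinary values.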
-- Categorical Breakdown Statistics:
Descriptive statistics of continuous features grouped by values of each categorical feature in the data set.
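As a rough illustration of such a breakdown, the sketch below groups one continuous feature by the levels of one categorical feature and computes a few summary statistics. The function name breakdown, the chosen statistics, and the sample data are assumptions for the example, not EasyExplore's output format.

```python
import statistics
from collections import defaultdict


def breakdown(rows, category, feature):
    """Summarize a continuous feature for each level of a categorical feature."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[category]].append(r[feature])
    return {
        level: {
            "count": len(vals),
            "mean": statistics.mean(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for level, vals in groups.items()
    }


# Hypothetical rows with one categorical and one continuous feature.
rows = [
    {"segment": "A", "revenue": 10.0},
    {"segment": "A", "revenue": 14.0},
    {"segment": "B", "revenue": 7.0},
]
stats = breakdown(rows, "segment", "revenue")
print(stats)
```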
-- Correlation:
Correlation analysis of continuous features. A partial correlation method is implemented for analyzing multicollinearity. The differences between marginal and partial correlation coefficients are also visualized in a heat map.
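The difference between marginal and partial correlation can be seen in a small three-variable sketch. It uses the textbook first-order formula for the correlation of x and y controlling for z. It is not taken from EasyExplore, and the sample data are made up so that x and y both depend strongly on z.

```python
import math


def pearson(x, y):
    """Marginal (ordinary) Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical features: x and y are both driven by z plus small noise.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
y = [2.2, 3.9, 5.8, 8.3, 9.9, 12.1]

marginal = pearson(x, y)
partial = partial_corr(x, y, z)
print(f"marginal={marginal:.3f} partial={partial:.3f}")
```

A large gap between the two coefficients suggests that the marginal correlation is mostly explained by the shared dependence on z, which is the kind of difference the heat map is meant to expose.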
-- Geo Statistics:
Descriptive statistics of continuous features grouped by values of each geo feature in the data set. Additionally, a geo map (OpenStreetMap) is generated to visualize the statistical distribution.
-- Text Analyzer:
Analyze potential text features and generate various numerical features from those
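To give a sense of what numerical features derived from text can look like, here is a minimal sketch. The specific features and the name text_features are illustrative assumptions. The README does not say which features the Text Analyzer actually generates.

```python
def text_features(text):
    """Derive a few simple numerical features from a text value."""
    words = text.split()
    return {
        "n_chars": len(text),
        "n_words": len(words),
        "n_digits": sum(ch.isdigit() for ch in text),
        "n_upper": sum(ch.isupper() for ch in text),
        "mean_word_len": sum(map(len, words)) / len(words) if words else 0.0,
    }


# Hypothetical text value.
features = text_features("Order 66 shipped to Berlin")
print(features)
```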
Visualize your data set easily using Plot.ly, an interactive visualization library, under the hood. The DataVisualizer is an efficient wrapper that abstracts the most important elements for data exploration:
-- Table Chart:
Visualize a matrix (Pandas DataFrame) as an interactive table
-- Heat Map:
Visualize the value range of continuous features as a heat map
-- Geo Map:
Visualize statistics of categorical and continuous features as an interactive OpenStreetMap
-- Contour Chart:
Visualize value ranges of at least two continuous features as contours
-- Pie Chart:
Visualize occurrences of values of categorical features as an interactive pie chart
-- Bar Chart:
Visualize occurrences of values of categorical features as an interactive bar chart
-- Histogram:
Visualize distribution of continuous features as an interactive histogram
-- Box-Whisker-Plot:
Visualize descriptive statistics of continuous features as an interactive box-whisker-plot
-- Violin Chart:
Visualize descriptive statistics of continuous features as an interactive violin chart
-- Parallel Category Chart:
Interactively visualize relationships between categorical features. Using brushing, it can also show mixed relationships between values of categorical and continuous features.
-- Parallel Coordinate Chart:
Interactively visualize relationships between ranges of continuous features. It can also show mixed relationships between values of categorical features and ranges of continuous features.
-- Scatter Chart:
Visualize values of continuous features interactively.
-- Scatter3D Chart:
Visualize values of three continuous features in one chart interactively.
-- Joint Distribution Chart:
Visualize values of two continuous features interactively, including contours and histogram for each continuous feature.
-- Ridgeline Chart:
Visualize changes in distribution of continuous features on certain time steps separately.
-- Line Chart:
Visualize distribution after certain time steps as an interactive line chart.
-- Candlestick Chart:
Visualize descriptive statistics for each time step as an interactive candlestick chart.
-- Dendrogram:
Visualize hierarchical clusters.
-- Silhouette Chart:
Visualize partitioned clusters.
Check the Jupyter notebook for examples. Happy exploring :)