@msrvida/data-inference

Version 1.2.0 (latest)

Infer column types in tabular data. Type inference is delegated to vega-loader's inferTypes function (or a similar function of your choice). This library then enriches each column's metadata with a ColumnStats object.

Demo

Usage

You will need an array of data objects so that column types can be inferred.

Modules

import * as DataInference from '@msrvida/data-inference';
import { inferTypes } from 'vega-loader';

const data = [
    {
        Name: 'chevrolet chevelle malibu',
        Miles_per_Gallon: 18,
        Cylinders: 8,
        Displacement: 307,
        Horsepower: 130,
        Weight_in_lbs: 3504,
        Acceleration: 12,
        Year: '1970-01-01',
        Origin: 'USA',
    },
    {
        Name: 'buick skylark 320',
        Miles_per_Gallon: 15,
        Cylinders: 8,
        Displacement: 350,
        Horsepower: 165,
        Weight_in_lbs: 3693,
        Acceleration: 11.5,
        Year: '1970-01-01',
        Origin: 'USA',
    },
];

const columns = DataInference.getColumnsFromData(inferTypes, data);
console.log(columns);
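
The first argument does not have to be vega-loader's inferTypes. The following is a minimal sketch of a substitute, assuming the function is called as (data, fields) and returns an object mapping each field name to one of 'boolean', 'integer', 'number', 'date', or 'string', as vega-loader's inferTypes does. The name simpleInferTypes and its rules are hypothetical and far cruder than Vega's (for example, numeric strings are not parsed as numbers).

```javascript
// Hypothetical stand-in for vega-loader's inferTypes(data, fields).
// Returns an object mapping each field name to a type string.
function simpleInferTypes(data, fields) {
    const isoDate = /^\d{4}-\d{2}-\d{2}/;
    const types = {};
    for (const field of fields) {
        // Ignore missing values when deciding on a type.
        const values = data
            .map((row) => row[field])
            .filter((v) => v !== null && v !== undefined && v !== '');
        if (values.length === 0) {
            types[field] = 'string';
        } else if (values.every((v) => typeof v === 'boolean')) {
            types[field] = 'boolean';
        } else if (values.every((v) => Number.isInteger(v))) {
            types[field] = 'integer';
        } else if (values.every((v) => typeof v === 'number')) {
            types[field] = 'number';
        } else if (values.every((v) => typeof v === 'string' && isoDate.test(v) && !Number.isNaN(Date.parse(v)))) {
            types[field] = 'date';
        } else {
            types[field] = 'string';
        }
    }
    return types;
}

const sampleRows = [
    { Name: 'chevrolet chevelle malibu', Miles_per_Gallon: 18, Acceleration: 12, Year: '1970-01-01', Origin: 'USA' },
    { Name: 'buick skylark 320', Miles_per_Gallon: 15, Acceleration: 11.5, Year: '1970-01-01', Origin: 'USA' },
];

const inferred = simpleInferTypes(sampleRows, Object.keys(sampleRows[0]));
console.log(inferred);

// Assumed usage with this library, in place of vega-loader's function:
// const columns = DataInference.getColumnsFromData(simpleInferTypes, sampleRows);
```

Matching the shape of vega-loader's function keeps the two interchangeable, so the custom version can be swapped in without changing the rest of the call.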

Pre-built library

<script src="https://unpkg.com/vega@^5.30/build/vega.js" charset="utf-8"></script>
<script src="https://unpkg.com/@msrvida/data-inference@1.0.0/dist/umd/data-inference.js" charset="utf-8"></script>
<script>
    const data = [
        {
            Name: 'chevrolet chevelle malibu',
            Miles_per_Gallon: 18,
            Cylinders: 8,
            Displacement: 307,
            Horsepower: 130,
            Weight_in_lbs: 3504,
            Acceleration: 12,
            Year: '1970-01-01',
            Origin: 'USA',
        },
        {
            Name: 'buick skylark 320',
            Miles_per_Gallon: 15,
            Cylinders: 8,
            Displacement: 350,
            Horsepower: 165,
            Weight_in_lbs: 3693,
            Acceleration: 11.5,
            Year: '1970-01-01',
            Origin: 'USA',
        },
    ];

    const columns = DataInference.getColumnsFromData(vega.inferTypes, data);
    console.log(columns);
</script>

Sample output

[
    {
        "name": "Name",
        "type": "string",
        "quantitative": false,
        "stats": {
            "distinctValueCount": 2,
            "max": null,
            "mean": null,
            "min": null
        },
        "isColorData": false
    },
    {
        "name": "Miles_per_Gallon",
        "type": "integer",
        "quantitative": true,
        "stats": {
            "distinctValueCount": 2,
            "max": 18,
            "mean": 16.5,
            "min": 15,
            "hasNegative": false,
            "isSequential": false
        }
    },
    {
        "name": "Cylinders",
        "type": "integer",
        "quantitative": true,
        "stats": {
            "distinctValueCount": 1,
            "max": 8,
            "mean": 8,
            "min": 8,
            "hasNegative": false,
            "isSequential": false
        }
    },
    .
    .
    .
]
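
To make the stats fields above concrete, here is a rough sketch of how values such as distinctValueCount, max, mean, min, and hasNegative could be derived for one column. It is not the library's implementation: sketchColumnStats is a hypothetical helper, and isSequential and isColorData are left out because this README does not describe how they are computed.

```javascript
// Hypothetical helper: derives a few ColumnStats-like fields for one column.
function sketchColumnStats(data, name) {
    const values = data.map((row) => row[name]);
    const distinctValueCount = new Set(values).size;

    // Only genuine numbers contribute to max, mean, and min.
    const numbers = values.filter((v) => typeof v === 'number' && !Number.isNaN(v));
    if (numbers.length === 0) {
        return { distinctValueCount, max: null, mean: null, min: null };
    }

    const sum = numbers.reduce((total, v) => total + v, 0);
    return {
        distinctValueCount,
        max: Math.max(...numbers),
        mean: sum / numbers.length,
        min: Math.min(...numbers),
        hasNegative: numbers.some((v) => v < 0),
    };
}

const cars = [
    { Name: 'chevrolet chevelle malibu', Miles_per_Gallon: 18, Cylinders: 8 },
    { Name: 'buick skylark 320', Miles_per_Gallon: 15, Cylinders: 8 },
];

const mpgStats = sketchColumnStats(cars, 'Miles_per_Gallon');
const nameStats = sketchColumnStats(cars, 'Name');
console.log(mpgStats, nameStats);
```

For the two sample rows, the numeric fields computed this way agree with the Miles_per_Gallon entry in the sample output, and the string column falls back to null for max, mean, and min.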

Pandas-style simulation

As of version 1.2, the pandasSimulation namespace provides describe(), head(), and info() functions, which emit data inferences in a format similar to that of Python's pandas library.

head

console.log(DataInference.pandasSimulation.head(columns, data));
                       Name Miles_per_Gallon Cylinders Displacement Horsepower \
1 chevrolet chevelle malibu               18         8          307        130
2         buick skylark 320               15         8          350        165
3        plymouth satellite               18         8          318        150
4             amc rebel sst               16         8          304        150
5               ford torino               17         8          302        140

Weight_in_lbs Acceleration       Year Origin
         3504           12 1970-01-01    USA
         3693         11.5 1970-01-01    USA
         3436           11 1970-01-01    USA
         3433           12 1970-01-01    USA
         3449         10.5 1970-01-01    USA
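
The right-aligned layout above can be approximated in a few lines. The sketch below is only an illustration of the layout, not how pandasSimulation.head() is implemented: formatHead is a hypothetical helper, it numbers rows from 1 as in the output shown, and it does not wrap wide tables across multiple blocks.

```javascript
// Hypothetical helper: right-aligns the first rows of a table, pandas-style.
function formatHead(data, fields, maxRows = 5) {
    const rows = data.slice(0, maxRows);

    // Each column is as wide as its header or its widest cell.
    const widths = fields.map((field) =>
        Math.max(field.length, ...rows.map((row) => String(row[field]).length))
    );
    const indexWidth = String(rows.length).length;

    const header = [
        ' '.repeat(indexWidth),
        ...fields.map((field, i) => field.padStart(widths[i])),
    ].join(' ');

    const lines = rows.map((row, rowIndex) =>
        [
            String(rowIndex + 1).padStart(indexWidth),
            ...fields.map((field, i) => String(row[field]).padStart(widths[i])),
        ].join(' ')
    );

    return [header, ...lines].join('\n');
}

const headRows = [
    { Name: 'chevrolet chevelle malibu', Cylinders: 8 },
    { Name: 'buick skylark 320', Cylinders: 8 },
];

const headText = formatHead(headRows, ['Name', 'Cylinders']);
console.log(headText);
```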

info

console.log(DataInference.pandasSimulation.info(columns, data));
<class 'pandas.core.frame.DataFrame'>
Index: 406 entries, 0 to 405
Data columns (total 9 columns):

# Column           Non-Null Count Dtype         
- ---------------- -------------- --------------
0 Name             406 non-null   object        
1 Miles_per_Gallon 398 non-null   float64       
2 Cylinders        406 non-null   int64         
3 Displacement     406 non-null   float64       
4 Horsepower       400 non-null   int64         
5 Weight_in_lbs    406 non-null   int64         
6 Acceleration     406 non-null   float64       
7 Year             406 non-null   datetime64[ns]
8 Origin           406 non-null   object        


dtypes: 3 float64, 3 int64, 2 object
memory usage: 21.0 KB
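
The Non-Null Count and Dtype columns can be understood through a small sketch. The mapping from inferred types to pandas dtype names below is an assumption read off the output above (plus 'bool' for booleans), and summarizeColumns is a hypothetical helper, not part of this library.

```javascript
// Assumed mapping from inferred column types to pandas-style dtype names.
const PANDAS_DTYPES = {
    string: 'object',
    integer: 'int64',
    number: 'float64',
    date: 'datetime64[ns]',
    boolean: 'bool',
};

// Hypothetical helper: counts non-null values and looks up a dtype per column.
function summarizeColumns(columns, data) {
    return columns.map((column) => ({
        name: column.name,
        nonNull: data.filter((row) => row[column.name] !== null && row[column.name] !== undefined).length,
        dtype: PANDAS_DTYPES[column.type] || 'object',
    }));
}

const infoColumns = [
    { name: 'Name', type: 'string' },
    { name: 'Horsepower', type: 'integer' },
];
const infoRows = [
    { Name: 'chevrolet chevelle malibu', Horsepower: 130 },
    { Name: 'buick skylark 320', Horsepower: null },
    { Name: 'plymouth satellite', Horsepower: 150 },
];

const summary = summarizeColumns(infoColumns, infoRows);
console.log(summary);
```

In this sketch only null and undefined count as missing, which is why a column can report fewer non-null entries than the total number of rows.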

For more information

Please visit the SandDance website.

Package last updated on 11 Sep 2024
