Infer JSON schema from CSV files.
The script can be installed via pip:
pip install infer-schema
infer-schema is currently a single Python 3 script with no external dependencies, so you can also download it to a directory on your PATH and make it executable:
curl https://raw.githubusercontent.com/abhidg/infer-schema/main/infer_schema.py -o infer-schema
chmod +x infer-schema
./infer-schema file
See infer-schema(1) (from a local clone, use man ./infer-schema.1).
For the Python library interface, see below.
Consider a data file such as data.csv:
date,count
2023-11-20,10
2023-11-21,23
Running infer-schema will produce a JSON Schema that the CSV conforms to:
{
  "$schema": "https://json-schema.org/draft-07/schema",
  "description": "Description of data.csv",
  "properties": {
    "count": {
      "description": "Description for column count",
      "maximum": 23,
      "minimum": 10,
      "type": "integer"
    },
    "date": {
      "description": "Description for column date",
      "format": "date",
      "type": "string"
    }
  },
  "required": [
    "date",
    "count"
  ],
  "title": "JSON Schema for data.csv"
}
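To make "conforms to" concrete, the following is a minimal sketch that checks the sample rows against the constraints in the schema above using only the standard library. It is not part of infer-schema: `row_conforms` is a hypothetical helper, it handles only the keywords that appear in this example, and it assumes that an `integer` column means a CSV cell that `int()` can parse.

```python
import csv
import io
from datetime import date

# Sample data and the relevant parts of the schema from the example above.
CSV_TEXT = "date,count\n2023-11-20,10\n2023-11-21,23\n"

SCHEMA = {
    "properties": {
        "count": {"type": "integer", "minimum": 10, "maximum": 23},
        "date": {"type": "string", "format": "date"},
    },
    "required": ["date", "count"],
}


def row_conforms(row: dict, schema: dict) -> bool:
    """Hypothetical check covering only the keywords used in SCHEMA."""
    # Required fields must be present and non-empty.
    for field in schema["required"]:
        if row.get(field) in (None, ""):
            return False
    for field, spec in schema["properties"].items():
        raw = row.get(field)
        if raw in (None, ""):
            continue
        if spec["type"] == "integer":
            # Assumption: CSV cells are text, so parse before comparing bounds.
            try:
                value = int(raw)
            except ValueError:
                return False
            if value < spec.get("minimum", value):
                return False
            if value > spec.get("maximum", value):
                return False
        elif spec.get("format") == "date":
            # ISO 8601 calendar dates such as 2023-11-20.
            try:
                date.fromisoformat(raw)
            except ValueError:
                return False
    return True


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
results = [row_conforms(row, SCHEMA) for row in rows]
print(results)
```

A row with a count outside the inferred bounds, or with a date that is not in ISO format, would be rejected by this check. For real use, a full JSON Schema validator is a better choice than a hand-rolled one.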
The same result can be obtained using the Python module:
from infer_schema import infer_schema
schema = infer_schema("data.csv")
print(schema)
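Because the library returns the schema as a dictionary, `print(schema)` shows a Python dict rather than JSON text. A minimal sketch of serializing it with the standard `json` module follows; the `schema` literal here is a stand-in for the value returned by `infer_schema("data.csv")` so that the snippet runs without the package installed, and the file name `data.schema.json` mentioned in the comment is only an example.

```python
import json

# Stand-in for the dictionary returned by infer_schema("data.csv").
schema = {
    "$schema": "https://json-schema.org/draft-07/schema",
    "title": "JSON Schema for data.csv",
    "properties": {
        "count": {"type": "integer", "minimum": 10, "maximum": 23},
        "date": {"type": "string", "format": "date"},
    },
    "required": ["date", "count"],
}

# sort_keys gives a stable key order; indent makes the output readable.
schema_text = json.dumps(schema, indent=2, sort_keys=True)
print(schema_text)

# To save it, write schema_text to a file of your choice, for example:
# Path("data.schema.json").write_text(schema_text)
```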
infer_schema(file: Union[Path, str], enum_threshold: int = 10, enum_fields: List[str] = [], bound_types: Set[DType] = {"integer", "number"}, explicit_nulls: bool = False)
Here DType is one of number, integer or string.
file (Path or str): CSV file
enum_threshold (int, default = 10): Number of unique values in a column below which the field is typed as an enum
enum_fields (List[str], default = []): Forces the listed fields to be classed as enums, useful for including fields that do not meet the enum_threshold criterion
bound_types (Set[DType], default = {"integer", "number"}): Types for which bounds should be encoded into the schema. The default is the numeric types, for which minimum / maximum are determined. For strings, minLength and maxLength are determined. Set to None to disable bound detection
explicit_nulls (bool, default = False): By default, fields that have null and another type are typed as non-required with the non-null type. Another interpretation is to assume the field will be present and allow it to be dual-typed with null.
Returns: JSON Schema as a dictionary
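The two interpretations described for explicit_nulls can be illustrated with a small sketch. This is not the library's implementation: `sketch_nullable_field` is a hypothetical helper, and it assumes that "dual-typed with null" is expressed as a JSON Schema type list such as `["integer", "null"]`.

```python
from typing import Any, Dict, List, Optional


def sketch_nullable_field(
    name: str, values: List[Optional[int]], explicit_nulls: bool = False
) -> Dict[str, Any]:
    """Hypothetical helper showing both readings of a column with nulls."""
    has_null = any(v is None for v in values)
    field_type: Any = "integer"
    # Default reading: a column with nulls is optional, typed by its non-null values.
    required = not has_null
    if has_null and explicit_nulls:
        # Alternative reading (assumed shape): always present, may hold null.
        field_type = ["integer", "null"]
        required = True
    return {
        "properties": {name: {"type": field_type}},
        "required": [name] if required else [],
    }


values = [10, None, 23]
default_shape = sketch_nullable_field("count", values)
explicit_shape = sketch_nullable_field("count", values, explicit_nulls=True)
print(default_shape)
print(explicit_shape)
```

Under the default reading the field is left out of `required`; under the alternative it stays required and its type admits null.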
Install pre-commit to set up ruff linting and formatting.
To generate the man page, scdoc is required:
scdoc < infer-schema.1.scd > infer-schema.1