jskiner

  • 0.1.1
  • PyPI
JSkiner

JSkiner is a Python JSON schema inference engine with a Rust core. Its inference speed is about 10 times that of its pure-Python counterpart (jsonschema-inference).

Installation

pip install jskiner

Usage

Checking the JSON Schema of a Large .jsonl File

jskiner \
    --in <path_to_jsonl> \
    --verbose <false/true> \
    --out <output_file_path> \
    --nworkers <number_of_cpu_core> \
    --split <number_of_split_batch_size> \
    --split-path <path_to_store_the_split_files>
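
For readers who prefer to drive the inference from Python rather than from the command line, a large .jsonl file can be read in batches of raw JSON strings, which is the input shape that `engine.run` accepts in the Python usage section. The sketch below uses only the standard library. The helper name `iter_jsonl_batches` is hypothetical and is not part of jskiner, and nothing here is claimed to match how the `--split` option works internally.

```python
import itertools
from typing import Iterator, List


def iter_jsonl_batches(path: str, batch_size: int) -> Iterator[List[str]]:
    """Yield lists of up to `batch_size` raw JSON strings from a .jsonl file.

    Hypothetical helper for illustration; not part of the jskiner API.
    """
    with open(path, "r", encoding="utf-8") as f:
        # Each non-blank line of a .jsonl file is one JSON document.
        lines = (line.strip() for line in f if line.strip())
        while True:
            batch = list(itertools.islice(lines, batch_size))
            if not batch:
                return
            yield batch
```

Each yielded batch is a plain list of strings, so memory use is bounded by `batch_size` rather than by the size of the file.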

Checking the JSON Schema for a Folder of JSON Files

jskiner \
    --in <path_to_jsons> \
    --verbose <false/true> \
    --out <output_file_path> \
    --nworkers <number_of_cpu_core> \
    --batch-size <batch_size_for_inferencing> \
    --cuckoo-path <path_to_store_the_cuckoo_filter> \
    --cuckoo-size <approximate_size_of_the_cuckoo_filter (recommended: 10x the current JSON count)> \
    --cuckoo-fpr <false_positive_rate_of_the_cuckoo_filter>
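
The recommendation of roughly 10 times the current JSON count for `--cuckoo-size` can be computed before invoking the command. The following is a minimal sketch using only the standard library. The function name `suggest_cuckoo_size` is hypothetical, and counting `.json` files recursively is an assumption about how the folder is laid out, not a statement about how jskiner scans it.

```python
from pathlib import Path


def suggest_cuckoo_size(json_dir: str, factor: int = 10) -> int:
    """Return `factor` times the number of .json files under `json_dir`.

    Hypothetical helper for illustration; not part of the jskiner API.
    """
    # Assumption: files are counted recursively and identified by the .json suffix.
    json_count = sum(1 for _ in Path(json_dir).rglob("*.json"))
    # Never suggest a size of zero, even for an empty folder.
    return max(1, factor * json_count)
```

The returned integer can then be passed as the value of `--cuckoo-size`.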

Inferring the Schema in Python

from jskiner import InferenceEngine
cpu_cnt = 16
engine = InferenceEngine(cpu_cnt)
json_string_list = ["1", "1.2", "null", "{\"a\": 1}"]
schema = engine.run(json_string_list)
schema

Union({Atomic(Float()), Atomic(Int()), Atomic(Non()), Record({"a": Atomic(Int())})})
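
To see why the result above is a union of four members, it may help to look at the top-level type of each input string on its own. The toy sketch below uses only the standard `json` module and is not how jskiner works internally. The labels `Int`, `Float`, `Non` and `Record` mirror the output above, while `Bool`, `Str` and `Array` are placeholder names chosen for this illustration.

```python
import json
from typing import List, Set


def type_name(value: object) -> str:
    """Map a decoded JSON value to a schema-like label (toy illustration)."""
    if value is None:
        return "Non"
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "Bool"  # placeholder label, not taken from jskiner
    if isinstance(value, int):
        return "Int"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "Str"  # placeholder label, not taken from jskiner
    if isinstance(value, list):
        return "Array"  # placeholder label, not taken from jskiner
    if isinstance(value, dict):
        return "Record"
    raise TypeError(f"not a JSON value: {value!r}")


def top_level_types(json_strings: List[str]) -> Set[str]:
    """Collect the distinct top-level type labels of the given JSON strings."""
    return {type_name(json.loads(s)) for s in json_strings}


print(sorted(top_level_types(["1", "1.2", "null", "{\"a\": 1}"])))
# prints ['Float', 'Int', 'Non', 'Record']
```

The four distinct labels correspond to the four members of the inferred union. The real engine additionally infers the nested schema of each record, which this sketch does not attempt.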

Calculating the Union of a List of Schemas

from jskiner import InferenceEngine
from jskiner.schema import Atomic, Int, Non
cpu_cnt = 16
engine = InferenceEngine(cpu_cnt)
schema = engine.run([Atomic(Int()), Atomic(Non())])
schema

Optional(Atomic(Int()))

Using the | Operator between Two Schemas

from jskiner import Atomic, Int, Non
schema = Atomic(Int()) | Atomic(Non())
schema

Optional(Atomic(Int()))

TODO:

  • Enable inference from a folder of json files
  • Enable ignoring of existing json files using cuckoo filter
  • Enable adding a starting schema file
  • Enable batch-by-batch processing of large jsonl files
  • FIX: make sure repr escapes special characters.
  • Auto Formatting Using Black
  • Enable sampling of json files
  • Debug: show the input that causes a panic. (alter panic str / alter reduce.py exception logging)
  • Fix: adding UnionRecord schema object
  • Enable direct inference from an online API (to avoid repeated downloads of JSON)
  • Enable Regex to represent patterned FieldSet
