pandas-schema

A validation library for Pandas data frames using user-friendly schemas

Version 0.3.6


PandasSchema
************

For the full documentation, refer to the `GitHub Pages website
<https://multimeric.github.io/PandasSchema/>`_.

======================================================================

PandasSchema is a module for validating tabular data, such as CSV
(comma-separated values) and TSV (tab-separated values) files. It
runs its checks on top of the data analysis library pandas, which
keeps validation quick and efficient.
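
To give a feel for the vectorized style that makes pandas-based
validation fast, here is a minimal sketch in plain pandas. It is not
PandasSchema's implementation: the sample rows and variable names are
illustrative assumptions, and each check is simply written as one
whole-column operation that yields a boolean Series.

.. code:: python

   import pandas as pd
   from io import StringIO

   # Illustrative sample data (an assumption, not taken from the library)
   df = pd.read_csv(StringIO(
       "Given Name,Family Name,Age,Sex,Customer ID\n"
       "Gerald,Hampton,82,Male,2582GABK\n"
       "Yuuwa,Miyake,270,male,7951WVLW\n"
       "Edyta,Majewska,50,Female,775ANSID\n"
   ))

   # Each check runs over the whole column at once, with no Python loop
   age_ok = (df["Age"] >= 0) & (df["Age"] < 120)
   sex_ok = df["Sex"].isin(["Male", "Female", "Other"])
   id_ok = df["Customer ID"].str.match(r"\d{4}[A-Z]{4}")

   # Index labels of the rows that fail at least one check
   failing = df.index[~(age_ok & sex_ok & id_ok)].tolist()
   print(failing)

A schema library packages checks like these behind named validation
objects and reports which row and column failed, so you do not have to
write the boolean masks by hand.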

For example, say your code expects a CSV that looks a bit like this:

.. code::

   Given Name,Family Name,Age,Sex,Customer ID
   Gerald,Hampton,82,Male,2582GABK
   Yuuwa,Miyake,27,Male,7951WVLW
   Edyta,Majewska,50,Female,7758NSID

To check that the data in your CSV is in the correct format, define a
schema and validate the data against it:

.. code:: python

   import pandas as pd
   from io import StringIO
   from pandas_schema import Column, Schema
   from pandas_schema.validation import (
       InListValidation,
       InRangeValidation,
       LeadingWhitespaceValidation,
       MatchesPatternValidation,
       TrailingWhitespaceValidation,
   )

   schema = Schema([
       Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
       Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
       Column('Age', [InRangeValidation(0, 120)]),
       Column('Sex', [InListValidation(['Male', 'Female', 'Other'])]),
       Column('Customer ID', [MatchesPatternValidation(r'\d{4}[A-Z]{4}')])
   ])

   test_data = pd.read_csv(StringIO('''Given Name,Family Name,Age,Sex,Customer ID
   Gerald ,Hampton,82,Male,2582GABK
   Yuuwa,Miyake,270,male,7951WVLW
   Edyta,Majewska ,50,Female,775ANSID
   '''))

   # validate() returns one entry for each failed check
   errors = schema.validate(test_data)

   for error in errors:
       print(error)

PandasSchema would then output:

.. code:: text

   {row: 0, column: "Given Name"}: "Gerald " contains trailing whitespace
   {row: 1, column: "Age"}: "270" was not in the range [0, 120)
   {row: 1, column: "Sex"}: "male" is not in the list of legal options (Male, Female, Other)
   {row: 2, column: "Family Name"}: "Majewska " contains trailing whitespace
   {row: 2, column: "Customer ID"}: "775ANSID" does not match the pattern "\d{4}[A-Z]{4}"
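
Once you have a list of errors, a common next step is to separate the
rows that passed from the rows that failed. The sketch below assumes
that each error object exposes a ``row`` attribute holding the index
label of the offending row, as the printed output suggests. The helper
``split_by_errors`` is hypothetical, and the demo uses stand-in error
objects so that it runs without the library installed.

.. code:: python

   from types import SimpleNamespace

   import pandas as pd


   def split_by_errors(df, errors):
       """Return (clean, rejected) frames, assuming each error has a ``row`` label."""
       bad_rows = sorted({e.row for e in errors})
       rejected = df.loc[bad_rows]
       clean = df.drop(index=bad_rows)
       return clean, rejected


   # Stand-in data and errors (assumptions for illustration only)
   data = pd.DataFrame({"Age": [82, 270, 50], "Sex": ["Male", "male", "Female"]})
   stand_in_errors = [
       SimpleNamespace(row=1, column="Age"),
       SimpleNamespace(row=1, column="Sex"),
   ]

   clean, rejected = split_by_errors(data, stand_in_errors)
   print(clean.index.tolist(), rejected.index.tolist())

In real use you would pass the list returned by ``schema.validate()``
instead of the stand-in objects, after confirming that the attribute
name matches your installed version.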

