cd2t validates data structure, data types and values with templates

2.2.2

PyPI

Maintainers: 1

cd2t

repository: https://gitlab.com/ko.no/cd2t

Key Features

Feature-Rich Data Type and Value Validation: Many data types are available, each with basic checks of the content.
Unlimited Data Structure: Recursive linking of data types like lists or objects can represent any data structure.
Data Structure Nesting: Templates allow you to define repeating data structures only once. Templates can be nested without limit, but loops are not allowed.
Referencing: Referencing can check the uniqueness of values at different positions in the data structure (e.g. lists of objects with an ID attribute). It can also enforce a consumer/producer model: for example, strings at some positions are collected as producers, and strings at other positions must match one of those produced strings. The scope of references can be limited to a namespace.
Multi Data Support: Multiple data sources can be checked with one schema or many schemas. You can switch schemas while iterating over data sources. Referencing and autogeneration work across schemas and data sources by using namespaces and reference keys.
Value Autogeneration: Some data types support the creation of missing values. For example, unique IDs can be added to a data structure. Uniqueness can be limited to namespaces.
Schema Validation: Typos, syntax mistakes or missing required options are reported as SchemaErrors (Exception) during schema loading. The reason and the path within the schema structure are provided.
Data Validation: Validation returns a list of findings. Each finding provides the position within the data structure and the reason for the finding.

Data Schema

The data schema describes the expected structure and the expected values of the data to be validated. It also includes schema metadata and control options.

Version 2

version: 2
name: < str >
description: < str >
allow_data_type_shortcuts: < bool | default -> true >
# Shortcuts are Data Type strings instead of dictionaries.
# Corresponding data type with default options is used.

root: { data type schema }

custom_data_types:
  < data type name >:
    type: < built-in data type name > # required; customizable data type
    < built-in data type option >: < value >

# Templates can be used in the data schema below 'root' (1),
# below any other template if used in a child data type (2), or
# directly in another template if data type options are changed (3).
# Example (1):
# anywhere_below_root:
#   template: template1
#   minimum: 1  # Overwrite or add minimum from/in template schema
# Example (2):
# templates:
#   template1:
#     type: integer
#     minimum: 100
#     maximum: 1000
#   template2:
#     type: multitype
#     types:
#       - type: string
#       - template: template1
# Example (3):
# templates:
#   template1:
#     type: integer
#     minimum: 100
#     maximum: 1000
#   template2:
#     template: template1
#     minimum: 200
templates:
  < template name >: { data schema }  # no data type shortcut allowed.
# Global options while merging template schema with local schema.
template_merge_options:
  # merge dictionaries in data type options recursively.
  # If false, local data type options overwrite the template options completely.
  recursive: < bool | default -> true >
  # How lists in data type options should be merged.
  list_merge: < append | append_rp | prepend | prepend_rp | replace
                | default -> append_rp >
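The merge behaviour described by 'template_merge_options' can be pictured with a small plain-Python sketch. This is not cd2t's implementation: the function name `merge_options` is hypothetical, the sketch assumes local options win over template options, and it only covers the 'append', 'prepend' and 'replace' list modes because the '_rp' variants are not explained in the text above.

```python
from copy import deepcopy


def merge_options(template, local, recursive=True, list_merge="replace"):
    """Merge local data type options over template options (illustrative only)."""
    merged = deepcopy(template)
    for key, value in local.items():
        base = merged.get(key)
        if recursive and isinstance(base, dict) and isinstance(value, dict):
            # Dictionaries are merged key by key when 'recursive' is enabled.
            merged[key] = merge_options(base, value, recursive, list_merge)
        elif isinstance(base, list) and isinstance(value, list):
            if list_merge == "append":
                merged[key] = base + value
            elif list_merge == "prepend":
                merged[key] = value + base
            elif list_merge == "replace":
                merged[key] = list(value)
            else:
                raise ValueError(f"list_merge mode not covered by this sketch: {list_merge}")
        else:
            # Scalars (and dictionaries when recursive=False) are overwritten.
            merged[key] = deepcopy(value)
    return merged


# Mirrors Example (3): template2 reuses template1 and raises the minimum.
template1 = {"type": "integer", "minimum": 100, "maximum": 1000}
template2 = merge_options(template1, {"minimum": 200})
print(template2)
```

With `recursive=False`, a local dictionary option replaces the template's dictionary as a whole instead of being merged into it.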

Version 1

name: < str >
description: < str >
allow_data_type_shortcuts: < bool | default -> false >
# Shortcuts are Data Type strings instead of dictionaries.
# Corresponding data type with default options is used.

root: { data type schema }

subschemas:
  < sub schema name >: { data type schema }

Built-In Data Types

Generic Data Types

'any' Data Type

'any' Description

This data type represents any data. The validator stops further data validation or autogeneration.

'any' Limitations

referencing is not supported
autogeneration is not supported

'any' Schema

type: 'any' # If type is omitted, validator uses Any Data Type
description: < str > # Information about data type and value

'bool' Data Type

'bool' Description

This data type represents boolean values (true/false).

'bool' Limitations

referencing is not supported

'bool' Schema

type: 'bool'
description: < str > # Information about data type and value
allowed_value: < bool > # true or false
autogenerate: < bool | default -> false >
# Autogenerate the default value, if data is not existing.
# Requires 'autogenerate_default'

autogenerate_default: < bool > # true or false; must match 'allowed_value' (if set)

'enum' Data Type

'enum' Description

This data type represents a selection of allowed values.

'enum' Limitations

referencing is not supported
autogeneration is not supported

'enum' Schema

type: 'enum'
description: < str > # Information about data type and value
allowed_values: # required
- < value >

'float' Data Type

'float' Description

This data type represents float values.

'float' Limitations

None.

'float' Schema Keys

type: 'float'
description: < str > # Information about data type and value
reference: { unique options }
maximum: < float >
# value must be lower or equal to this

minimum: < float >
# value must be greater or equal to this

maximum_decimals: < int > # >= 0
# Maximum allowed decimal places.

allowed_values:
- < float > # value must match this value
- round: < int > # value rounded to < int > digits must match 'matches'
  matches: < float >
- range_start: < float > # 'range_start' <= value <= 'range_end'
  range_end: < float >
# List of directives which values must match.

not_allowed_values:
- < float > # value mustn't match this value
- round: < int > # value rounded to < int > digits mustn't match 'matches'
  matches: < float >
- range_start: < float > # value < 'range_start' or value > 'range_end'
  range_end: < float >
# List of directives which values mustn't match.

autogenerate: < bool | default -> false >
# uses 'autogenerate_default' value.
#
# OR
#
# try for 'autogenerate_random_tries' times:
#   1. Create a random float value, which is within the 'autogenerate_ranges'
#      or 'minimum' <= random value <= 'maximum'
#   2. Check if random value passes the validation process.

autogenerate_default: < float >
# Autogenerate uses this value.

autogenerate_random_tries: < int | default 10 > # 0 < x < 50
# Ignored, if 'autogenerate_default' is set.
# Maximum amount of tries to find a random float value,
# which is not used by any reference.
# Integer value must be greater than 0 and lower than 50.

autogenerate_ranges:
- minimum: < float >
  maximum: < float >
# Ignored, if 'autogenerate_default' is set.
# '[.]minimum' <= '[.]maximum'
# Autogenerated float is within the ranges.
# If omitted, global 'minimum' and 'maximum' limits the random value.

autogenerate_random_decimals: < int | default 2 > # >= 0
# Ignored, if 'autogenerate_default' is set.
# Limit the decimal places for the random value.

'float' Validation Process

If options are missing, corresponding checks are skipped.

value >= minimum
value <= maximum
value rounded to maximum_decimals == value
value is not in not_allowed_values
value is in allowed_values
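The order of checks above can be sketched in plain Python. This is an illustration of the listed steps, not cd2t's code: the function names and finding texts are made up, and the third step is assumed to refer to the 'maximum_decimals' option.

```python
def check_float(value, schema):
    """Return a list of finding strings for a float value (illustrative only)."""
    findings = []
    if "minimum" in schema and value < schema["minimum"]:
        findings.append("value is lower than minimum")
    if "maximum" in schema and value > schema["maximum"]:
        findings.append("value is greater than maximum")
    decimals = schema.get("maximum_decimals")
    if decimals is not None and round(value, decimals) != value:
        findings.append("value has too many decimal places")
    if "not_allowed_values" in schema and _matches_any(value, schema["not_allowed_values"]):
        findings.append("value is not allowed")
    if "allowed_values" in schema and not _matches_any(value, schema["allowed_values"]):
        findings.append("value is not in allowed values")
    return findings


def _matches_any(value, directives):
    """Check a value against plain floats, 'round' and 'range' directives."""
    for directive in directives:
        if isinstance(directive, dict) and "round" in directive:
            if round(value, directive["round"]) == directive["matches"]:
                return True
        elif isinstance(directive, dict) and "range_start" in directive:
            if directive["range_start"] <= value <= directive["range_end"]:
                return True
        elif value == directive:
            return True
    return False


schema = {
    "minimum": 0.0,
    "maximum": 10.0,
    "maximum_decimals": 2,
    "allowed_values": [{"range_start": 1.0, "range_end": 5.0}],
}
print(check_float(2.5, schema))    # no findings
print(check_float(7.125, schema))  # too many decimals and outside the allowed range
```

Options that are missing from the schema are simply skipped, as stated at the top of this section.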

'idlist' Data Type

'idlist' Description

This data type represents a dictionary, where keys are IDs. IDs can be strings or integers.

'idlist' Limitations

autogeneration is not supported

'idlist' Schema (in Version 2)

type: 'idlist'
description: < str > # Information about data type and value
reference: { unique options }
# Note: Every ID is referenced as a value with the 'reference.key'.

minimum: <int | default -> 0 > # >= 0
# Minimum required amount of IDs

maximum: < int > # >= 0
# Maximum allowed amount of IDs
# If omitted, even an empty idlist is allowed.

elements: { data type schema } # required
# Data schema defining element data type

id: { data type schema }
# Data schema defining IDs
# If omitted, ID type defaults to 'string'.

'idlist' Schema (in Version 1)

type: 'idlist'
description: < str > # Information about data type and value
reference: { unique options }
# Note: Every ID is referenced as a value with the 'reference.key'.

minimum: <int | default -> 0 > # >= 0
# Minimum required amount of IDs

maximum: < int > # >= 0
# Maximum allowed amount of IDs
# If omitted, even an empty idlist is allowed.

elements: { data type schema } # required
# Data schema defining element data type

id_type: < 'integer' | 'string' | default -> 'string' >
# Indicates if IDs are integer or string

id_minimum: <int | default -> 0 > # >= 0
# Minimum required ID string length or minimum ID integer value

id_maximum: < int > # >= 0
# Maximum allowed ID string length or maximum ID integer value
# If omitted, even '' is allowed as ID string.

allowed_ids:
- < string | integer >
# List of regex strings or integers - depending on 'id_type'
# If ID matches any of it, the ID is allowed.

not_allowed_ids:
- < string >
# List of regex strings or integers - depending on 'id_type'
# If ID matches any of it, the ID is not allowed.
# 'not_allowed_ids' are tested before 'allowed_ids'.

'integer' Data Type

'integer' Description

This data type represents integer values.

'integer' Limitations

None.

'integer' Schema

type: 'integer'
description: < str > # Information about data type and value
reference: { unique options }
maximum: < int >
# value must be lower or equal to this

minimum: < int >
# value must be greater or equal to this

not_allowed_values:
- < int >
# List of integers which values mustn't match.

autogenerate: < bool | default -> false >
# Requires 'reference.key' to be defined and not ''.
# If no unique value could be generated, autogeneration fails.

autogenerate_default: < int >
# Generate this integer value if value is None.
# Ignores all other 'autogen' options

autogenerate_maximum: < int >
# Autogenerated integer must be lower than or equal to this.
# If omitted, 'maximum' key is upper limit

autogenerate_minimum: < int >
# Autogenerated integer must be greater than or equal to this.
# If omitted, 'minimum' key is lower limit

autogenerate_find: < 'next_higher' | 'next_lower' |
                     'random' | default -> 'next_higher' >
# Ignored, if 'autogenerate_default' is set.
# Tells autogenerate to try first available integer value
# starting at 'minimum' and increasing ('next_higher'),
# starting at 'maximum' and decreasing ('next_lower') or
# picking a random number within 'minimum' and 'maximum'.
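The three 'autogenerate_find' strategies can be illustrated with a short plain-Python sketch. It is not taken from cd2t: `find_free_integer` and its parameters are hypothetical names, and `used` stands in for the values already collected under the same 'reference.key'.

```python
import random


def find_free_integer(used, minimum, maximum, find="next_higher", rng=None):
    """Return an integer in [minimum, maximum] that is not in 'used', or None."""
    if find == "next_higher":
        candidates = range(minimum, maximum + 1)
    elif find == "next_lower":
        candidates = range(maximum, minimum - 1, -1)
    elif find == "random":
        free = [n for n in range(minimum, maximum + 1) if n not in used]
        return (rng or random).choice(free) if free else None
    else:
        raise ValueError(f"unknown autogenerate_find value: {find}")
    for n in candidates:
        if n not in used:
            return n
    # No unique value could be found, so autogeneration would fail.
    return None


used_ids = {1, 2, 3, 10}
print(find_free_integer(used_ids, 1, 10))                # 4
print(find_free_integer(used_ids, 1, 10, "next_lower"))  # 9
```

Returning `None` here corresponds to the case where no unique value could be generated.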

'list' Data Type

'list' Description

This data type represents a list of elements of the same data type. If different data types are allowed in the list, use data type 'multitype' as elements.

'list' Limitations

Not customizable
referencing is not supported - use referencing in the 'elements' data type
autogeneration of list elements is not supported - but autogeneration within existing elements data structure is supported (pass-through).

'list' Schema

type: 'list' # required
description: < str > # Information about data type and value
elements: { data type schema }  # required
# Data schema defining elements data type

minimum: <int | default -> 0 > # >= 0
# Minimum required amount of elements in the list.

maximum: < int > # >= 0
# Maximum allowed amount of elements in the list.

allow_duplicates: < bool | default -> true >
# Allow same element data multiple times

'multitype' Data Type

'multitype' Description

This data type represents a selection of allowed data types.

'multitype' Features & Limitations

Not customizable
referencing is not supported - use referencing in the data types listed under 'types'
autogeneration of data types is not supported - but autogeneration within existing data structure is supported (pass-through).
Multitype in Multitype is not allowed

'multitype' Schema

type: 'multitype'
description: < str > # Information about data type and value
types: # required
- { data type schema }
# List of data type schemas.

'none' Data Type

'none' Description

This data type represents a none or null value.

'none' Limitations

referencing is not supported
autogeneration of data types is not supported - it is already none :wink:

'none' Schema Keys

type: 'none'
description: < str > # Information about data type and value

'object' Data Type

'object' Description

This data type represents an object with attributes. Technically it's a dictionary in Python. Attributes of the object are keys in the dictionary.

'object' Limitations

Not customizable
autogeneration of missing keys is supported, if value data type supports autogeneration

'object' Schema (in Version 2)

type: 'object'
description: < str > # Information about data type and value
attributes:
  < attribute_name >: { data type schema }
# Mapping with key as attribute name and value as data type schema.
# If omitted, any data which is a dictionary is accepted.

required_attributes:
- < attribute_name >
# List of attribute names, which must be in the object.

required_xor_attributes:
- - < attribute_name>
# List of attribute name lists,
# where one (and only one) attribute must be in the object.

ignore_undefined_attributes: < bool | default -> false >
# Tell validator to ignore attributes in data object,
# which are not defined in 'attributes'.

dependencies:
  < attribute_name >:
    requires:
    - < attribute_name >
    # List of attribute names, which must be in the object,
    # if this attribute is in.
    excludes:
    - < attribute_name >
    # List of attribute names, which must not be in the object,
    # if this attribute is in.

allow_regex_attributes: < bool | default -> False >
# If enabled, regular expressions are allowed in:
# 'attributes': If object attribute name matches, schema is verified.
# 'required_attributes': Each element must have at least one matching
#                        attribute name.
# 'dependencies.<>.requires': Successful if any object attribute name matches
#                             each list entry.
# 'dependencies.<>.excludes': Error if any object attribute name matches any
#                             list entry.
# !!! Disables autogeneration of missing keys !!!

autogenerate: < bool | default -> True >
# Enable/Disable autogeneration of missing attributes,
# if 'allow_regex_attributes' == false and
# attribute's data type supports autogeneration and is defined within.

reference:
  { reference options }
  # The validator checks, if the same combination of attribute values
  # is specified at another data type with the same 'reference.key'.
  attributes:
    - < attribute_name >
    # List of attribute names whose values are combined for the uniqueness check.
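The attribute rules 'required_attributes', 'required_xor_attributes' and 'dependencies' can be read as simple membership checks on the dictionary keys. The following plain-Python sketch illustrates that reading; it is not cd2t's implementation, the function name and finding texts are hypothetical, and regular expression attribute names are left out.

```python
def check_object_attributes(data, required=(), required_xor=(), dependencies=None):
    """Return finding strings for attribute presence rules (illustrative only)."""
    findings = []
    for name in required:
        if name not in data:
            findings.append(f"required attribute '{name}' is missing")
    for group in required_xor:
        # One, and only one, attribute of each group must be present.
        present = [name for name in group if name in data]
        if len(present) != 1:
            findings.append(f"exactly one of {list(group)} must be set, found {present}")
    for name, rule in (dependencies or {}).items():
        if name not in data:
            continue
        for other in rule.get("requires", []):
            if other not in data:
                findings.append(f"'{name}' requires '{other}'")
        for other in rule.get("excludes", []):
            if other in data:
                findings.append(f"'{name}' excludes '{other}'")
    return findings


interface = {"name": "eth0", "ipv4": "10.1.1.21/24", "dhcp": True}
for finding in check_object_attributes(
    interface,
    required=["name"],
    required_xor=[["ipv4", "dhcp"]],
    dependencies={"dhcp": {"excludes": ["ipv4"]}},
):
    print(finding)
```

In this example both 'ipv4' and 'dhcp' are set, so the XOR group and the 'excludes' dependency each produce a finding.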

'object' Schema (in Version 1)

type: 'object'
description: < str > # Information about data type and value
attributes:
  < attribute_name >: { data type schema }
# Mapping with key as attribute name and value as data type schema.
# If omitted, any data which is a dictionary is accepted.

required_attributes:
- < attribute_name >
# List of attribute names, which must be in the object.

ignore_undefined_attributes: < bool | default -> false >
# Tell validator to ignore attributes in data object,
# which are not defined in 'attributes'.

dependencies:
  < attribute_name >:
    requires:
    - < attribute_name >
    # List of attribute names, which must be in the object,
    # if this attribute is in.
    excludes:
    - < attribute_name >
    # List of attribute names, which must not be in the object,
    # if this attribute is in.

allow_regex_attributes: < bool | default -> False >
# If enabled, regular expressions are allowed in:
# 'attributes': If object attribute name matches, schema is verified.
# 'required_attributes': Each element must have at least one matching
#                        attribute name.
# 'dependencies.<>.requires': Successful if any object attribute name matches
#                             each list entry.
# 'dependencies.<>.excludes': Error if any object attribute name matches any
#                             list entry.
# !!! Disables autogeneration of missing keys !!!

autogenerate: < bool | default -> True >
# Enable/Disable autogeneration of missing attributes,
# if 'allow_regex_attributes' == false and
# attribute's data type supports autogeneration and is defined within.

reference: { reference options }
# The validator checks, if the same combination of attribute values
# is specified at another data type with the same 'reference.key'.
# Requires 'reference_attributes' to be defined.

reference_attributes:
- < attribute_name >
# List of attribute names whose values are combined for the uniqueness check.

'schema' Data Type

'schema' Description

This data type does not represent an expected data value. It uses a subschema's root data type to process the data structure.

'schema' Limitations

Supported in Version 1 only (Use templates in version 2)
Not customizable

'schema' Schema

type: 'schema'
description: < str > # Information about data type and value
subschema: < str >
# Name of the subschema, which is defined under 'subschemas' in schema.

'string' Data Type

'string' Description

This data type represents a string.

'string' Limitations

autogeneration is not supported.

'string' Schema

type: 'string'
description: < str > # Information about data type and value
reference:
  # global reference options plus:
  allow_namespace_lookups: < bool > # Only valid for 'consumer' mode
  namespace_separator_char: < string >
# Process:
# - Check if *namespace_separator_char* is in string value
# - extract namespace from left part of first finding
# - extract value from right part of first finding
# - lookup for *provider value* == *extracted value* in
#   namespace *extracted namespace* for reference.key

minimum: <int | default -> 0 > # >= 0
# Minimum required string length

maximum: < int > # >= 0
# Maximum allowed string length

allowed_values:
- < string >
# List of strings
# Depends on 'regex_mode':
# == false: String must be equal to any string in the list.
# == true: String must match with any regex in the list.

not_allowed_values:
- < string >
# List of strings
# Depends on 'regex_mode':
# == false: String mustn't be equal to any string in the list.
# == true: String mustn't match any regex in the list.

regex_mode: < bool | default -> false >
# Use strings in 'allowed_values' and 'not_allowed_values' for regex matching.

regex_multiline: < bool | default -> false >
# Use multiline matching for regex tests or not

regex_fullmatch: < bool | default -> true >
# String must fully match.
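The namespace lookup process described for 'reference.allow_namespace_lookups' amounts to splitting the string at the first separator. A minimal plain-Python sketch, assuming that a value without a separator is looked up in the current namespace (the text above does not state this) and using hypothetical function names:

```python
def split_namespace(value, separator, current_namespace):
    """Split 'namespace<separator>value' at the first separator (illustrative only)."""
    if separator and separator in value:
        namespace, _, name = value.partition(separator)
        return namespace, name
    # Assumption: without a separator the current namespace is used.
    return current_namespace, value


def consumer_matches(value, separator, current_namespace, producers):
    """'producers' maps a namespace to the set of produced values for one reference key."""
    namespace, name = split_namespace(value, separator, current_namespace)
    return name in producers.get(namespace, set())


producers = {"site_a": {"core-sw1"}, "site_b": {"edge-rt1"}}
print(consumer_matches("site_b:edge-rt1", ":", "site_a", producers))  # True
print(consumer_matches("edge-rt1", ":", "site_a", producers))         # False
```

Only the first occurrence of the separator is used, so the extracted value may itself contain the separator character.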

Special Data Types

'fqdn' Data Type

'fqdn' Description

This data type represents a fully qualified domain name. An FQDN consists of a hostname (first label; defined by RFC 952 and RFC 1123) and a domain name (remaining labels; defined by RFC 1035 section 2.3.1).

The RFCs allow uppercase letters in hostnames and domain labels and treat names as case-insensitive. This is not the default in cd2t validation: cd2t creates a finding on uppercase letters. If uppercase letters should be accepted by cd2t, set option 'strict_lower' to false.

'fqdn' Limitations

Supported in Version 2

'fqdn' Schema

type: 'fqdn'

minimum: < 4-255 | default -> 4 >
# Minimum required length of FQDNs

maximum: < 4-255 | default -> 255 > # >= minimum
# Maximum allowed length of FQDNs

minimum_labels: < int | default -> 2 > # >= 2
# Minimum required amount of labels

maximum_labels: < int > # >= minimum
# Maximum allowed amount of labels

allowed_values:
- < string >
# FQDN must match with any regex in the list.

not_allowed_values:
- < string >
# FQDN mustn't match any regex in the list.

strict_lower: < bool | default -> true >
# Do not allow upper cases in FQDN.

'hostname' Data Type

'hostname' Description

This data type represents an Internet hostname as defined by RFC 952 and RFC 1123. Thus, a hostname contains ASCII characters a through z, digits 0 through 9 or the hyphen character '-'. A hostname is 1 to 63 characters long.

The RFCs allow uppercase letters in hostnames and treat names as case-insensitive. This is not the default in cd2t validation: cd2t creates a finding on uppercase letters. If uppercase letters should be accepted by cd2t, set option 'strict_lower' to false.

'hostname' Limitations

Supported in Version 2

'hostname' Schema

type: 'hostname'

minimum: < 1-63 | default -> 1 >
# Minimum required length of hostname

maximum: < 1-63 | default -> 63 > # >= minimum
# Maximum allowed length of hostname

allowed_values:
- < string >
# Hostname must match with any regex in the list.

not_allowed_values:
- < string >
# Hostname mustn't match any regex in the list.

strict_lower: < bool | default -> true >
# Do not allow upper cases in hostname.

'hostname' Validation Process

If 'strict_lower' is true: Check for upper cases in hostname
Check on allowed characters
Check minimum and maximum length
Check on not allowed values
Check on allowed values
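The steps above can be sketched in plain Python using the 're' module. This is an illustration, not cd2t's code: the function name and finding texts are hypothetical, and the sketch assumes that the regex lists are matched against the full hostname, which the text does not specify.

```python
import re

# Letters, digits and hyphen; upper case is handled separately by 'strict_lower'.
_ALLOWED_CHARACTERS = re.compile(r"[A-Za-z0-9-]+")


def check_hostname(value, minimum=1, maximum=63, strict_lower=True,
                   allowed_values=(), not_allowed_values=()):
    """Return finding strings for a hostname, in the order listed above."""
    findings = []
    if strict_lower and value != value.lower():
        findings.append("upper case letters are not allowed")
    if not _ALLOWED_CHARACTERS.fullmatch(value):
        findings.append("hostname contains characters other than letters, digits and '-'")
    if not minimum <= len(value) <= maximum:
        findings.append("hostname length is out of range")
    if any(re.fullmatch(pattern, value) for pattern in not_allowed_values):
        findings.append("hostname matches a not allowed value")
    if allowed_values and not any(re.fullmatch(pattern, value) for pattern in allowed_values):
        findings.append("hostname matches none of the allowed values")
    return findings


print(check_hostname("core-sw1"))                      # no findings
print(check_hostname("Core-SW1"))                      # upper case finding
print(check_hostname("Core-SW1", strict_lower=False))  # no findings
```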

'ip' Data Type

'ip' Description

This data type represents an IP object. The value can be IPv4 or IPv6, and it can be an address, a network or an interface.

'ip' Limitations

Supported in Version 2

'ip' Schema

type: ip
version: < 4 | 6 >
loopback: < bool > # true: value must be loopback; false: value mustn't be loopback
link_local: < bool > # true: value must be link-local; false: value mustn't be link-local
private: < bool > # true: value must be private; false: value mustn't be private
public: < bool > # true: value must be public; false: value mustn't be public
multicast: < bool > # true: value must be multicast; false: value mustn't be multicast
allowed_values:
  - < IP address string >
  # Data must match one of the values.
not_allowed_values:
  - < IP address string >
  # Data mustn't match any of the values.

'ip_address' Data Type

'ip_address' Description

This data type represents an IP address.

'ip_address' Limitations

Supported in Version 2

'ip_address' Schema

type: ip_address
version: < 4 | 6 >
loopback: < bool > # true: value must be loopback; false: value mustn't be loopback
link_local: < bool > # true: value must be link-local; false: value mustn't be link-local
private: < bool > # true: value must be private; false: value mustn't be private
public: < bool > # true: value must be public; false: value mustn't be public
multicast: < bool > # true: value must be multicast; false: value mustn't be multicast
allowed_values:
  - < IP address string >
  # Data must match one of the values.
not_allowed_values:
  - < IP address string >
  # Data mustn't match any of the values.
allowed_subnets:
  - < IP network string >
  # IP address must be within one of the networks
not_allowed_subnets:
  - < IP network string >
  # IP address mustn't be within any of the networks
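Most of these options correspond to properties that Python's standard 'ipaddress' module already exposes. The sketch below shows how such checks could look for a subset of the options; it does not claim to reflect how cd2t implements them, and the function name and finding texts are hypothetical.

```python
import ipaddress


def check_ip_address(value, version=None, private=None, loopback=None,
                     allowed_subnets=(), not_allowed_subnets=()):
    """Return finding strings for an IP address string (illustrative only)."""
    try:
        ip = ipaddress.ip_address(value)
    except ValueError:
        return [f"'{value}' is not an IP address"]
    findings = []
    if version is not None and ip.version != version:
        findings.append(f"IP version must be {version}")
    # A bool option means: true -> property must hold, false -> it mustn't.
    if private is not None and ip.is_private != private:
        findings.append("address must be private" if private else "address mustn't be private")
    if loopback is not None and ip.is_loopback != loopback:
        findings.append("address must be loopback" if loopback else "address mustn't be loopback")
    networks = [ipaddress.ip_network(net) for net in allowed_subnets]
    if networks and not any(ip in net for net in networks):
        findings.append("address is in none of the allowed subnets")
    if any(ip in ipaddress.ip_network(net) for net in not_allowed_subnets):
        findings.append("address is in a not allowed subnet")
    return findings


print(check_ip_address("10.1.1.21", version=4, private=True,
                       allowed_subnets=["10.1.0.0/16"]))   # no findings
print(check_ip_address("10.1.1.21", not_allowed_subnets=["10.1.1.0/24"]))
```

The 'link_local' and 'multicast' options could be handled the same way with `is_link_local` and `is_multicast`.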

'ip_network' Data Type

'ip_network' Description

This data type represents an IP network.

'ip_network' Limitations

Supported in Version 2

'ip_network' Schema

type: ip_network
version: < 4 | 6 >
loopback: < bool > # true: value must be loopback; false: value mustn't be loopback
link_local: < bool > # true: value must be link-local; false: value mustn't be link-local
private: < bool > # true: value must be private; false: value mustn't be private
public: < bool > # true: value must be public; false: value mustn't be public
multicast: < bool > # true: value must be multicast; false: value mustn't be multicast
allowed_values:
  - < IP address string >
  # Data must match one of the values.
not_allowed_values:
  - < IP address string >
  # Data mustn't match any of the values.
allowed_subnets:
  - < IP network string >
  # IP address must be within one of the networks
not_allowed_subnets:
  - < IP network string >
  # IP address mustn't be within any of the networks
minimum_prefix_length: < int >
  # 0 < length < 32|128 (v4|v6)
maximum_prefix_length: < int >
  # 'minimum_prefix_length' < length < 32|128 (v4|v6)

'ip_interface' Data Type

'ip_interface' Description

This data type represents an IP interface (address with subnet prefix length, i.e. 10.1.1.21/24).

'ip_interface' Limitations

Supported in Version 2

'ip_interface' Schema

type: ip_interface
version: < 4 | 6 >
loopback: < bool > # true: value must be loopback; false: value mustn't be loopback
link_local: < bool > # true: value must be link-local; false: value mustn't be link-local
private: < bool > # true: value must be private; false: value mustn't be private
public: < bool > # true: value must be public; false: value mustn't be public
multicast: < bool > # true: value must be multicast; false: value mustn't be multicast
allowed_values:
  - < IP address string >
  # Data must match one of the values.
not_allowed_values:
  - < IP address string >
  # Data mustn't match any of the values.
allowed_subnets:
  - < IP network string >
  # IP address must be within one of the networks
not_allowed_subnets:
  - < IP network string >
  # IP address mustn't be within any of the networks

General Data Type Options

Default Value

Defines a default value for documentation purposes, e.g. the value that applies if an object attribute is missing.

default_value: < value >

Description

Description of the value. Can be a single string or a list of strings.

description: < string | list[string] >

Referencing

Referencing achieves two validation goals:

It can make sure that a value is unique.
It can make sure that a value is defined at another place within the data structure.

Basically a reference is defined by a key, so that one or more 'producers' of values can match with one or more 'consumers' of values.

In addition, many options are available to implement 1:1, 1:n or n:m relations, to limit the 'reachability' of values with namespaces, and more.

Reference Options

If a data type supports referencing, these options are available.

reference:
  key: < string > # required
  # Identifier to map data at different positions in the data structure

  # Define the reference mode.
  mode: < 'unique' | 'producer' | 'consumer' | default -> 'unique' >
  # - 'producer': collect values as allowed values for 'consumer' positions.
  # - 'unique': Inherits 'producer' and checks uniqueness of the value
  #   among other values at other positions with the same key.
  # - 'consumer': data value must match to a 'producer' value.

  credits: < int >
  # - 'Producer' and 'unique' mode have infinite credits by default.
  # - If 0 with 'producer' or 'unique' mode, no 'consumer' is allowed.
  # - If >0 with 'producer' or 'unique':
  #   - all producer's credits with same value are summed up,
  #   - a consumer is allowed, when enough credits are available,
  #   - each consumer's credits are subtracted from the producers' credit sum.
  # - In 'consumer' mode the credit value is 1 by default.
  # - 'Consumer' credits must be greater than 0.

  allow_orphan_producer: < bool | default -> true >
  # If disabled, producer value without a consumer are not allowed.

  # Select the scope of the reference:
  unique_scope: < 'namespace' | 'global' | default -> 'global' >
  # Ignored in 'producer' or 'consumer' mode

  provider_scope: < 'namespace' | 'global' | default -> 'global' >
  # Ignored in 'consumer' mode

  consumer_scope: < 'namespace' | 'global' | default -> 'global' >
  # Ignored in 'unique' or 'producer' mode
  # 'namespace' scopes to the same namespace data only.
  # References across namespaces only work,
  # if both 'ends' specify 'global'.
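The credit rules can be pictured as simple bookkeeping per produced value. The plain-Python sketch below follows the comments on 'credits' and 'allow_orphan_producer' above; it is not cd2t's implementation, ignores namespaces and scopes, and uses hypothetical names, with `None` standing for infinite credits.

```python
def check_credits(producers, consumers, allow_orphan_producer=True):
    """producers/consumers are lists of (value, credits); None means infinite credits."""
    available = {}
    for value, credits in producers:
        if credits is None or available.get(value, 0) is None:
            available[value] = None  # at least one producer without a credit limit
        else:
            # Credits of all producers with the same value are summed up.
            available[value] = available.get(value, 0) + credits

    findings = []
    consumed = set()
    for value, credits in consumers:
        if value not in available:
            findings.append(f"consumer '{value}' has no producer")
            continue
        consumed.add(value)
        remaining = available[value]
        if remaining is None:
            continue
        if credits > remaining:
            findings.append(f"consumer '{value}' needs {credits} credit(s), {remaining} left")
        else:
            available[value] = remaining - credits

    if not allow_orphan_producer:
        for value in available:
            if value not in consumed:
                findings.append(f"producer '{value}' has no consumer")
    return findings


producers = [("vlan10", 1), ("vlan10", 1), ("vlan20", None)]
consumers = [("vlan10", 1), ("vlan10", 1), ("vlan10", 1), ("vlan30", 1)]
for finding in check_credits(producers, consumers, allow_orphan_producer=False):
    print(finding)
```

Here two producers provide two credits for 'vlan10', so the third consumer is rejected; 'vlan30' has no producer and 'vlan20' is reported as an orphan producer.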

Feature Details

Custom Data Types

Introduced in version 2

Custom data types provide an efficient way to define a data type and its options once and use them at multiple locations within the data structure. Data type options can be overwritten for each use.

Note: Custom data type names mustn't conflict with built-in data types. Good practice is to start with a capital letter.

Example

root:
  type: object
  attributes:
    name: string
    age: NonNegativeInt
    rate:
      type: NonNegativeInt
      maximum: 100

custom_data_types:
  NonNegativeInt:
    type: integer
    minimum: 0

Python Code Example

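import os
import yaml
from cd2t import Validator

# safe_load works with PyYAML >= 6, where yaml.load() requires an explicit
# Loader, and it does not construct arbitrary Python objects.
with open('my_schema.yml') as f:
    schema = yaml.safe_load(f)

validator = Validator()
validator.load_schema(schema)

data_folder = './my_data_folder'
results = []
for filename in sorted(os.listdir(data_folder)):
    # os.listdir returns bare file names, so join them with the folder path.
    with open(os.path.join(data_folder, filename)) as f:
        test_data = yaml.safe_load(f)
    # Use the file name as namespace for namespace-scoped references.
    validator.change_namespace(filename)
    results.extend(validator.validate_data(test_data))

# Reference findings are collected after all data sources have been validated.
results.extend(validator.get_reference_findings())

# Convert findings to strings before joining, in case they are objects.
print('\n'.join(str(finding) for finding in results))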
