
repository: https://gitlab.com/ko.no/cd2t
The data schema describes the expected structure and values of the data to be validated. It also includes schema metadata and control options.
version: 2
name: < str >
description: < str >
allow_data_type_shortcuts: < bool | default -> true >
# Shortcuts are data type strings instead of dictionaries.
# The corresponding data type with default options is used.
root: { data type schema }
custom_data_types:
  < data type name >:
    type: < built-in data type name > # required; customizable data type
    < built-in data type option >: < value >
# Templates can be used in the data schema below 'root' (1),
# below any other template if used in a child data type (2), or
# directly if data type options are changed (3).
# Example (1):
#   anywhere_below_root:
#     template: template1
#     minimum: 1 # Overwrite or add minimum from/in template schema
# Example (2):
#   templates:
#     template1:
#       type: integer
#       minimum: 100
#       maximum: 1000
#     template2:
#       type: multitype
#       types:
#         - type: string
#         - template: template1
# Example (3):
#   templates:
#     template1:
#       type: integer
#       minimum: 100
#       maximum: 1000
#     template2:
#       template: template1
#       minimum: 200
templates:
  < template name >: { data schema } # no data type shortcut allowed.
# Global options for merging a template schema with a local schema.
template_merge_options:
  # Merge dictionaries in data type options recursively.
  # If false, the local data type option overwrites the template option completely.
  recursive: < bool | default -> true >
  # How lists in data type options are merged.
  list_merge: < append | append_rp | prepend | prepend_rp | replace
                | default -> append_rp >
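The merge rules above can be illustrated with a small Python sketch. It is not cd2t's implementation: `merge_schema` is a hypothetical helper, the local schema is assumed to be passed without its `template` key, and only the `append`, `prepend` and `replace` list modes are modeled, because this section does not spell out what the `_rp` variants do. Applied to Example (3), it yields the effective schema of `template2`.

```python
def merge_schema(template: dict, local: dict, recursive: bool = True,
                 list_merge: str = "append") -> dict:
    """Hypothetical sketch: merge a local schema on top of a template schema.

    Note: the default here is 'append'; cd2t's documented default is
    'append_rp', which this sketch does not model.
    """
    merged = dict(template)  # shallow copy, the template is not modified
    for key, local_value in local.items():
        template_value = merged.get(key)
        if recursive and isinstance(template_value, dict) and isinstance(local_value, dict):
            # Merge nested option dictionaries key by key.
            merged[key] = merge_schema(template_value, local_value, recursive, list_merge)
        elif isinstance(template_value, list) and isinstance(local_value, list):
            if list_merge == "append":
                merged[key] = template_value + local_value
            elif list_merge == "prepend":
                merged[key] = local_value + template_value
            elif list_merge == "replace":
                merged[key] = list(local_value)
            else:
                raise ValueError(f"list_merge mode not modeled here: {list_merge}")
        else:
            # Scalars (and dictionaries when recursive is False) are overwritten.
            merged[key] = local_value
    return merged


# Example (3): template2 is template1 with 'minimum' overwritten.
template1 = {"type": "integer", "minimum": 100, "maximum": 1000}
print(merge_schema(template1, {"minimum": 200}))
# {'type': 'integer', 'minimum': 200, 'maximum': 1000}
```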
name: < str >
description: < str >
allow_data_type_shortcuts: < bool | default -> false >
# Shortcuts are data type strings instead of dictionaries.
# The corresponding data type with default options is used.
root: { data type schema }
subschemas:
  < sub schema name >: { data type schema }
This data type represents any data. The validator does not validate or autogenerate any data below it.
type: 'any' # If type is omitted, validator uses Any Data Type
description: < str > # Information about data type and value
This data type represents a boolean value (true/false).
type: 'bool'
description: < str > # Information about data type and value
allowed_value: < bool > # true or false
autogenerate: < bool | default -> false >
# Autogenerate the default value, if data is not existing.
# Requires 'autogenerate_default'
autogenerate_default: < bool > # true or false; must match 'allowed_value' (if set)
This data type represents a selection of allowed values.
type: 'enum'
description: < str > # Information about data type and value
allowed_values: # required
- < value >
This data type represents float values.
None.
type: 'float'
description: < str > # Information about data type and value
reference: { unique options }
maximum: < float >
# value must be less than or equal to this
minimum: < float >
# value must be greater than or equal to this
maximum_decimals: < int > # >= 0
# Maximum allowed decimal places.
allowed_values:
  - < float > # value must match this value
  - round: < int > # value rounded to < int > digits must match 'matches'
    matches: < float >
  - range_start: < float > # 'range_start' <= value <= 'range_end'
    range_end: < float >
# List of directives; the value must match at least one of them.
not_allowed_values:
  - < float > # value must not match this value
  - round: < int > # value rounded to < int > digits must not match 'matches'
    matches: < float >
  - range_start: < float > # value must be < 'range_start' or > 'range_end'
    range_end: < float >
# List of directives; the value must not match any of them.
autogenerate: < bool | default -> false >
# Uses the 'autogenerate_default' value.
#
# OR
#
# tries up to 'autogenerate_random_tries' times to:
# 1. Create a random float value which is within the 'autogenerate_ranges'
#    or 'minimum' <= random value <= 'maximum'.
# 2. Check if the random value passes the validation process.
autogenerate_default: < float >
# Autogenerate uses this value.
autogenerate_random_tries: < int | default -> 10 > # 0 < x < 50
# Ignored if 'autogenerate_default' is set.
# Maximum number of tries to find a random float value
# which is not used by any reference.
# The integer value must be greater than 0 and lower than 50.
autogenerate_ranges:
  - minimum: < float >
    maximum: < float >
# Ignored if 'autogenerate_default' is set.
# '[.]minimum' <= '[.]maximum'
# The autogenerated float is within the ranges.
# If omitted, the global 'minimum' and 'maximum' limit the random value.
autogenerate_random_decimals: < int | default -> 2 > # >= 0
# Ignored if 'autogenerate_default' is set.
# Limits the decimal places of the random value.
If options are missing, corresponding checks are skipped.
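A hypothetical sketch of the float directives and the random autogeneration loop described above. `matches_directive`, `check_float` and `autogenerate_float` are illustrative names, not cd2t functions, and the `validate` callback stands in for cd2t's full validation and reference check.

```python
import random


def matches_directive(value: float, directive) -> bool:
    """Test a float against one allowed/not-allowed directive."""
    if isinstance(directive, (int, float)):
        return value == directive  # plain value: exact match
    if "round" in directive:
        # value rounded to 'round' digits must equal 'matches'
        return round(value, directive["round"]) == directive["matches"]
    # range directive: range_start <= value <= range_end
    return directive["range_start"] <= value <= directive["range_end"]


def check_float(value: float, allowed=None, not_allowed=None) -> bool:
    """Pass if the value matches any allowed directive (when given)
    and none of the not-allowed directives. Missing options are skipped."""
    if allowed is not None and not any(matches_directive(value, d) for d in allowed):
        return False
    if not_allowed and any(matches_directive(value, d) for d in not_allowed):
        return False
    return True


def autogenerate_float(validate, ranges, decimals=2, tries=10, rng=random):
    """Try up to `tries` random values; return the first one `validate` accepts.

    `ranges` is a non-empty list of {'minimum', 'maximum'} dicts; pass the
    global minimum/maximum as a single range if 'autogenerate_ranges' is omitted.
    Rounding can nudge a value past a range edge, so `validate` should
    re-check the limits.
    """
    for _ in range(tries):
        chosen = rng.choice(ranges)
        candidate = round(rng.uniform(chosen["minimum"], chosen["maximum"]), decimals)
        if validate(candidate):
            return candidate
    return None  # autogeneration failed
```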
This data type represents a dictionary whose keys are IDs. IDs can be strings or integers.
type: 'idlist'
description: < str > # Information about data type and value
reference: { unique options }
# Note: Every ID is referenced as a value with the 'reference.key'.
minimum: <int | default -> 0 > # >= 0
# Minimum required amount of IDs
maximum: < int > # >= 0
# Maximum allowed amount of IDs
# If omitted, even an empty idlist is allowed.
elements: { data type schema } # required
# Data schema defining element data type
id: { data type schema }
# Data schema defining IDs
# If omitted, ID type defaults to 'string'.
type: 'idlist'
description: < str > # Information about data type and value
reference: { unique options }
# Note: Every ID is referenced as a value with the 'reference.key'.
minimum: <int | default -> 0 > # >= 0
# Minimum required amount of IDs
maximum: < int > # >= 0
# Maximum allowed amount of IDs
# If omitted, even an empty idlist is allowed.
elements: { data type schema } # required
# Data schema defining element data type
id_type: < 'integer' | 'string' | default -> 'string' >
# Indicates if IDs are integer or string
id_minimum: <int | default -> 0 > # >= 0
# Minimum required ID string length or minimum ID integer value
id_maximum: < int > # >= 0
# Maximum allowed ID string length or maximum ID integer value
# If omitted, even '' is allowed as ID string.
allowed_ids:
- < string | integer >
# List of regex strings or integers, depending on 'id_type'
# If the ID matches any of them, the ID is allowed.
not_allowed_ids:
- < string | integer >
# List of regex strings or integers, depending on 'id_type'
# If the ID matches any of them, the ID is not allowed.
# 'not_allowed_ids' are tested before 'allowed_ids'.
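The ID filter of this idlist variant could look roughly like this. `id_allowed` is a hypothetical helper; treating regex strings as full matches is an assumption, since the section does not say whether partial matches count.

```python
import re


def id_allowed(id_value, allowed_ids=None, not_allowed_ids=None) -> bool:
    """Hypothetical sketch: 'not_allowed_ids' is checked first, then, if
    'allowed_ids' is set, the ID must match at least one entry."""
    def hit(pattern) -> bool:
        if isinstance(pattern, str):
            # Regex entry (id_type 'string'); full match is an assumption.
            return re.fullmatch(pattern, str(id_value)) is not None
        return pattern == id_value  # integer entry (id_type 'integer')

    if not_allowed_ids and any(hit(p) for p in not_allowed_ids):
        return False
    if allowed_ids and not any(hit(p) for p in allowed_ids):
        return False
    return True
```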
This data type represents integer values.
None.
type: 'integer'
description: < str > # Information about data type and value
reference: { unique options }
maximum: < int >
# value must be less than or equal to this
minimum: < int >
# value must be greater than or equal to this
not_allowed_values:
- < int >
# List of integers which the value must not match.
autogenerate: < bool | default -> false >
# Requires 'reference.key' to be defined and not ''.
# If no unique value can be generated, autogeneration fails.
autogenerate_default: < int >
# Generate this integer value if the value is None.
# All other 'autogenerate_*' options are ignored.
autogenerate_maximum: < int >
# The autogenerated integer must be less than or equal to this.
# If omitted, the 'maximum' key is the upper limit.
autogenerate_minimum: < int >
# The autogenerated integer must be greater than or equal to this.
# If omitted, the 'minimum' key is the lower limit.
autogenerate_find: < 'next_higher' | 'next_lower' |
                     'random' | default -> 'next_higher' >
# Ignored if 'autogenerate_default' is set.
# Tells autogeneration to use the first available integer value
# starting at 'minimum' and increasing ('next_higher'),
# starting at 'maximum' and decreasing ('next_lower'), or
# to pick a random number within 'minimum' and 'maximum'.
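A sketch of the three `autogenerate_find` strategies, under the assumption that `used` holds the integers already taken under the same `reference.key`. `autogenerate_int` is a hypothetical name, not part of cd2t.

```python
import random


def autogenerate_int(used: set, minimum: int, maximum: int,
                     find: str = "next_higher", rng=random):
    """Return an integer in [minimum, maximum] that is not in `used`,
    or None if no unique value exists (autogeneration fails)."""
    if find == "next_higher":
        candidates = range(minimum, maximum + 1)  # walk upwards from minimum
    elif find == "next_lower":
        candidates = range(maximum, minimum - 1, -1)  # walk downwards from maximum
    elif find == "random":
        free = [v for v in range(minimum, maximum + 1) if v not in used]
        return rng.choice(free) if free else None
    else:
        raise ValueError(f"unknown autogenerate_find: {find}")
    return next((v for v in candidates if v not in used), None)
```

Collecting all free values for `random` keeps the sketch simple; for very wide ranges, sampling and retrying would be cheaper.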
This data type represents a list whose elements all share the same data type. To allow different data types in one list, use the data type 'multitype' for 'elements'.
type: 'list' # required
description: < str > # Information about data type and value
elements: { data type schema } # required
# Data schema defining elements data type
minimum: <int | default -> 0 > # >= 0
# Minimum required amount of elements in the list.
maximum: < int > # >= 0
# Maximum allowed amount of elements in the list.
allow_duplicates: < bool | default -> true >
# Allow identical elements to appear multiple times.
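The list-level options reduce to a length check and an optional duplicate check. This hypothetical `check_list` helper skips validation of the individual elements against `elements`.

```python
def check_list(items: list, minimum: int = 0, maximum=None,
               allow_duplicates: bool = True) -> list:
    """Hypothetical sketch: return findings for the list-level options."""
    findings = []
    if len(items) < minimum:
        findings.append(f"list has {len(items)} elements, minimum is {minimum}")
    if maximum is not None and len(items) > maximum:
        findings.append(f"list has {len(items)} elements, maximum is {maximum}")
    if not allow_duplicates:
        seen = []
        for index, item in enumerate(items):
            # Compare with '==' so unhashable elements (dicts, lists) work too.
            if item in seen:
                findings.append(f"element {index} is a duplicate")
            else:
                seen.append(item)
    return findings
```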
This data type represents a selection of allowed data types.
type: 'multitype'
description: < str > # Information about data type and value
types: # required
- { data type schema }
# List of data type schemas.
This data type represents a none or null value.
type: 'none'
description: < str > # Information about data type and value
This data type represents an object with attributes. Technically, it's a dictionary in Python; the object's attributes are the dictionary's keys.
type: 'object'
description: < str > # Information about data type and value
attributes:
  < attribute_name >: { data type schema }
# Mapping with key as attribute name and value as data type schema.
# If omitted, any data which is a dictionary is accepted.
required_attributes:
  - < attribute_name >
# List of attribute names which must be in the object.
required_xor_attributes:
  - - < attribute_name >
# List of attribute name lists,
# where one (and only one) attribute of each list must be in the object.
ignore_undefined_attributes: < bool | default -> false >
# Tell the validator to ignore attributes in the data object
# which are not defined in 'attributes'.
dependencies:
  < attribute_name >:
    requires:
      - < attribute_name >
    # List of attribute names which must be in the object
    # if this attribute is present.
    excludes:
      - < attribute_name >
    # List of attribute names which must not be in the object
    # if this attribute is present.
allow_regex_attributes: < bool | default -> False >
# If enabled, regular expressions are allowed in:
# 'attributes': If an object attribute name matches, its schema is verified.
# 'required_attributes': Each element must match at least one
#                        attribute name.
# 'dependencies.<>.requires': Successful if each list entry is matched by
#                             at least one object attribute name.
# 'dependencies.<>.excludes': Error if any object attribute name matches any
#                             list entry.
# !!! Disables autogeneration of missing keys !!!
autogenerate: < bool | default -> True >
# Enable/disable autogeneration of missing attributes
# if 'allow_regex_attributes' == false and the attribute is defined
# in 'attributes' with a data type that supports autogeneration.
reference:
  { reference options }
  # The validator checks if the same combination of attribute values
  # is specified at another data type with the same 'reference.key'.
  attributes:
    - < attribute_name >
  # List of attribute names whose values are combined for the uniqueness check.
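The attribute presence rules can be sketched as follows. `check_object` and the attribute names in the test data (`ssl`, `cert`, `plaintext`) are hypothetical, and the regex mode enabled by `allow_regex_attributes` is not modeled.

```python
def check_object(data: dict, required=(), required_xor=(), dependencies=None) -> list:
    """Hypothetical sketch of 'required_attributes', 'required_xor_attributes'
    and 'dependencies' for plain (non-regex) attribute names."""
    findings = []
    for name in required:
        if name not in data:
            findings.append(f"missing required attribute '{name}'")
    for group in required_xor:
        present = [name for name in group if name in data]
        if len(present) != 1:
            # Exactly one attribute of each group must be present.
            findings.append(f"exactly one of {list(group)} required, found {present}")
    for name, rules in (dependencies or {}).items():
        if name not in data:
            continue  # dependencies only apply if the attribute is present
        for other in rules.get("requires", []):
            if other not in data:
                findings.append(f"'{name}' requires '{other}'")
        for other in rules.get("excludes", []):
            if other in data:
                findings.append(f"'{name}' excludes '{other}'")
    return findings
```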
type: 'object'
description: < str > # Information about data type and value
attributes:
  < attribute_name >: { data type schema }
# Mapping with key as attribute name and value as data type schema.
# If omitted, any data which is a dictionary is accepted.
required_attributes:
  - < attribute_name >
# List of attribute names which must be in the object.
ignore_undefined_attributes: < bool | default -> false >
# Tell the validator to ignore attributes in the data object
# which are not defined in 'attributes'.
dependencies:
  < attribute_name >:
    requires:
      - < attribute_name >
    # List of attribute names which must be in the object
    # if this attribute is present.
    excludes:
      - < attribute_name >
    # List of attribute names which must not be in the object
    # if this attribute is present.
allow_regex_attributes: < bool | default -> False >
# If enabled, regular expressions are allowed in:
# 'attributes': If an object attribute name matches, its schema is verified.
# 'required_attributes': Each element must match at least one
#                        attribute name.
# 'dependencies.<>.requires': Successful if each list entry is matched by
#                             at least one object attribute name.
# 'dependencies.<>.excludes': Error if any object attribute name matches any
#                             list entry.
# !!! Disables autogeneration of missing keys !!!
autogenerate: < bool | default -> True >
# Enable/disable autogeneration of missing attributes
# if 'allow_regex_attributes' == false and the attribute is defined
# in 'attributes' with a data type that supports autogeneration.
reference: { reference options }
# The validator checks if the same combination of attribute values
# is specified at another data type with the same 'reference.key'.
# Requires 'reference_attributes' to be defined.
reference_attributes:
  - < attribute_name >
# List of attribute names whose values are combined for the uniqueness check.
This data type does not represent an expected data value. Instead, it uses a subschema's root data type to process the data structure.
type: 'schema'
description: < str > # Information about data type and value
subschema: < str >
# Name of the subschema, which is defined under 'subschemas' in schema.
This data type represents a string.
type: 'string'
description: < str > # Information about data type and value
reference:
  # global reference options plus:
  allow_namespace_lookups: < bool > # Only valid for 'consumer' mode
  namespace_separator_char: < string >
  # Process:
  # - Check if *namespace_separator_char* is in the string value
  # - extract the namespace from the left part of the first occurrence
  # - extract the value from the right part of the first occurrence
  # - look up *producer value* == *extracted value* in
  #   namespace *extracted namespace* for reference.key
minimum: < int | default -> 0 > # >= 0
# Minimum required string length
maximum: < int > # >= 0
# Maximum allowed string length
allowed_values:
- < string >
# List of strings
# Depends on 'regex_mode':
# == false: String must be equal to any string in the list.
# == true: String must match any regex in the list.
not_allowed_values:
- < string >
# List of strings
# Depends on 'regex_mode':
# == false: String must not be equal to any string in the list.
# == true: String must not match any regex in the list.
regex_mode: < bool | default -> false >
# Use strings in 'allowed_values' and 'not_allowed_values' for regex matching.
regex_multiline: < bool | default -> false >
# Enable multiline mode for regex matching.
regex_fullmatch: < bool | default -> true >
# The regex must match the entire string.
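A hypothetical sketch of the string value checks and of the namespace split used by `allow_namespace_lookups`. `string_allowed` and `split_namespace` are illustrative names, and using `re.search` when `regex_fullmatch` is false is an assumption; cd2t may anchor non-full matches differently.

```python
import re


def string_allowed(value: str, allowed=None, not_allowed=None, regex_mode=False,
                   regex_multiline=False, regex_fullmatch=True) -> bool:
    """Check 'allowed_values'/'not_allowed_values' for a string."""
    flags = re.MULTILINE if regex_multiline else 0

    def hit(entry: str) -> bool:
        if not regex_mode:
            return value == entry  # plain string equality
        if regex_fullmatch:
            return re.fullmatch(entry, value, flags) is not None
        return re.search(entry, value, flags) is not None  # assumption

    if allowed and not any(hit(e) for e in allowed):
        return False
    if not_allowed and any(hit(e) for e in not_allowed):
        return False
    return True


def split_namespace(value: str, separator: str):
    """Split at the first separator into (namespace, value);
    return None if the separator does not occur."""
    if separator not in value:
        return None
    namespace, _, rest = value.partition(separator)
    return namespace, rest
```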
This data type represents a fully qualified domain name (FQDN). An FQDN consists of a hostname (the first label; defined by RFC 952 and RFC 1123) and a domain name (the remaining labels; defined by RFC 1035 section 2.3.1).
The RFCs allow upper-case letters in hostnames and domain labels and treat them case-insensitively. cd2t does not do this by default: it creates a finding for upper-case letters. To make cd2t ignore upper case, set the option 'strict_lower' to false.
type: 'fqdn'
minimum: < 4-255 | default -> 4 >
# Minimum required length of FQDNs
maximum: < 4-255 | default -> 255 > # >= minimum
# Maximum allowed length of FQDNs
minimum_labels: < int | default -> 2 > # >= 2
# Minimum required amount of labels
maximum_labels: < int > # >= minimum
# Maximum allowed amount of labels
allowed_values:
- < string >
# FQDN must match any regex in the list.
not_allowed_values:
- < string >
# FQDN must not match any regex in the list.
strict_lower: < bool | default -> true >
# Do not allow upper cases in FQDN.
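A rough sketch of the FQDN length, label and case checks. `is_fqdn` is hypothetical, and as a simplification it applies one RFC 1123-style label rule (letters, digits and inner hyphens, 1 to 63 characters) to every label; the regex filters in `allowed_values` and `not_allowed_values` are omitted.

```python
import re

# One label: 1-63 chars of a-z, 0-9 and '-', not starting or ending with '-'.
LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def is_fqdn(value: str, minimum: int = 4, maximum: int = 255,
            minimum_labels: int = 2, maximum_labels=None,
            strict_lower: bool = True) -> bool:
    """Hypothetical sketch of the FQDN options described above."""
    if not strict_lower:
        value = value.lower()  # tolerate upper case by comparing in lower case
    if not minimum <= len(value) <= maximum:
        return False
    labels = value.split(".")
    if len(labels) < minimum_labels:
        return False
    if maximum_labels is not None and len(labels) > maximum_labels:
        return False
    # With strict_lower, any upper-case letter makes a label fail here.
    return all(LABEL.fullmatch(label) for label in labels)
```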
This data type represents an Internet hostname as defined by RFC 952 and RFC 1123. A hostname consists of the ASCII letters a through z, the digits 0 through 9, and the hyphen '-', and is 1 to 63 characters long.
The RFCs allow upper-case letters in hostnames and treat them case-insensitively. cd2t does not do this by default: it creates a finding for upper-case letters. To make cd2t ignore upper case, set the option 'strict_lower' to false.
type: 'hostname'
minimum: < 1-63 | default -> 1 >
# Minimum required length of hostname
maximum: < 1-63 | default -> 63 > # >= minimum
# Maximum allowed length of hostname
allowed_values:
- < string >
# Hostname must match any regex in the list.
not_allowed_values:
- < string >
# Hostname must not match any regex in the list.
strict_lower: < bool | default -> true >
# Do not allow upper cases in hostname.
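The hostname rules map to a single regular expression. This hypothetical `is_hostname` additionally rejects a leading or trailing hyphen, as RFC 952 and RFC 1123 require, and assumes the regexes in `allowed_values` and `not_allowed_values` must match the whole hostname.

```python
import re

# 1-63 characters of a-z, 0-9 and '-', no leading or trailing '-'.
HOSTNAME = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def is_hostname(value: str, minimum: int = 1, maximum: int = 63,
                strict_lower: bool = True, allowed_values=(),
                not_allowed_values=()) -> bool:
    """Hypothetical sketch of the hostname options described above."""
    if not strict_lower:
        value = value.lower()  # tolerate upper case
    if not minimum <= len(value) <= maximum:
        return False
    if HOSTNAME.fullmatch(value) is None:
        return False
    if allowed_values and not any(re.fullmatch(p, value) for p in allowed_values):
        return False
    return not any(re.fullmatch(p, value) for p in not_allowed_values)
```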
This data type represents an IP object: an IPv4 or IPv6 address, network, or interface.
type: ip
version: < 4 | 6 >
loopback: < bool > # true: value must be loopback; false: value mustn't be loopback
link_local: < bool > # true: value must be link-local; false: value mustn't be link-local
private: < bool > # true: value must be private; false: value mustn't be private
public: < bool > # true: value must be public; false: value mustn't be public
multicast: < bool > # true: value must be multicast; false: value mustn't be multicast
allowed_values:
- < IP address string >
# Data must match one of the values.
not_allowed_values:
- < IP address string >
# Data must not match any of the values.
This data type represents an IP address.
type: ip
version: < 4 | 6 >
loopback: < bool > # true: value must be loopback; false: value mustn't be loopback
link_local: < bool > # true: value must be link-local; false: value mustn't be link-local
private: < bool > # true: value must be private; false: value mustn't be private
public: < bool > # true: value must be public; false: value mustn't be public
multicast: < bool > # true: value must be multicast; false: value mustn't be multicast
allowed_values:
- < IP address string >
# Data must match one of the values.
not_allowed_values:
- < IP address string >
# Data must not match any of the values.
allowed_subnets:
- < IP network string >
# IP address must be within one of the networks
not_allowed_subnets:
- < IP network string >
# IP address mustn't be within any of the networks
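Python's standard `ipaddress` module exposes most of the properties listed above, so the address checks can be sketched with it. `check_ip_address` is hypothetical, and mapping `public` to `is_global` is an assumption about what cd2t means by public.

```python
import ipaddress


def check_ip_address(value: str, version=None, loopback=None, link_local=None,
                     private=None, public=None, multicast=None,
                     allowed_subnets=(), not_allowed_subnets=()) -> list:
    """Hypothetical sketch: a flag left at None is not checked;
    True/False require or forbid the property."""
    address = ipaddress.ip_address(value)  # raises ValueError for invalid input
    findings = []
    if version is not None and address.version != version:
        findings.append(f"expected IPv{version}")
    flags = {
        "loopback": (loopback, address.is_loopback),
        "link_local": (link_local, address.is_link_local),
        "private": (private, address.is_private),
        "public": (public, address.is_global),  # assumption: public == global
        "multicast": (multicast, address.is_multicast),
    }
    for name, (wanted, actual) in flags.items():
        if wanted is not None and wanted != actual:
            findings.append(f"{name} must be {wanted}")
    # An address is never "in" a network of the other IP version.
    if allowed_subnets and not any(address in ipaddress.ip_network(n) for n in allowed_subnets):
        findings.append("not within any allowed subnet")
    if any(address in ipaddress.ip_network(n) for n in not_allowed_subnets):
        findings.append("within a not-allowed subnet")
    return findings
```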
This data type represents an IP network.
type: ip
version: < 4 | 6 >
loopback: < bool > # true: value must be loopback; false: value mustn't be loopback
link_local: < bool > # true: value must be link-local; false: value mustn't be link-local
private: < bool > # true: value must be private; false: value mustn't be private
public: < bool > # true: value must be public; false: value mustn't be public
multicast: < bool > # true: value must be multicast; false: value mustn't be multicast
allowed_values:
- < IP address string >
# Data must match one of the values.
not_allowed_values:
- < IP address string >
# Data must not match any of the values.
allowed_subnets:
- < IP network string >
# IP address must be within one of the networks
not_allowed_subnets:
- < IP network string >
# IP address mustn't be within any of the networks
minimum_prefix_length: < int >
# 0 < length < 32|128 (v4|v6)
maximum_prefix_length: < int >
# 'minimum_prefix_length' < length < 32|128 (v4|v6)
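The prefix length limits of an IP network can be sketched with `ipaddress.ip_network`. The helper name `check_prefix_length` is hypothetical; the inclusive comparison and the use of `strict=True` (which rejects host bits such as `10.1.1.21/24`, an interface rather than a network) are assumptions.

```python
import ipaddress


def check_prefix_length(value: str, minimum_prefix_length=None,
                        maximum_prefix_length=None) -> list:
    """Hypothetical sketch: parse a network and compare its prefix length."""
    # strict=True raises ValueError if host bits are set.
    network = ipaddress.ip_network(value, strict=True)
    findings = []
    if minimum_prefix_length is not None and network.prefixlen < minimum_prefix_length:
        findings.append(f"prefix /{network.prefixlen} shorter than /{minimum_prefix_length}")
    if maximum_prefix_length is not None and network.prefixlen > maximum_prefix_length:
        findings.append(f"prefix /{network.prefixlen} longer than /{maximum_prefix_length}")
    return findings
```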
This data type represents an IP interface (an address with a subnet prefix length, e.g. 10.1.1.21/24).
type: ip
version: < 4 | 6 >
loopback: < bool > # true: value must be loopback; false: value mustn't be loopback
link_local: < bool > # true: value must be link-local; false: value mustn't be link-local
private: < bool > # true: value must be private; false: value mustn't be private
public: < bool > # true: value must be public; false: value mustn't be public
multicast: < bool > # true: value must be multicast; false: value mustn't be multicast
allowed_values:
- < IP address string >
# Data must match one of the values.
not_allowed_values:
- < IP address string >
# Data must not match any of the values.
allowed_subnets:
- < IP network string >
# IP address must be within one of the networks
not_allowed_subnets:
- < IP network string >
# IP address mustn't be within any of the networks
Defines a default value for documentation purposes, e.g. the value assumed if an object attribute is missing.
default_value: < value >
Description of the value. Can be a single string or a list of strings.
description: < string | list[string] >
Referencing achieves two validation goals:
A reference is defined by a key, so that one or more 'producers' of values can be matched with one or more 'consumers' of values.
In addition, options are available to implement 1:1, 1:n, or n:m relations, to limit the 'reachability' of values with namespaces, and more.
If a data type supports referencing, these options are available.
reference:
  key: < string > # required
  # Identifier to map data at different positions in the data structure
  # Define the reference mode.
  mode: < 'unique' | 'producer' | 'consumer' | default -> 'unique' >
  # - 'producer': collect values as allowed values for 'consumer' positions.
  # - 'unique': inherits 'producer' and checks the uniqueness of the value
  #   among other values at other positions with the same key.
  # - 'consumer': the data value must match a 'producer' value.
  credits: < int >
  # - 'producer' and 'unique' mode have infinite credits by default.
  # - If 0 with 'producer' or 'unique' mode, no 'consumer' is allowed.
  # - If >0 with 'producer' or 'unique' mode:
  #   - the credits of all producers with the same value are summed up,
  #   - a consumer is allowed when enough credits are available,
  #   - each consumer's credits are subtracted from the producers' credit sum.
  # - In 'consumer' mode the credit value is 1 by default.
  # - 'consumer' credits must be greater than 0.
  allow_orphan_producer: < bool | default -> true >
  # If disabled, producer values without a consumer are not allowed.
  # Select the scope of the reference:
  unique_scope: < 'namespace' | 'global' | default -> 'global' >
  # Ignored in 'producer' or 'consumer' mode
  provider_scope: < 'namespace' | 'global' | default -> 'global' >
  # Ignored in 'consumer' mode
  consumer_scope: < 'namespace' | 'global' | default -> 'global' >
  # Ignored in 'unique' or 'producer' mode
  # 'namespace' limits the scope to data in the same namespace.
  # References across namespaces only work
  # if both 'ends' specify 'global'.
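The credit rules can be illustrated for a single reference key. `check_credits` is a hypothetical helper in which `None` stands for infinite credits; namespaces and scopes are not modeled.

```python
def check_credits(producers, consumers, allow_orphan_producer=True):
    """Hypothetical sketch of credit accounting for one reference key.

    producers: list of (value, credits); None means infinite credits,
               the default for 'producer' and 'unique' mode.
    consumers: list of (value, credits); 1 is the consumer default.
    """
    available = {}
    for value, credits in producers:
        if credits is None or available.get(value, 0) is None:
            available[value] = None  # infinite credits for this value
        else:
            # Credits of all producers with the same value are summed up.
            available[value] = available.get(value, 0) + credits
    findings = []
    for value, credits in consumers:
        if value not in available:
            findings.append(f"no producer for '{value}'")
        elif available[value] is None:
            continue  # infinite credits: always allowed
        elif available[value] >= credits:
            available[value] -= credits  # consume from the producers' sum
        else:
            findings.append(f"not enough credits for '{value}'")
    if not allow_orphan_producer:
        consumed = {value for value, _ in consumers}
        findings.extend(f"orphan producer '{value}'"
                        for value in available if value not in consumed)
    return findings
```

A producer with 0 credits ends up with nothing to hand out, so any consumer of its value is reported, which matches the "no 'consumer' is allowed" rule above.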
Introduced in version 2
Custom data types provide an efficient way to define a data type and its options once and use it at multiple locations within the data structure. Data type options can be overwritten at each use.
Note: Custom data type names must not conflict with built-in data type names. Good practice is to start them with a capital letter.
root:
  type: object
  attributes:
    name: string
    age: NonNegativeInt
    rate:
      type: NonNegativeInt
      maximum: 100
custom_data_types:
  NonNegativeInt:
    type: integer
    minimum: 0
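How a custom data type such as `NonNegativeInt` might be expanded at each use can be sketched as a dictionary merge. `resolve` is hypothetical, and the shallow merge (local options replace the custom type's options) is an assumption; cd2t may merge nested options differently.

```python
def resolve(schema, custom_types: dict) -> dict:
    """Hypothetical sketch: expand a shortcut string or a custom data type
    use into a built-in data type schema with local overrides applied."""
    if isinstance(schema, str):
        schema = {"type": schema}  # data type shortcut, e.g. 'string'
    type_name = schema.get("type")
    if type_name not in custom_types:
        return dict(schema)  # already a built-in data type
    # Custom types are defined on a built-in type, so one lookup suffices.
    base = dict(custom_types[type_name])
    local = {k: v for k, v in schema.items() if k != "type"}
    return {**base, **local}  # local options overwrite the custom ones


custom = {"NonNegativeInt": {"type": "integer", "minimum": 0}}
print(resolve({"type": "NonNegativeInt", "maximum": 100}, custom))
# {'type': 'integer', 'minimum': 0, 'maximum': 100}
```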
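import os
import yaml
from cd2t import Validator

# yaml.safe_load: recent PyYAML versions require a Loader argument for yaml.load.
with open('my_schema.yml') as f:
    schema = yaml.safe_load(f)

validator = Validator()
validator.load_schema(schema)

results = []
data_folder = './my_data_folder'
for filename in os.listdir(data_folder):
    # os.listdir returns bare file names; join them with the folder path.
    with open(os.path.join(data_folder, filename)) as f:
        test_data = yaml.safe_load(f)
    # Validate each data file in its own namespace.
    validator.change_namespace(filename)
    _results = validator.validate_data(test_data)
    results.extend(_results)

# Collect reference findings once, after all data files are validated.
_results = validator.get_reference_findings()
results.extend(_results)
# str() in case findings are objects rather than plain strings.
print('\n'.join(str(result) for result in results))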
cd2t validates data structure, data types and values with templates