Python command-line tool and Python engine to label table fields and fields in data files.
It can help you find meaningful data in your tables and data files, or locate personally identifiable information (PII).
To install the Python library, run pip install metacrafter,
or install from source with python setup.py install
Metacrafter is a rule-based tool that helps to label the fields of database tables. It scans a table and finds person names, surnames, midnames, PII data, and basic identifiers like UUID/GUID.
These rules are written as .yaml files and can be easily extended.
File formats supported:
CSV
JSON lines
JSON (array of records)
BSON
Parquet
XML
Databases support:
Any SQL database supported by SQLAlchemy
NoSQL databases:
MongoDB
Metacrafter key features:
111 labeling rules
all label metadata is collected in the public Metacrafter registry repository
312 date detection rules/patterns, with date detection by qddate, a "quick and dirty" date detection library
extendable set of rules using PyParsing, exact text match and validation functions
support for any database supported by SQLAlchemy
advanced context and language management. You can apply only the rules relevant to certain data in a chosen language
built-in API server
commercial support and additional rules available
# Scan CSV file
$ metacrafter scan-file --format short somefile.csv
# Scan CSV file with delimiter ';' and windows-1251 encoding
$ metacrafter scan-file --format short --encoding windows-1251 --delimiter ';' somefile.csv
# Scan JSON lines file, write results as a stats table to the file somefile_result.json
$ metacrafter scan-file --format stats -o somefile_result.json somefile.jsonl
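When these scans are driven from a Python script, the same command lines can be assembled programmatically. The helper below is only a sketch: build_scan_file_cmd is a hypothetical name, not part of Metacrafter, and it uses only the flags shown in the commands above (--format, --encoding, --delimiter, -o).

```python
import shlex


def build_scan_file_cmd(path, fmt="short", encoding=None, delimiter=None, output=None):
    """Assemble the argument list for `metacrafter scan-file` (hypothetical helper)."""
    cmd = ["metacrafter", "scan-file", "--format", fmt]
    if encoding:
        cmd += ["--encoding", encoding]
    if delimiter:
        cmd += ["--delimiter", delimiter]
    if output:
        cmd += ["-o", output]
    cmd.append(path)  # the file to scan always goes last
    return cmd


cmd = build_scan_file_cmd("somefile.csv", encoding="windows-1251", delimiter=";")
# shlex.join quotes arguments such as ';' so the line is safe to paste into a shell
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems with delimiters such as ';'.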
Result example of 'full' type of formatting
key               ftype    tags    matches                                                                datatype_url
----------------  -------  ------  ---------------------------------------------------------------------  ----------------------------------------------------------
Domain            str              fqdn 99.90                                                             https://registry.apicrafter.io/datatype/fqdn
Primary domain    str              fqdn 100.00                                                            https://registry.apicrafter.io/datatype/fqdn
Name              str              name 100.00                                                            https://registry.apicrafter.io/datatype/name
Domain type       str      dict
Organization      str
Status            str      dict
Region            str      dict    rusregion 22.95                                                        https://registry.apicrafter.io/datatype/rusregion
GovSystem         str      dict
HTTP Support      str      dict    boolean 100.00                                                         https://registry.apicrafter.io/datatype/boolean
HTTPS Support     str      dict    boolean 100.00                                                         https://registry.apicrafter.io/datatype/boolean
Statuscode        str      dict
Is archived       str      empty
Archives          str      empty
Archive priority  str      dict
Archive Strategy  str      dict
ASN               str              asn 93.77                                                              https://registry.apicrafter.io/datatype/asn
ASN Country code  str      dict    countrycode_alpha2 100.00,countrycode_alpha2 100.00,languagetag 99.56  https://registry.apicrafter.io/datatype/countrycode_alpha2
IPs               str              ipv4 96.28                                                             https://registry.apicrafter.io/datatype/ipv4
GovType           str      dict
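The matches column packs one or more "rule key, score" pairs into a single comma-separated cell. A minimal sketch of how such a cell could be unpacked is shown here. parse_matches and best_match are hypothetical helper names, and reading the number as a percentage of matching values is an assumption drawn from the sample output, not a documented guarantee.

```python
def parse_matches(cell):
    """Split a matches cell such as 'fqdn 99.90' into (rule_key, score) pairs."""
    pairs = []
    for item in cell.split(","):
        item = item.strip()
        if not item:
            continue  # an empty cell means no rule matched this field
        key, _, score = item.rpartition(" ")
        pairs.append((key, float(score)))
    return pairs


def best_match(cell):
    """Return the highest-scoring (rule_key, score) pair, or None for an empty cell."""
    pairs = parse_matches(cell)
    return max(pairs, key=lambda pair: pair[1]) if pairs else None


cell = "countrycode_alpha2 100.00,countrycode_alpha2 100.00,languagetag 99.56"
print(parse_matches(cell))
print(best_match("rusregion 22.95"))
```

A threshold on the score (for instance, ignoring matches such as rusregion at 22.95) would be a natural next step, but the right cut-off depends on the data.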
# Scan MongoDB database 'fns', save results as result.json and format output as 'full'
$ metacrafter scan-mongodb --dbname fns -o result.json -f full
# Scan Postgres database 'dbname', with schema 'public'.
$ metacrafter scan-db --schema public --connstr postgresql+psycopg2://username:password@127.0.0.1:15432/dbname
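The --connstr value follows the SQLAlchemy database URL layout: dialect+driver://user:password@host:port/database. The sketch below takes the example URL apart with the standard library only, to make the pieces visible. It is an illustration, not how Metacrafter itself parses the string; SQLAlchemy ships its own parser (sqlalchemy.engine.make_url).

```python
from urllib.parse import urlsplit

connstr = "postgresql+psycopg2://username:password@127.0.0.1:15432/dbname"

parts = urlsplit(connstr)
# The scheme carries both the SQL dialect and the Python driver, joined by '+'
dialect, _, driver = parts.scheme.partition("+")

info = {
    "dialect": dialect,
    "driver": driver,
    "user": parts.username,
    "host": parts.hostname,
    "port": parts.port,
    "database": parts.path.lstrip("/"),
}
print(info)
```

Passwords containing characters such as '@' or '/' need to be percent-encoded in the URL before they are placed in the connection string.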
All rules are described in YAML files. By default, rules are loaded from the 'rules' directory or from a list of directories given in the .metacrafter file, which uses YAML format
Each rule applies either to field names or to data values.
The matching engine is set by the match parameter in the rule description:
text - checks the text for an exact match against one of the listed values. Values are delimited by commas (',')
ppr - matches the text against a PyParsing expression. The rule is written as Python code using PyParsing objects like Word(nums, exact=4)
func - checks the text with the given Python function. The function should accept one string parameter and should return True or False
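To make the role of the match parameter concrete, here is a minimal sketch of a dispatcher over the engines listed above. It is not Metacrafter's implementation: apply_rule is a hypothetical name, the func rule is passed as a callable rather than as a dotted path, and the ppr engine is left out because it needs the third-party PyParsing package.

```python
def apply_rule(rule, value):
    """Apply one rule (a dict with 'match' and 'rule' keys) to a string value."""
    engine = rule["match"]
    if engine == "text":
        # Exact match against a comma-delimited list of allowed values
        allowed = {item.strip() for item in rule["rule"].split(",")}
        return value in allowed
    if engine == "func":
        # The function takes one string and returns True or False
        return bool(rule["rule"](value))
    raise NotImplementedError(f"engine {engine!r} is not covered by this sketch")


yes_no = {"match": "text", "rule": "yes,no"}       # illustrative rule, not from the rule set
digits = {"match": "func", "rule": str.isdigit}    # illustrative rule, not from the rule set

print(apply_rule(yes_no, "yes"), apply_rule(yes_no, "maybe"))
print(apply_rule(digits, "2024"), apply_rule(digits, "20x4"))
```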
Example Russian administrative legal act/law matched by custom function
runpabyfunc:
  key: runpa
  name: Russian legal act / law
  maxlen: 500
  minlen: 3
  priority: 1
  match: func
  type: data
  rule: metacrafter.rules.ru.gov.is_ru_law
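In a func rule, the rule value is a dotted path to a Python function. The sketch below shows one way such a path could be resolved and applied. Two assumptions are made: minlen and maxlen are treated as inclusive length bounds checked before the function is called, and the standard library's keyword.iskeyword stands in for a real validator, since it also takes one string and returns True or False. resolve and check_value are hypothetical names.

```python
from importlib import import_module


def resolve(dotted_path):
    """Import 'package.module.func' and return the function object."""
    module_name, _, func_name = dotted_path.rpartition(".")
    return getattr(import_module(module_name), func_name)


def check_value(rule, value):
    """Apply a func rule to one value, honouring the length bounds first."""
    if not rule["minlen"] <= len(value) <= rule["maxlen"]:
        return False  # assumption: out-of-range values are rejected without calling the function
    return bool(resolve(rule["rule"])(value))


# Stand-in rule: same shape as the YAML above, but pointing at a stdlib function
demo_rule = {"match": "func", "type": "data", "minlen": 2, "maxlen": 10,
             "rule": "keyword.iskeyword"}

print(check_value(demo_rule, "while"))
print(check_value(demo_rule, "hello"))
```

A custom validator only has to be importable under the dotted path given in the rule file and follow the same one-string-in, boolean-out contract.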
Example midname matching by exact field name
midname:
  key: person_midname
  name: Person midname by known
  rule: midname,secondname,middlename,mid_name,middle_name
  type: field
  match: text
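A field rule is checked against column names rather than cell values. As a rough sketch, the rule above could be applied to the header of a CSV file as follows. fields_matching is a hypothetical helper, the sample data is made up, and the case-insensitive comparison is an assumption; the source only says the match is exact.

```python
import csv
import io

FIELD_RULE = "midname,secondname,middlename,mid_name,middle_name"


def fields_matching(header, rule):
    """Return the column names that equal one of the comma-delimited rule values."""
    names = {name.strip().lower() for name in rule.split(",")}
    # Assumption: comparison ignores case and surrounding whitespace
    return [col for col in header if col.strip().lower() in names]


sample = io.StringIO("id;Middle_Name;city\n1;Ivanovich;Moscow\n")
header = next(csv.reader(sample, delimiter=";"))

print(fields_matching(header, FIELD_RULE))
```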
Example Russian cadastral number
rukadastr:
  key: rukadastr
  name: Russian land territory cadastral identifier
  rule: Word(nums, min=1, max=2) + Literal(':').suppress() + Word(nums, min=1, max=2) + Literal(':').suppress() + Word(nums, min=6, max=7) + Literal(':').suppress() + Word(nums, min=1, max=6)
  maxlen: 20
  minlen: 12
  priority: 1
  match: ppr
  type: data
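The PyParsing expression above describes four groups of digits separated by colons, with 1-2, 1-2, 6-7 and 1-6 digits respectively. For readers unfamiliar with PyParsing, the same shape can be approximated with a regular expression from the standard library. This is a swapped-in technique for illustration, not the engine Metacrafter uses, and it differs in one known way: PyParsing skips whitespace between tokens by default, while this regex does not. Requiring the whole value to match is also an assumption.

```python
import re

# Digit-group sizes copied from the rule: 1-2 : 1-2 : 6-7 : 1-6
CADASTRAL_RE = re.compile(r"[0-9]{1,2}:[0-9]{1,2}:[0-9]{6,7}:[0-9]{1,6}")


def is_cadastral(value, minlen=12, maxlen=20):
    """Check the length bounds from the rule, then the digit-group pattern."""
    if not minlen <= len(value) <= maxlen:
        return False
    return CADASTRAL_RE.fullmatch(value) is not None


print(is_cadastral("77:01:0001001:1234"))
print(is_cadastral("77:01:0001001"))
```

The length bounds agree with the pattern: the shortest possible value has 1 + 1 + 6 + 1 digits plus three colons (12 characters), and the longest has 2 + 2 + 7 + 6 digits plus three colons (20 characters).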
Rule types:
field based rules 146
data based rules 102
Context:
common 47
companies 15
crypto 3
datetime 29
finances 5
geo 58
government 19
identifiers 3
industry 2
internet 18
medical 6
objectids 3
persons 19
pii 16
science 2
software 1
values 1
vehicles 1
Language:
common 100
de 4
en 24
es 1
fr 11
ru 108
Date/time patterns (qddate): 312
To request beta access to the commercial API, please write to ibegtin@apicrafter.io or ivan@begtin.tech.
The commercial API supports 195 field and data rules and comes with dedicated support.
FAQs
Metacrafter metadata classification tool
We found that metacrafter demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.