quality-lac-data-validator
Shared module for validating the ruleset on the SSDA903 census using DfE rules.
We want to build a tool that improves the quality of data on Looked After Children so that Children’s Services Departments have all the information needed to enhance their services.
We believe that a tool that highlights data errors and helps fix them would be valuable for:
The aim of this project is to deliver a tool to relieve some of the pain-points of reporting and quality in children's services data. This project focuses, in particular, on data on looked after children (LAC) and the SSDA903 return.
The project consists of a number of related pieces of work:
The core consists of a Python validator engine and a set of rules written with pandas, using Poetry for dependency management. The tool is designed to run either standalone or in Pyodide in the browser, giving a zero-install deployment with offline capabilities.
It provides methods for finding the validation errors defined by the DfE in 903 data. The validator must be given a set of input files for the current year and, optionally, the previous year. These files are coerced into a common format and passed to each validator rule in turn. The validators report the rows that do not meet the rules, and the output report highlights the errors for each row and which fields were included in each check.
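The flow described above (pass the common-format inputs to each rule in turn, then collect the rows that fail) can be sketched in a few lines of pandas. This is a minimal illustration, not the project's validator.py: the run_validators and missing_sex names, the SEX column, and the assumption that a test function returns a mapping of table name to failing row indices are all hypothetical.

```python
from typing import Callable, Dict, List, Mapping, Tuple

import pandas as pd

# Hypothetical test-function shape: datastore in, {table: [failing row indices]} out.
TestFn = Callable[[Mapping[str, pd.DataFrame]], Dict[str, List[int]]]


def run_validators(
    dfs: Mapping[str, pd.DataFrame],
    validators: List[Tuple[str, TestFn]],
) -> Dict[str, Dict[str, List[int]]]:
    """Send the datastore to each validator in turn and collect failing rows."""
    report: Dict[str, Dict[str, List[int]]] = {}
    for code, test in validators:
        failures = test(dfs)
        # Only record error codes that flagged at least one row.
        if any(failures.values()):
            report[code] = failures
    return report


def missing_sex(dfs: Mapping[str, pd.DataFrame]) -> Dict[str, List[int]]:
    """Hypothetical rule: flag Header rows with no SEX value."""
    if 'Header' not in dfs:
        return {}
    header = dfs['Header']
    return {'Header': header.index[header['SEX'].isna()].tolist()}


dfs = {'Header': pd.DataFrame({'CHILD': ['A', 'B'], 'SEX': ['1', None]})}
print(run_validators(dfs, [('hypothetical_1', missing_sex)]))
# {'hypothetical_1': {'Header': [1]}}
```

Returning row indices per table, rather than a single boolean, keeps enough information to build a per-row report of which checks failed.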
These are the key files:
project
├─── pyproject.toml - Project details and dependencies
├─── validator903
│ ├─── config.py - High-level configuration
│ ├─── ingress.py - Data ingress (handling CSV and XML files)
│ ├─── types.py - Classes used across the work
│ ├─── validator.py - The core validator process
│ └─── validators.py - All individual validator codes
└─── tests - Unit tests
Most contributor work will be in validators.py and the associated test files under tests. Please do not submit a pull request without comprehensive tests.
To install the code and dependencies, from the main project directory run:
poetry install
If this does not work, you may be running the wrong version of Python: the version of NumPy used by the 903 validator locks the project to Python 3.9. The devcontainer and Dockerfile should ensure you are running 3.9, so you may simply need to rebuild. If not, make sure you are working in an environment or venv with Python 3.9 as your interpreter.
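A quick way to confirm which interpreter is active before running poetry install is a small guard like the one below. It is a convenience sketch, not part of the project; the check_python_version name is hypothetical, and the required version (3.9) comes from the paragraph above.

```python
import sys

# Python version the 903 validator is locked to.
REQUIRED = (3, 9)


def check_python_version(version_info=sys.version_info) -> bool:
    """Return True if the interpreter's major.minor matches REQUIRED."""
    return tuple(version_info[:2]) == REQUIRED


if not check_python_version():
    print(f'Expected Python {REQUIRED[0]}.{REQUIRED[1]}, '
          f'found {sys.version_info.major}.{sys.version_info.minor}',
          file=sys.stderr)
```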
Validators are simple functions, usually called validate_XXX(), which take no arguments and return a tuple of an ErrorDefinition and a test function. The test function itself takes a single argument, the datastore, which is a Mapping (a dict-like object) following the structure below.
The following is the expected structure of the input data given to each validator (the dfs object).
You should assume that not all of these keys are present and handle that appropriately.
Any XML uploads are converted into CSV form to give the same inputs.
{
# This year's data
'Header': # header dataframe
'Episodes': # episodes dataframe
'Reviews': # reviews dataframe
'UASC': # UASC dataframe
'OC2': # OC2 dataframe
'OC3': # OC3 dataframe
'AD1': # AD1 dataframe
'PlacedAdoption': # Placed for adoption dataframe
'PrevPerm': # Previous permanence dataframe
'Missing': # Missing dataframe
# Last year's data
'Header_last': # header dataframe
'Episodes_last': # episodes dataframe
'Reviews_last': # reviews dataframe
'UASC_last': # UASC dataframe
'OC2_last': # OC2 dataframe
'OC3_last': # OC3 dataframe
'AD1_last': # AD1 dataframe
'PlacedAdoption_last': # Placed for adoption dataframe
'PrevPerm_last': # Previous permanence dataframe
'Missing_last': # Missing dataframe
# Metadata
'metadata': {
'collection_start': # A datetime with the collection start date (year/4/1)
'collection_end': # A datetime with the collection end date (year + 1/4/1)
'postcodes': # Postcodes dataframe, columns laua, oseast1m, osnrth1m, pcd
'localAuthority': # The local authority code entered (long form, e.g. E07000026)
'collectionYear': # The raw collection year string - unlikely to need this (e.g. '2019/20')
}
}
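As an illustration of working with this structure, the hypothetical rule below reads the Episodes table and metadata['collection_end'], returning no errors when either key is absent. The DECOM column name, its dd/mm/YYYY string format, and the function name are assumptions made for this sketch.

```python
from typing import Dict, List, Mapping

import pandas as pd


def episodes_start_after_collection_end(dfs: Mapping) -> Dict[str, List[int]]:
    """Hypothetical rule: flag episodes whose DECOM date is on or after
    metadata['collection_end'].

    Assumes DECOM holds dd/mm/YYYY strings; the column name and the date
    format are assumptions for this sketch.
    """
    # Keys may be absent, so return no errors rather than raising KeyError.
    if 'Episodes' not in dfs or 'metadata' not in dfs:
        return {}
    episodes = dfs['Episodes']
    collection_end = dfs['metadata']['collection_end']
    # Unparseable dates become NaT, and comparisons with NaT evaluate to False.
    decom = pd.to_datetime(episodes['DECOM'], format='%d/%m/%Y', errors='coerce')
    late = decom >= collection_end
    return {'Episodes': episodes.index[late].tolist()}


dfs = {
    'Episodes': pd.DataFrame({
        'CHILD': ['A', 'B', 'C'],
        'DECOM': ['01/05/2019', '02/04/2020', 'not a date'],
    }),
    'metadata': {
        # Collection year 2019/20, following the convention described above.
        'collection_start': pd.Timestamp(2019, 4, 1),
        'collection_end': pd.Timestamp(2020, 4, 1),
    },
}
print(episodes_start_after_collection_end(dfs))  # {'Episodes': [1]}
```

Coercing unparseable dates to NaT means this rule ignores malformed values; a separate rule would be the natural place to report those.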
To build and release a new version, first make sure all unit tests pass.
We use semantic versioning, so update the project version in pyproject.toml accordingly, then commit and open a PR. Once the new version is on GitHub, create a GitHub release named after the current release (e.g. 1.0), with a tag consisting of the release name prefixed with a v (e.g. v1.0). Alpha and beta releases can be flagged by appending -alpha.<number> or -beta.<number>.
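The tag naming convention can be captured in a small helper, for example to check a tag before publishing a release. This is a hypothetical sketch; neither release_tag nor TAG_PATTERN is part of the project.

```python
import re
from typing import Optional

# Tags are the release name prefixed with "v"; pre-releases append
# -alpha.<number> or -beta.<number>.
TAG_PATTERN = re.compile(r'^v\d+(\.\d+)*(-(alpha|beta)\.\d+)?$')


def release_tag(version: str, stage: Optional[str] = None,
                number: Optional[int] = None) -> str:
    """Build a tag name such as v1.0 or v1.0-beta.2 from a release name."""
    if stage is None:
        return f'v{version}'
    if stage not in ('alpha', 'beta') or number is None:
        raise ValueError("stage must be 'alpha' or 'beta' and needs a number")
    return f'v{version}-{stage}.{number}'


print(release_tag('1.0'))                      # v1.0
print(release_tag('1.1', 'beta', 2))           # v1.1-beta.2
print(bool(TAG_PATTERN.match('v1.1-beta.2')))  # True
```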