datapackage-pipelines-fiscal
Extension for datapackage-pipelines used for loading Fiscal Data Packages into:
A babbage model will also be generated and written to the datapackage for querying the database using its API.
This extension works with a custom source spec and a set of processors. The generator converts the source spec into a set of inter-dependent pipelines which, when run in order, perform data processing and loading to the selected endpoints (based on environment variables).
This extension is used by os-conductor and os-data-importers.
DPP_DB_ENGINE - connection string for an SQL database to dump data into
ELASTICSEARCH_ADDRESS [OPTIONAL] - connection string for an Elasticsearch instance (used for package registry updating)
S3_BUCKET_NAME [OPTIONAL] - S3 bucket for uploading data. If not provided, local ZIP files will be created instead.
AWS_ACCESS_KEY_ID - S3 credentials (required if an S3 bucket was specified)
AWS_SECRET_ACCESS_KEY - S3 credentials (required if an S3 bucket was specified)
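The way these variables select endpoints can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `selected_endpoints` and the endpoint labels are hypothetical, and the extension's real logic may differ.

```python
import os


def selected_endpoints(env=None):
    """Return hypothetical labels for the endpoints implied by the environment."""
    env = os.environ if env is None else env
    endpoints = []
    if env.get("DPP_DB_ENGINE"):
        endpoints.append("sql")
    if env.get("ELASTICSEARCH_ADDRESS"):
        endpoints.append("elasticsearch")
    if env.get("S3_BUCKET_NAME"):
        # The S3 credentials are required once a bucket is specified.
        missing = [
            name
            for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
            if not env.get(name)
        ]
        if missing:
            raise ValueError("S3 bucket set but missing: " + ", ".join(missing))
        endpoints.append("s3")
    else:
        # Without a bucket, local ZIP files are created instead.
        endpoints.append("local-zip")
    return endpoints
```

Passing a plain dictionary instead of reading `os.environ` makes it easy to check a configuration before running any pipeline.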
In order to fully run the fiscal datapackage flow you need to have os-types installed, using npm:
$ npm install -g os-types
This external Node.js utility is used to perform fiscal modelling for the processed datapackage.
Each source-spec contains information regarding a single Fiscal Data Package.
Top level properties are:
title
Title, or Display name, of the data package
dataset-name
[OPTIONAL] A slug to be used as the data package's name. If not provided, a slugified version of the title will be used.
resource-name
[OPTIONAL] A slug to be used as the main resource's name in the final data package. If not provided, the dataset name will be used.
owner-id
The id of the owner of this datapackage.
This identifier is used to generate various paths and storage names.
sources
Contains a non-empty list of data sources for the fiscal data package.
Each data source has these properties:
url: The location of the data
name: [OPTIONAL] A name for this source (will later be used as an intermediate resource name)
Other tabulator parameters can also be added as properties here, e.g. sheet, encoding, compression, etc.
fields
Contains a non-empty list of fields for the fiscal data package.
Each field definition has these properties:
header: The name of the field in the resulting resource
title [OPTIONAL]: The display name of the field in the resulting resource
columnType: The ColumnType of the field
options: Extra options to be added to the field, e.g. json-table-schema properties such as decimalChar etc.
measures
[OPTIONAL] Extra information for measure normalization processing. (Measure normalization is the process of reducing the number of measures to one while multiplying the number of rows and adding extra columns whose values identify the original measure.)
Contains the following sub-properties:
currency: The currency code of the output measure column
title [OPTIONAL]: The title for the output measure column
mapping: Unpivoting map. The unpivoting map is a map from a measure's name to its unpivoting data. "Unpivoting data" is a map from an extra column's name to a value.
Example:
measures:
  currency: GTQ
  mapping:
    APPROVED:
      PHASE_ID: "0"
      PHASE: Inicial
    RELEASED:
      PHASE_ID: "1"
      PHASE: Vigente
    COMMITTED:
      PHASE_ID: "2"
      PHASE: Comprometido
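To make the effect of the unpivoting map concrete, here is a minimal Python sketch of measure normalization applied to the mapping from the example. It is not the extension's actual processor: the function name `unpivot`, the output column name `value` and the `ENTITY` column are assumptions made for illustration.

```python
def unpivot(rows, mapping, value_column="value"):
    """Turn each measure column named in `mapping` into its own output row."""
    for row in rows:
        # Columns that are not measures are copied to every output row.
        base = {k: v for k, v in row.items() if k not in mapping}
        for measure, extra_columns in mapping.items():
            out = dict(base)
            # The extra columns identify which original measure this row holds.
            out.update(extra_columns)
            out[value_column] = row[measure]
            yield out


mapping = {
    "APPROVED": {"PHASE_ID": "0", "PHASE": "Inicial"},
    "RELEASED": {"PHASE_ID": "1", "PHASE": "Vigente"},
    "COMMITTED": {"PHASE_ID": "2", "PHASE": "Comprometido"},
}
# Hypothetical input row with three measure columns.
rows = [{"ENTITY": "A", "APPROVED": 100, "RELEASED": 80, "COMMITTED": 60}]
result = list(unpivot(rows, mapping))
```

One input row with three measures becomes three output rows, each with a single measure column plus the PHASE_ID and PHASE columns.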
currencies [OPTIONAL]: List of currency codes to convert to ('USD' by default). See next section for details.
currency-conversion
[OPTIONAL] Instructions for adding an extra column or columns with measure values in another currency.
date_measure [OPTIONAL]: Column name from which a date can be extracted. If not provided, a guess will be made according to the ColumnType.
title [OPTIONAL]: Title for the currency-converted measure columns.
datapackage-url
[OPTIONAL] Contains the URL of the source datapackage from which this data came. If provided, metadata for this datapackage will be loaded from this URL.
deduplicate
[OPTIONAL] If true, then the source data will be processed to remove duplicate rows (i.e. rows which have the same values in the primary key). Measure values for these rows will be summed in order to generate a single output row.
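The deduplication described here can be sketched as a group-and-sum over the primary key. This is a rough illustration rather than the extension's implementation: the function name `deduplicate` and the column names in the sample rows are hypothetical.

```python
def deduplicate(rows, primary_key, measures):
    """Merge rows sharing the same primary key, summing their measure values."""
    merged = {}
    for row in rows:
        key = tuple(row[column] for column in primary_key)
        if key not in merged:
            # First occurrence: keep a copy so the input rows are not mutated.
            merged[key] = dict(row)
        else:
            for measure in measures:
                merged[key][measure] += row[measure]
    # Dictionaries preserve insertion order, so output follows first appearance.
    return list(merged.values())


rows = [
    {"ENTITY": "A", "YEAR": 2016, "AMOUNT": 10},
    {"ENTITY": "B", "YEAR": 2016, "AMOUNT": 5},
    {"ENTITY": "A", "YEAR": 2016, "AMOUNT": 7},
]
unique_rows = deduplicate(rows, primary_key=["ENTITY", "YEAR"], measures=["AMOUNT"])
```

The two rows for entity A collapse into a single row whose AMOUNT is the sum of both.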
postprocessing
[OPTIONAL] A list of extra processors (and parameters) that will be applied to the data. The format is the same as in any pipeline-spec.yaml.
suppress-os
[OPTIONAL, default is False] If False, an OpenSpending-compatible datapackage is created on the datastore. This ensures that a basic FDP is available for editing with OpenSpending. Packages created with os-conductor already create this artefact, so they should use suppress-os: True to prevent another being created unnecessarily.
keep-artifacts
[OPTIONAL, default is False] By default, pipeline artifacts (temporary directories and files created during pipeline execution) are removed after all pipelines have run successfully. To keep the artifacts, set this option to True.
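The defaults of the optional top-level properties can be summarized in a short Python sketch that fills them in for a source spec loaded as a dictionary. The helper names `slugify` and `with_defaults`, the slug rule and the sample values are assumptions; the extension may slugify titles differently.

```python
import re


def slugify(text):
    """Assumed slug rule: lowercase, runs of other characters become hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def with_defaults(spec):
    """Return a copy of `spec` with the documented defaults filled in."""
    resolved = dict(spec)
    # dataset-name falls back to a slugified version of the title.
    resolved.setdefault("dataset-name", slugify(spec["title"]))
    # resource-name falls back to the dataset name.
    resolved.setdefault("resource-name", resolved["dataset-name"])
    resolved.setdefault("suppress-os", False)
    resolved.setdefault("keep-artifacts", False)
    return resolved


# Hypothetical source spec containing only a title and an owner id.
spec = {"title": "Guatemala Budget 2016", "owner-id": "example-owner"}
resolved = with_defaults(spec)
```

Values given explicitly in the spec are left untouched; only the missing optional properties receive defaults.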
Outputs:
(depends on ./denormalized_flow)
(depends on ./finalize_datapackage_flow_splitter)
splitter pipeline as well as the full denormalized dataset
Outputs:
(depends on ./denormalized_flow)
Outputs:
(depends on ./denormalized_flow and all ./dimension_flow_{hierarchy})
Outputs:
(depends on corresponding ./dimension_flow_{hierarchy})
Outputs:
(depends on ./normalized_flow)
Outputs:
(depends on ./dumper_flow)
Outputs:
Please read the contribution guideline:
FAQs
Fiscal Data Package extensions for Datapackage Pipelines
We found that datapackage-pipelines-fiscal demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 4 open source maintainers collaborating on the project.