oc-pipelinewise-tap-postgres
Singer.io tap for extracting data from PostgreSQL - PipelineWise compatible
Singer tap that extracts data from a PostgreSQL database and produces JSON-formatted data following the Singer spec.
This is a PipelineWise compatible tap connector.
The recommended way to run this tap is from PipelineWise. When running it from PipelineWise you don't need to configure the tap with JSON files, and most things are automated. Please check the related documentation at Tap Postgres
If you want to run this Singer Tap independently please read further.
First, make sure Python 3 is installed on your system or follow these installation instructions for Mac or Ubuntu.
It's recommended to use a virtualenv:
python3 -m venv venv
. venv/bin/activate
pip install pipelinewise-tap-postgres
or
make venv
{
"host": "localhost",
"port": 5432,
"user": "postgres",
"password": "secret",
"dbname": "db"
}
These are the same basic configuration properties used by the PostgreSQL command-line client (psql).
Full list of options in config.json:
Property | Type | Required? | Default | Description |
---|---|---|---|---|
host | String | Yes | - | PostgreSQL host |
port | Integer | Yes | - | PostgreSQL port |
user | String | Yes | - | PostgreSQL user |
password | String | Yes | - | PostgreSQL password |
dbname | String | Yes | - | PostgreSQL database name |
filter_schemas | String | No | None | Comma-separated schema names. Only the listed schemas are scanned, which improves the performance of data extraction. |
ssl | String | No | None | If set to "true" then use SSL via postgres sslmode require option. If the server does not accept SSL connections or the client certificate is not recognized the connection will fail. |
logical_poll_total_seconds | Integer | No | 10800 | Stop running the tap when no data is received from the WAL after a certain number of seconds. |
break_at_end_lsn | Boolean | No | true | Stop running the tap if the newly received lsn is after the max lsn that was detected when the tap started. |
max_run_seconds | Integer | No | 43200 | Stop running the tap after a certain number of seconds. |
debug_lsn | String | No | None | If set to "true" then add _sdc_lsn property to the singer messages to debug postgres LSN position in the WAL stream. |
tap_id | String | No | None | ID of the pipeline/tap |
itersize | Integer | No | 20000 | Size of PG cursor iterator when doing INCREMENTAL or FULL_TABLE |
default_replication_method | String | No | None | Default replication method to use when none is provided in the catalog (Values: LOG_BASED , INCREMENTAL or FULL_TABLE ) |
use_secondary | Boolean | No | False | Use a database replica for INCREMENTAL and FULL_TABLE replication |
secondary_host | String | No | - | PostgreSQL Replica host (required if use_secondary is True ) |
secondary_port | Integer | No | - | PostgreSQL Replica port (required if use_secondary is True ) |
limit | Integer | No | None | Adds a limit to INCREMENTAL queries to cap the number of records returned per run |
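The table above can be turned into a quick sanity check before running the tap. The following is a minimal sketch and not part of the tap itself: `validate_config`, `REQUIRED_KEYS` and `OPTIONAL_DEFAULTS` are hypothetical names, and the defaults are simply copied from the table.

```python
# Required properties and optional defaults, copied from the table above.
REQUIRED_KEYS = ("host", "port", "user", "password", "dbname")
OPTIONAL_DEFAULTS = {
    "logical_poll_total_seconds": 10800,
    "break_at_end_lsn": True,
    "max_run_seconds": 43200,
    "itersize": 20000,
    "use_secondary": False,
}


def validate_config(config):
    """Return a copy of config with the table defaults filled in.

    Hypothetical helper for illustration; raises ValueError on problems.
    """
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))

    # secondary_host and secondary_port are required only with use_secondary.
    if config.get("use_secondary"):
        for key in ("secondary_host", "secondary_port"):
            if key not in config:
                raise ValueError(key + " is required when use_secondary is true")

    # Values supplied by the user win over the defaults.
    return {**OPTIONAL_DEFAULTS, **config}
```

A typical use would be to load config.json with `json.load` and pass the resulting dictionary to `validate_config` before invoking the tap.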
tap-postgres --config config.json --discover # Should dump a Catalog to stdout
tap-postgres --config config.json --discover > catalog.json # Capture the Catalog
Each entry under the Catalog's "streams" key will need the following metadata:
{
  "streams": [
    {
      "stream_name": "my_topic",
      "metadata": [{
        "breadcrumb": [],
        "metadata": {
          "selected": true,
          "replication-method": "LOG_BASED"
        }
      }]
    }
  ]
}
The replication method can be one of FULL_TABLE, INCREMENTAL or LOG_BASED.
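Editing the metadata by hand is tedious when a database has many tables. As a rough sketch, assuming the catalog has the shape shown above, a small script can set the two properties on every stream. `select_streams` is a hypothetical helper, not something shipped with the tap.

```python
VALID_METHODS = {"FULL_TABLE", "INCREMENTAL", "LOG_BASED"}


def select_streams(catalog, replication_method):
    """Mark every stream as selected with the given replication method.

    Hypothetical helper; edits the stream-level (empty breadcrumb) entry
    in place and returns the same catalog dictionary.
    """
    if replication_method not in VALID_METHODS:
        raise ValueError("unknown replication method: " + replication_method)

    for stream in catalog["streams"]:
        entries = stream.setdefault("metadata", [])
        # The stream-level entry is the one with an empty breadcrumb.
        top = next((e for e in entries if e.get("breadcrumb") == []), None)
        if top is None:
            top = {"breadcrumb": [], "metadata": {}}
            entries.append(top)
        top["metadata"]["selected"] = True
        top["metadata"]["replication-method"] = replication_method
    return catalog
```

To apply it, load catalog.json with `json.load`, call `select_streams(catalog, "LOG_BASED")`, and write the result back with `json.dump`.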
Note: Log based replication requires a few adjustments in the source Postgres database. Please read further for more information.
tap-postgres --config config.json --catalog catalog.json
The tap will write bookmarks to stdout which can be captured and passed as an optional --state state.json parameter to the tap for the next sync.
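One way to capture the bookmarks is to keep the last STATE message the tap prints. The sketch below assumes the output follows the Singer spec, where each line is a JSON message and state messages have the form `{"type": "STATE", "value": ...}`. `last_state` is a hypothetical helper name.

```python
import json


def last_state(lines):
    """Return the value of the last STATE message in Singer output, or None."""
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON message
        if isinstance(message, dict) and message.get("type") == "STATE":
            state = message.get("value")
    return state
```

Feeding it the tap's stdout line by line (for example from `sys.stdin` in a pipeline) and dumping the result to state.json yields a file that can be passed back with --state on the next run.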
PostgreSQL databases running PostgreSQL versions 9.4.x or greater. To avoid a critical PostgreSQL bug, use at least one of the following minor versions:
A connection to the master instance. Log-based replication will only work by connecting to the master instance.
wal2json plugin: To use Log Based for your PostgreSQL integration, you must install the wal2json plugin version >= 2.3. The wal2json plugin outputs JSON objects for logical decoding, which the tap then uses to perform Log-based Replication. Steps for installing the plugin vary depending on your operating system. Instructions for each operating system type are in the wal2json’s GitHub repository:
postgres config file: Locate the database configuration file (usually postgresql.conf) and define the parameters as follows:
wal_level=logical
max_replication_slots=5
max_wal_senders=5
Restart your PostgreSQL service to ensure the changes take effect.
Note: For max_replication_slots and max_wal_senders, we’re defaulting to a value of 5. This should be sufficient unless you have a large number of read replicas connected to the master instance.
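The three settings above can be checked with a short script before restarting the server. This is only a sketch under simplifying assumptions: it handles plain `name = value` lines, treats everything after `#` as a comment, and uses the value 5 from the note above as a lower bound. `parse_conf` and `check_settings` are hypothetical names.

```python
# Suggested values taken from the settings listed above.
MINIMUMS = {"max_replication_slots": 5, "max_wal_senders": 5}


def parse_conf(text):
    """Parse 'name = value' lines, ignoring comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # simplified: lines without '=' are skipped
        settings[name.strip()] = value.strip().strip("'")
    return settings


def check_settings(settings):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if settings.get("wal_level") != "logical":
        problems.append("wal_level must be logical")
    for name, minimum in MINIMUMS.items():
        value = settings.get(name)
        if value is None or not value.isdigit() or int(value) < minimum:
            problems.append(f"{name} is below the suggested value of {minimum}")
    return problems
```

Calling `check_settings(parse_conf(open("postgresql.conf").read()))` returns an empty list when the file matches the settings shown above.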
Existing replication slot: Log based replication requires a dedicated logical replication slot. In PostgreSQL, a logical replication slot represents a stream of database changes that can then be replayed to a client in the order they were made on the original server. Each slot streams a sequence of changes from a single database.
Log in to the master instance as a superuser and, using the wal2json plugin, create a logical replication slot:
SELECT *
FROM pg_create_logical_replication_slot('pipelinewise_<database_name>', 'wal2json');
Note: Replication slots are specific to a given database in a cluster. If you want to connect multiple databases - whether in one integration or several - you’ll need to create a replication slot for each database.
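When several databases are involved, the statement above has to be repeated once per database. The sketch below generates one statement per database name using the `pipelinewise_<database_name>` pattern. `slot_name` and `create_slot_statements` are hypothetical helpers, and lowercasing the database name is an assumption made here because PostgreSQL slot names may contain only lowercase letters, digits and underscores.

```python
import re


def slot_name(database_name):
    """Build a slot name following the pipelinewise_<database_name> pattern."""
    # Assumption: lowercase the database name to satisfy slot naming rules.
    name = "pipelinewise_" + database_name.lower()
    if not re.fullmatch(r"[a-z0-9_]+", name):
        raise ValueError("invalid replication slot name: " + name)
    return name


def create_slot_statements(database_names):
    """Map each database name to the SQL that creates its wal2json slot."""
    return {
        db: (
            "SELECT * FROM pg_create_logical_replication_slot("
            f"'{slot_name(db)}', 'wal2json');"
        )
        for db in database_names
    }
```

Each generated statement still has to be executed while connected to the database it belongs to, since a slot streams changes from a single database.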
make venv
You can use the local docker-compose setup to spin up a test database by running make start_db. Test objects will be created in the postgres database.
make unit_test
make integration_test
Install Python dependencies and run the Python linter
make venv
make pylint