@sweepbright/sbimport

The dataset import tool that interfaces with the Properties API and the Contacts API

1.3.7 Β· latest Β· npm Β· 1.9K weekly downloads

sbimport

Installation

This tool is built using Node.js. You can install it globally using npm or yarn as follows:

For npm

npm install -g @sweepbright/sbimport

For yarn

yarn global add @sweepbright/sbimport

Once the installation completes, you can start using it. Try:

sbimport --help

Another approach is to use npx (or its yarn equivalent) to run the tool without installing it.

For npm

npx @sweepbright/sbimport --help

For yarn berry

yarn dlx @sweepbright/sbimport --help

Usage

Currently, sbimport supports two commands:

  • init: to initialise the import folder
  • import: to import the datasets in an existing import folder.

Let's look in more detail at what these commands do.

init command

The init command expects a folder name as an argument. When executed, it will prompt you for the API key and the environment. Provide them and you’re done!

You can now inspect the folder the tool created. If you have tree installed, you should see similar output.

sbimport init ImportCustomerX

tree ImportCustomerX

ImportCustomerX
β”œβ”€β”€ company.config.yaml
β”œβ”€β”€ labels
β”œβ”€β”€ contacts
β”œβ”€β”€ logs
└── properties
    └── files
        └── README.txt

5 directories, 2 files

The folder is organised by the entities that can be imported. Currently, we support the import of properties (properties subfolder), labels (labels subfolder) and contacts (contacts subfolder).

We also have:

  • company.config.yaml: the configuration file that contains the information specified during the init (we reserve the possibility to expand this in the future with more options).
  • cache.sqlite: The file what the tool uses to keep track of what has been imported and avoid importing the same data multiple times.
  • logs: The folder where all the logs will be persisted.

import command

The import command expects a folder name as an argument and supports a few options. For a full overview of the available options, run

sbimport import --help

The import command will attempt to import every record it finds in the import folder provided as an argument. This behaviour can be changed with specific options, which is useful when you only need to import certain records.

sbimport keeps track of what has been processed, so consecutive executions on the same dataset will not generate unnecessary import operations. It does that by storing a hash for each record and asset in the datasets and comparing it during subsequent executions. When the hashes differ, the record is imported again.
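
As an illustration of this hash-based change detection (a sketch only, not the tool's actual implementation; the real cache lives in cache.sqlite, whose schema is not documented here), the idea can be expressed as:

```python
import hashlib
import json

def records_to_import(records, cache):
    """Yield only records whose content hash differs from the cached one.

    `cache` maps record id -> last imported hash (a stand-in for the
    tool's cache.sqlite file).
    """
    for record in records:
        # Serialise deterministically so identical content yields an identical hash.
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        if cache.get(record["id"]) != digest:
            cache[record["id"]] = digest
            yield record

cache = {}
batch = [{"id": "1", "name": "John Doe"}, {"id": "2", "name": "Jane Doe"}]
first = list(records_to_import(batch, cache))   # both records are new
second = list(records_to_import(batch, cache))  # nothing changed, nothing to import
```

A second run over unchanged data yields nothing, which is why re-running an import is cheap.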

All options are:

  • --entities: Allow to import just the selected entity type. Available options are property, propertyAsset, contact, label. You can specify multiple entities by repeating the option.
  • --force: By default the sbimport remembers what was processed to avoid importing the same data. To prevent that use this option.
  • --batchSize: The number of records to import in a single batch. Default is 100.
  • --files: Number of async file operations scheduled together. Only for property assets. Default is 5.
  • --silent: Do not print any output to the console. Default is false.
  • --maxAttempts: The number of attempts to retry a failed request. Default is 10.
  • --retryWaitMs: The time to wait between retries in milliseconds. Default is 1000. It doubles at each retry.
  • --uri: The URI to the cloud storage. It’s used to import data from a cloud storage. Read more in the Cloud import section.

Environment variables

| CLI option   | Environment variable                        | Default                                    |
| ------------ | ------------------------------------------- | ------------------------------------------ |
| uri          | SBIMPORT_IMPORT_URI                         | See the Cloud import section               |
| force        | SBIMPORT_IMPORT_NO_CACHE                    | false                                      |
| silent       | SBIMPORT_IMPORT_SILENT                      | false                                      |
| maxAttempts  | SBIMPORT_IMPORT_MAX_RETRY_ATTEMPTS          | 10                                         |
| retryWaitMs  | SBIMPORT_IMPORT_RETRY_WAIT_MS               | 1000                                       |
| batchSize    | SBIMPORT_IMPORT_BATCH_SIZE                  | 100                                        |
| files        | SBIMPORT_IMPORT_ASYNC_FILE_OPERATION_COUNT  | 5                                          |
| tasks        | SBIMPORT_IMPORT_ASYNC_TASKS_COUNT           | 1                                          |
| entities     | SBIMPORT_IMPORT_ENTITIES                    | label,contact,property,propertyAsset (all) |

Examples

To import everything in the import folder test

sbimport import test

To import just the properties datasets in the import folder test

sbimport import test --entities property

To import the properties datasets and their related images in the import folder test.

sbimport import test --entities property --entities propertyAsset

Preparing the datasets

The sbimport tool expects data in JSONL format. In short, a JSONL file contains one valid JSON object per line.

To convert a regular JSON file to the JSONL format the tool expects, you could use jq as follows:

jq -c '.[]' < dataset.json > dataset.jsonl
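
If jq is not available, the same conversion can be done with a few lines of Python (a self-contained sketch; the function name is ours, not part of the tool):

```python
import json
import tempfile
from pathlib import Path

def json_to_jsonl(src: Path, dst: Path) -> int:
    """Rewrite a JSON array file as JSONL (one compact object per line)."""
    records = json.loads(src.read_text())
    dst.write_text(
        "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)
    )
    return len(records)

# Quick demo on a throwaway dataset.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "dataset.json"), Path(tmp, "dataset.jsonl")
    src.write_text('[{"id": "1"}, {"id": "2"}]')
    count = json_to_jsonl(src, dst)
    lines = dst.read_text().splitlines()
```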

Once your datasets are ready, move them to the respective folders:

  • Properties datasets in properties folder
  • Contacts datasets in the contacts folder.
  • Labels datasets in the labels folder.

Company config file

The company.config.yaml file is used to store the configuration of the import process. It’s a YAML file that contains the following:

env: production
key: SB************************

The key property stores the API key used to authenticate the requests.

Batch deduplication

The tool automatically deduplicates the datasets for entities that have an updated_at attribute in their schema. Duplicates are identified by the id attribute, and the updated_at attribute determines which record is the most recent one to keep. The import stats do not count the duplicates in any metrics.

Batch example:

{"id": "1", "updated_at": "2024-12-01T00:00:00Z", "name": "John Doe"} // This will be excluded 
{"id": "1", "updated_at": "2024-12-02T00:00:00Z", "name": "John Doe"} // This record will be kept
{"id": "2", "updated_at": "2024-12-01T00:00:00Z", "name": "Jane Doe"}
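
The deduplication rule described above can be sketched as follows (our illustration of the documented behaviour, not the tool's code):

```python
def deduplicate(batch):
    """Keep, for each id, only the record with the latest updated_at."""
    latest = {}
    for record in batch:
        current = latest.get(record["id"])
        # ISO-8601 timestamps in the same timezone compare correctly as strings.
        if current is None or record["updated_at"] > current["updated_at"]:
            latest[record["id"]] = record
    return list(latest.values())

batch = [
    {"id": "1", "updated_at": "2024-12-01T00:00:00Z", "name": "John Doe"},
    {"id": "1", "updated_at": "2024-12-02T00:00:00Z", "name": "John Doe"},
    {"id": "2", "updated_at": "2024-12-01T00:00:00Z", "name": "Jane Doe"},
]
deduped = deduplicate(batch)  # keeps the 2024-12-02 record for id "1"
```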

Preparing properties files

To support SweepBright property file options, we organise the properties files in the import folder using the following hierarchy (it’s also described in the README file inside the files folder).

This directory is used to store the properties files.

The structure is as follows:

{property-reference}/
    β”œβ”€β”€ documents/
    β”‚   β”œβ”€β”€ private/
    β”‚   └── public/
    β”œβ”€β”€ images/
    β”‚   β”œβ”€β”€ private/
    β”‚   └── public/
    └── plans/
        β”œβ”€β”€ private/
        └── public/

Replace {property-reference} with your property ID.

For example, assuming you are importing one property with ID a3155152-3cb3-4878-b1e6-39466844328c, and this property has:

  • doc1.pdf public and doc2.pdf private
  • img1.png public and img2.png private
  • and no plans

Important note: the order of folders and files matters because file paths determine the ordinal of the property assets. Both local and cloud imports rely on the natural sorting of the file paths, which is alphanumeric ascending.

The files folder should look like this

files
└── a3155152-3cb3-4878-b1e6-39466844328c
    β”œβ”€β”€ documents
    β”‚   β”œβ”€β”€ private
    β”‚   β”‚   └── doc2.pdf
    β”‚   └── public
    β”‚       └── doc1.pdf
    β”œβ”€β”€ images
    β”‚   β”œβ”€β”€ private
    β”‚   β”‚   └── img2.png
    β”‚   └── public
    β”‚       └── img1.png
    └── plans
        β”œβ”€β”€ private
        └── public
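
Since the asset ordinal comes from the alphanumeric ascending sort of the file paths, you can preview the order for the example above like this (a sketch; the ordinal assignment is our interpretation of the note about natural sorting):

```python
# The assets of one property, as relative paths inside the files folder.
paths = [
    "a3155152-3cb3-4878-b1e6-39466844328c/images/public/img1.png",
    "a3155152-3cb3-4878-b1e6-39466844328c/documents/private/doc2.pdf",
    "a3155152-3cb3-4878-b1e6-39466844328c/documents/public/doc1.pdf",
    "a3155152-3cb3-4878-b1e6-39466844328c/images/private/img2.png",
]
# Alphanumeric ascending sort: a path's position in the sorted list is its ordinal.
ordinals = {path: i for i, path in enumerate(sorted(paths))}
```

Here documents sort before images, and private before public, so doc2.pdf gets ordinal 0 and img1.png gets ordinal 3.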

Cloud import

The sbimport tool can also import data from cloud storage. The following providers are supported:

  • AWS S3

To use the cloud import feature, you need to provide additional environment variables or CLI options.

URI to the cloud storage

  • Argument: --uri
  • Environment variable: SBIMPORT_URI
| Cloud storage | URI format                                       |
| ------------- | ------------------------------------------------ |
| AWS S3        | https://<BUCKET_NAME>.s3.<REGION>.amazonaws.com/ |
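
For example, filling in the placeholders (the bucket name and region below are made up for illustration):

```python
def s3_uri(bucket: str, region: str) -> str:
    """Build the virtual-hosted-style S3 URI format shown in the table above."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/"

uri = s3_uri("my-import-bucket", "eu-west-1")
# -> "https://my-import-bucket.s3.eu-west-1.amazonaws.com/"
```

Pass the result via --uri or the corresponding environment variable.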


Package last updated on 25 Mar 2025
