sbimport
Installation
This tool is built using Node.js. You can install it globally using npm or yarn as follows:
For npm
npm install -g @sweepbright/sbimport
For yarn
yarn global add @sweepbright/sbimport
Once the installation finishes, you can start using it. Let's try:
sbimport --help
Another approach is to use npx or its yarn equivalent to run the tool without installing it.
For npm
npx @sweepbright/sbimport --help
For yarn berry
yarn dlx @sweepbright/sbimport --help
Usage
Currently, sbimport supports two commands:
- init: to initialise the import folder.
- import: to import the datasets in an existing import folder.
Let's see in more detail what these commands do.
init command
The init command expects a folder name as an argument. When executed, it will ask you for the API key and the environment. Provide them and it's done!
You can now inspect the folder the tool created. If you have tree installed, you should get a similar output.
sbimport init ImportCustomerX
tree ImportCustomerX
ImportCustomerX
├── company.config.yaml
├── labels
├── contacts
├── logs
└── properties
    └── files
        └── README.txt

5 directories, 2 files
The folder is organised by the entity types that can be imported. Currently, we support importing properties (properties subfolder), labels (labels subfolder) and contacts (contacts subfolder).
We also have:
- company.config.yaml: the configuration file that contains the information specified during the init (we reserve the possibility to expand this in the future with more options).
- cache.sqlite: the file the tool uses to keep track of what has been imported and to avoid importing the same data multiple times.
- logs: the folder where all the logs will be persisted.
import command
The import command expects a folder name as an argument and supports a few options. For a full overview of the available options, run
sbimport import --help
The import command will attempt to import every record it finds in the import folder provided as an argument. This behaviour can be changed with specific options. Being able to select what to import is useful when you only need to import specific records.
sbimport keeps track of what has been processed so that consecutive executions of the same dataset will not generate unnecessary import operations. It does that by storing a hash for each record and asset in the datasets and comparing it during subsequent executions. When the comparison fails, the record is considered changed and is imported again.
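As an illustration of this hash-based change detection (a minimal sketch only; the real tool persists hashes in cache.sqlite and its hashing details are internal):

```ts
import { createHash } from "crypto";

// In-memory stand-in for the persisted cache (cache.sqlite in the real tool).
const cache = new Map<string, string>();

function recordHash(record: unknown): string {
  return createHash("sha256").update(JSON.stringify(record)).digest("hex");
}

// A record is (re)imported only when its hash differs from the cached one.
function needsImport(id: string, record: unknown): boolean {
  const hash = recordHash(record);
  if (cache.get(id) === hash) return false; // unchanged: skip
  cache.set(id, hash); // new or changed: import it
  return true;
}
```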
All options are:
- --entities: import only the selected entity type. Available values are property, propertyAsset, contact, label. You can specify multiple entities by repeating the option.
- --force: by default sbimport remembers what was processed to avoid importing the same data again; use this option to ignore that cache.
- --batchSize: the number of records to import in a single batch. Default is 100.
- --files: the number of async file operations scheduled together. Only for property assets. Default is 5.
- --silent: do not print any output to the console. Default is false.
- --maxAttempts: the number of attempts to retry a failed request. Default is 10.
- --retryWaitMs: the time to wait between retries in milliseconds. Default is 1000. It doubles at each retry (see the sketch after this list).
- --uri: the URI to the cloud storage. It's used to import data from a cloud storage. Read more in the Cloud import section.
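The doubling retry delay mentioned for --retryWaitMs corresponds to a standard exponential backoff. A minimal sketch of the resulting wait times (illustrative only, not the tool's actual code):

```ts
// Wait time before retry attempt n, doubling from the base delay.
function retryDelayMs(attempt: number, retryWaitMs = 1000): number {
  return retryWaitMs * 2 ** (attempt - 1);
}

// With the defaults: 1000, 2000, 4000, 8000, ...
console.log([1, 2, 3, 4].map((attempt) => retryDelayMs(attempt)));
```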
Environment variables
| Option | Environment variable | Default |
| --- | --- | --- |
| uri | SBIMPORT_IMPORT_URI | See the Cloud import section |
| force | SBIMPORT_IMPORT_NO_CACHE | false |
| silent | SBIMPORT_IMPORT_SILENT | false |
| maxAttempts | SBIMPORT_IMPORT_MAX_RETRY_ATTEMPTS | 10 |
| retryWaitMs | SBIMPORT_IMPORT_RETRY_WAIT_MS | 1000 |
| batchSize | SBIMPORT_IMPORT_BATCH_SIZE | 100 |
| files | SBIMPORT_IMPORT_ASYNC_FILE_OPERATION_COUNT | 5 |
| tasks | SBIMPORT_IMPORT_ASYNC_TASKS_COUNT | 1 |
| entities | SBIMPORT_IMPORT_ENTITIES | label,contact,property,propertyAsset (all) |
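For example, the numeric options can be set through the environment instead of the command line (assuming a POSIX shell; the folder name is a placeholder):

SBIMPORT_IMPORT_BATCH_SIZE=50 SBIMPORT_IMPORT_MAX_RETRY_ATTEMPTS=5 sbimport import ImportCustomerX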
Examples
To import everything in the import folder test
sbimport import test
To import just properties datasets in the import folder test
sbimport import test --entities property
To import properties datasets and the related images in the import folder test.
sbimport import test --entities property --entities propertyAsset
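The --entities option can be repeated with any of the documented values; for instance, to import only contacts and labels in the import folder test

sbimport import test --entities contact --entities label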
Preparing the datasets
The sbimport tool expects data in JSONL format. In short, a JSONL file contains one valid JSON object per line.
To convert a normal JSON file to the JSONL version that the tool expects, you could use jq as follows:
jq -c '.[]' < dataset.json > dataset.jsonl
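If jq is not available, the same conversion can be done with a small Node.js script (a sketch assuming the input file is a top-level JSON array; file names are placeholders):

```ts
import { readFileSync, writeFileSync } from "fs";

// Read a JSON array and write one compact JSON object per line (JSONL).
const records: unknown[] = JSON.parse(readFileSync("dataset.json", "utf8"));
writeFileSync(
  "dataset.jsonl",
  records.map((record) => JSON.stringify(record)).join("\n") + "\n"
);
```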
Once your datasets are ready, move them to the respective folders:
- Properties datasets in the properties folder.
- Contacts datasets in the contacts folder.
- Labels datasets in the labels folder.
Company config file
The company.config.yaml file is used to store the configuration of the import process. It's a YAML file that contains the following:
env: production
key: SB************************
The key property stores the API key used to authenticate the requests.
Batch deduplication
The tool automatically deduplicates the datasets for entities which have an updated_at attribute in their schema. Duplicates are identified by the id attribute, and the updated_at attribute is used to determine which record is the most recent one to keep. The import stats don't count the duplicates in any metrics.
Batch example:
{"id": "1", "updated_at": "2024-12-01T00:00:00Z", "name": "John Doe"}
{"id": "1", "updated_at": "2024-12-02T00:00:00Z", "name": "John Doe"}
{"id": "2", "updated_at": "2024-12-01T00:00:00Z", "name": "Jane Doe"}
Preparing properties files
To support SweepBright properties file options, we decided to organise the properties files in the import folder using the following hierarchy (it's also described in the README file inside the files folder).
This directory is used to store the properties files.
The structure is as follows:
{property-reference}/
├── documents/
│   ├── private/
│   └── public/
├── images/
│   ├── private/
│   └── public/
└── plans/
    ├── private/
    └── public/
Replace {property-reference} with your property ID.
For example, assume you are importing one property with ID a3155152-3cb3-4878-b1e6-39466844328c, and this property has:
- doc1.pdf public and doc2.pdf private
- img1.png public and img2.png private
- and no plans
Important note: the order of folders and files is important because file paths are used to determine the ordinal of the property assets. Both local and cloud imports rely on the natural sorting of the file paths, which is alphanumeric ascending.
The files folder should look like this:
files
└── a3155152-3cb3-4878-b1e6-39466844328c
    ├── documents
    │   ├── private
    │   │   └── doc2.pdf
    │   └── public
    │       └── doc1.pdf
    ├── images
    │   ├── private
    │   │   └── img2.png
    │   └── public
    │       └── img1.png
    └── plans
        ├── private
        └── public
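To make the ordering rule concrete, here is a sketch of the alphanumeric ascending sort the note above refers to (the exact ordinal assignment, including whether it starts at 0 or 1, is handled internally by the tool):

```ts
// The position of each asset path in the sorted list drives its ordinal.
const paths = [
  "a3155152-3cb3-4878-b1e6-39466844328c/images/public/img1.png",
  "a3155152-3cb3-4878-b1e6-39466844328c/documents/private/doc2.pdf",
  "a3155152-3cb3-4878-b1e6-39466844328c/documents/public/doc1.pdf",
  "a3155152-3cb3-4878-b1e6-39466844328c/images/private/img2.png",
];

paths.sort(); // plain lexicographic sort = alphanumeric ascending
paths.forEach((path, index) => console.log(index, path));
// 0 .../documents/private/doc2.pdf
// 1 .../documents/public/doc1.pdf
// 2 .../images/private/img2.png
// 3 .../images/public/img1.png
```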
Cloud import
The sbimport tool can be used to import data from cloud storage. The supported providers are listed in the table at the end of this section.
To use the cloud import feature, you need to provide additional environment variables or CLI options.
URI to the cloud storage
- Argument: --uri
- Environment variable: SBIMPORT_URI
| Provider | URI format |
| --- | --- |
| AWS S3 | https://<BUCKET_NAME>.s3.<REGION>.amazonaws.com/ |
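For example, pointing the import at an S3 bucket (the bucket name, region, and folder name below are placeholders):

sbimport import ImportCustomerX --uri https://my-import-bucket.s3.eu-west-1.amazonaws.com/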