
This microservice is a multi-mode file movement wizard. At a scheduled interval it transfers files between two different storage devices, using a range of transfer protocols and storage types.
The application is built for Python 3, but also tested against Python 2.7. It is not compatible with Python 2.6.
Note: the application will fail if any file names contain spaces.
Standard FTP is no longer supported in this app.
The application should be installed using pip3 (or pip for Python 2.7). To install from a private PyPI server, we suggest using ~/.pypirc to configure your private PyPI connection details::
pip3 install data-transfer --extra-index-url <Repo-URL>
After installing the package and setting the required configuration, the application can be started with the following command::
data-transfer
Start by cloning the project::
git clone git@github.com:UKHomeOffice/data-transfer.git
Ensure that python3 is installed and on your path.
Installing for local development
These steps will install the application as a local pip-installed package, using symlinks so that any updates you make to the files are automatically picked up the next time you run the application or tests.
Using venv
""""""""""
To install the app using the standard python3 venv, run the following commands from the project root folder::
python3 -m venv ~/.virtualenvs/data-transfer
source ~/.virtualenvs/data-transfer/bin/activate
pip3 install -e . -r requirements.txt
export PYTHONPATH=.
Using virtualenvwrapper
"""""""""""""""""""""""
Alternatively, if you are using virtualenvwrapper, then run the following::
mkvirtualenv data-transfer -p python3
pip3 install -e . -r requirements.txt
export PYTHONPATH=.
Dependencies for local testing
""""""""""""""""""""""""""""""
The project's tests require an S3-compatible object store, an FTP server, and an SFTP server. For local development and testing, we suggest running them as Docker images; the following containers meet the test dependencies and match the default environment variables::
docker run -d --name s3server -p 8000:8000 scality/s3server
docker run -d --name ftp_server -p 21:21 -p 30000-30009:30000-30009 onekilo79/ftpd_test
docker run -p 2222:22 -d atmoz/sftp foo:pass:::upload
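If you want to confirm the S3 test container is up before running the tests, a quick unsigned request should get a response (typically an access-denied XML error, since the request is not signed)::
curl -s http://localhost:8000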
Test
""""
Once the application is installed and the dependencies are in place, run the tests::
pytest tests
This project uses setuptools to build the distributable package. Remember to update the version in setup.py before building the package::
python setup.py sdist
This will create a .tar.gz distributable package in dist/. This should be uploaded to an appropriate PyPI registry.
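The project does not prescribe an upload tool, but as a sketch the built package could be pushed to a private registry with twine, assuming a repository named private is configured in your ~/.pypirc (the repository name here is illustrative)::
pip3 install twine
twine upload --repository private dist/*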
The application should be installed using pip3 (or pip for Python 2.7). If installing from a private PyPI server, we suggest using ~/.pypirc to configure your private PyPI connection details::
pip3 install data-transfer --extra-index-url <Repo-URL>
The application requires the following environment variables to be set before running.
All configuration settings automatically default to suitable values for the tests, based on the local test dependencies running in the Docker images suggested in this guide.
Application settings
""""""""""""""""""""
These control various application behaviours; where a variable is not required, the default value is used:
+---------------------+----------------------+-----------+-----------------------------------+
|Environment Variable | Example (Default)    | Required  | Description                       |
+=====================+======================+===========+===================================+
|INGEST_SOURCE_PATH   | /upload/files        | Yes       | Source path                       |
+---------------------+----------------------+-----------+-----------------------------------+
|INGEST_DEST_PATH     | /upload/files/done   | Yes       | Destination path                  |
+---------------------+----------------------+-----------+-----------------------------------+
|MAX_FILES_BATCH      | 5                    | No        | Number to process each run        |
+---------------------+----------------------+-----------+-----------------------------------+
|PROCESS_INTERVAL     | 5                    | No        | Runs the task every (x) seconds   |
+---------------------+----------------------+-----------+-----------------------------------+
|FOLDER_DATE_OUTPUT   | False                | No        | Moves files to YYYY / MM / DD     |
+---------------------+----------------------+-----------+-----------------------------------+
|LOG_LEVEL            | INFO                 | No        | Log level                         |
+---------------------+----------------------+-----------+-----------------------------------+
|LOG_FILE_NAME        | data-transfer.log    | Yes       | Filename for log output           |
+---------------------+----------------------+-----------+-----------------------------------+
|USE_IAM_CREDS        | False                | Yes       | Indicates to app to use IAM       |
+---------------------+----------------------+-----------+-----------------------------------+
|READ_STORAGE_TYPE    | See footnote         | Yes       | The type of read storage          |
+---------------------+----------------------+-----------+-----------------------------------+
|WRITE_STORAGE_TYPE   | See footnote         | Yes       | The type of write storage         |
+---------------------+----------------------+-----------+-----------------------------------+
Note: the read and write storage types must be given as fully-qualified class names; the options are:
datatransfer.storage.FolderStorage
datatransfer.storage.SftpStorage
datatransfer.storage.S3Storage
Also ensure that the source and destination paths have the correct leading and trailing slashes; this will depend on the storage type and the OS. See the ecosystem.config file for examples.
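For example, a run that reads from an S3 bucket and writes to a local folder might be configured as in this sketch (the paths are illustrative)::
export READ_STORAGE_TYPE=datatransfer.storage.S3Storage
export WRITE_STORAGE_TYPE=datatransfer.storage.FolderStorage
export INGEST_SOURCE_PATH=upload/files
export INGEST_DEST_PATH=/upload/files/done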
Source / read settings
""""""""""""""""""""""
Provide the connection settings for either sFTP or S3. You only need to configure the settings associated with the source storage type.
+----------------------------+------------------------+--------------------------+
|Environment Variable        | Example                | Description              |
+============================+========================+==========================+
|READ_FTP_HOST               | localhost              | Hostname or IP of server |
+----------------------------+------------------------+--------------------------+
|READ_FTP_PASSWORD           | pass                   | Password                 |
+----------------------------+------------------------+--------------------------+
|READ_FTP_USER               | user                   | Username                 |
+----------------------------+------------------------+--------------------------+
|READ_FTP_PORT               | 22                     | Port the server uses     |
+----------------------------+------------------------+--------------------------+
|READ_AWS_ACCESS_KEY_ID      | accessKey1             | Access key for S3        |
+----------------------------+------------------------+--------------------------+
|READ_AWS_S3_BUCKET_NAME     | aws-ingest             | Bucket name              |
+----------------------------+------------------------+--------------------------+
|READ_AWS_S3_HOST            | http://localhost:8000  | URL of S3                |
+----------------------------+------------------------+--------------------------+
|READ_AWS_S3_REGION          | eu-west-1              | Region for S3 bucket     |
+----------------------------+------------------------+--------------------------+
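As a concrete sketch, reading from the local S3 test container started earlier could use the example values from the table above::
export READ_STORAGE_TYPE=datatransfer.storage.S3Storage
export READ_AWS_S3_HOST=http://localhost:8000
export READ_AWS_ACCESS_KEY_ID=accessKey1
export READ_AWS_S3_BUCKET_NAME=aws-ingest
export READ_AWS_S3_REGION=eu-west-1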
Target / write settings
"""""""""""""""""""""""
Provide the connection settings for either sFTP or S3. You only need to configure the settings associated with the target storage type.
+----------------------------+-----------------------+-------------------------+
|Environment Variable        | Example               | Description             |
+============================+=======================+=========================+
|WRITE_FTP_HOST              | localhost             | Hostname or IP of server|
+----------------------------+-----------------------+-------------------------+
|WRITE_FTP_USER              | user                  | Username                |
+----------------------------+-----------------------+-------------------------+
|WRITE_FTP_PASSWORD          | pass                  | Password                |
+----------------------------+-----------------------+-------------------------+
|WRITE_FTP_PORT              | 22                    | Port for server         |
+----------------------------+-----------------------+-------------------------+
|WRITE_AWS_ACCESS_KEY_ID     | accesskey1            | Access key for S3       |
+----------------------------+-----------------------+-------------------------+
|WRITE_AWS_SECRET_ACCESS_KEY | verysecret            | Secret key              |
+----------------------------+-----------------------+-------------------------+
|WRITE_AWS_S3_BUCKET_NAME    | aws-ingest            | Bucket name             |
+----------------------------+-----------------------+-------------------------+
|WRITE_AWS_S3_HOST           | http://localhost:8000 | URL of S3               |
+----------------------------+-----------------------+-------------------------+
|WRITE_AWS_S3_REGION         | eu-west-1             | Region for S3 bucket    |
+----------------------------+-----------------------+-------------------------+
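Similarly, writing to the local SFTP test container is sketched below, using the port mapping and credentials from the docker run command earlier (illustrative only)::
export WRITE_STORAGE_TYPE=datatransfer.storage.SftpStorage
export WRITE_FTP_HOST=localhost
export WRITE_FTP_PORT=2222
export WRITE_FTP_USER=foo
export WRITE_FTP_PASSWORD=pass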
To run the application from the command line:
For pip-installed versions::
data-transfer
Calling the application directly::
python bin/data-transfer
For production use we recommend running the application using PM2. Please ensure that PM2 is installed globally before running this command::
pm2 start ecosystem.config.js --only data-transfer
The required environment variables should be set in the ecosystem file before running PM2. It is also recommended to run PM2 from within a Python virtual environment.
To run more than one instance of the application with different config settings, you will need to change or add additional services in the ecosystem config file.
See here for examples:
http://pm2.keymetrics.io/docs/usage/application-declaration/#process-file
The application is portable between Linux and Windows; however, when running the app on Windows there are some specifics you may want to take into account:
If you are running the microservice using a batch file or a mechanism other than PM2, you will need to ensure that the environment variables are set without quotes.
The file paths for FolderStorage should be Windows paths; for FTP, sFTP, and S3 they can be in Unix format.
For sFTP and folder storage, ensure paths are absolute and have no trailing slash, e.g. /path/to/something.
For S3 the path is appended to the URL, so it can be relative, but it must not have a trailing slash, e.g. path/to/something.
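For example, in a Windows batch file the variables might be set as in this sketch; note that the values are unquoted and the paths are illustrative::
set READ_STORAGE_TYPE=datatransfer.storage.FolderStorage
set INGEST_SOURCE_PATH=C:\upload\files
set INGEST_DEST_PATH=C:\upload\files\done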
If you are running the app on an AWS instance that has an IAM policy, you can set the USE_IAM_CREDS var to True and the application will use IAM policies. You must, however, ensure that the bucket name is set correctly.
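A minimal sketch of that setup, assuming the instance's IAM role already grants access to the bucket (the bucket name and region are the example values from the tables above)::
export USE_IAM_CREDS=True
export WRITE_STORAGE_TYPE=datatransfer.storage.S3Storage
export WRITE_AWS_S3_BUCKET_NAME=aws-ingest
export WRITE_AWS_S3_REGION=eu-west-1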
Contributing
""""""""""""
This project is open source and we welcome contributions and suggestions to improve the application. Please raise issues in the usual way on GitHub and open a pull request for code contributions.
Licensing
"""""""""
This application is released under the `BSD license`_.

.. _BSD license: LICENSE.txt