platform-agent-import-astra-data-lake

An AWS Batch agent to create the Glue Catalog for the Astra Data Lake and populate the lake with data.

Prerequisites

  1. Install Node.js and npm from https://nodejs.org/.
  2. Install your favorite text/code editor. I highly recommend Visual Studio Code as a lightweight, flexible, and extensible code editor; you can download it from https://code.visualstudio.com/.
  3. Set the required environment variables. This can be done by running the setup-dev-environment.sh script in the pipeline/ directory, as sketched below.
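
A minimal sketch of step 3; this assumes setup-dev-environment.sh exports its variables, so source it rather than running it in a subshell:

# Source the script so the exported variables persist in your current shell
. ./pipeline/setup-dev-environment.sh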

Getting Started

npm install
npm test

For dev deployments, you can set up the necessary environment variables using the following script:

# Note: requires aws cli to be installed on your machine
. ./pipeline/lookup-ecr-uri.sh

Deployment Guide

Creation of the ECR repository for this agent is a once-per-account deployment. The steps are documented here by the serverless/CloudFormation YAML, but were pushed up to AWS via serverless deploy from a developer machine rather than building a pipeline for a one-time action. Once this configuration is complete, deployment occurs normally via the CI pipeline, which uses the /agent-import-astra-data-lake/ECR-URI SSM parameter.

pushd pipeline/ecr
npm install
export AWS_PROFILE=prod
export AWS_REGION=us-east-1
npm run deploy
# Note that a serverless plugin handles looking up the ECR URI and storing it in SSM
popd
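
To confirm that the plugin stored the URI, you can read the parameter back; a sketch using the AWS CLI:

aws ssm get-parameter --name /agent-import-astra-data-lake/ECR-URI --query Parameter.Value --output text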

Creation of the Agent User for this agent is a once-per-account deployment. The steps are documented here by the serverless/CloudFormation YAML, but were pushed up to AWS via serverless deploy from a developer machine rather than building a pipeline for a one-time action. Once this configuration is complete, deployment occurs normally via the CI pipeline.

pushd pipeline/user
npm install
export AWS_PROFILE=prod
export AWS_REGION=us-east-1
npm run deploy
popd

Also once-per-account, you must alter the ecsInstanceRole permissions to grant the secretsmanager:GetSecretValue permission on the agent-import-astra-data-lake resource. To do this, go into the IAM Management Console, open the 'Roles' page, search for 'ecsInstanceRole' and open the Role. Under 'Permissions', expand the 'data-ingestion-secrets-read' Policy and edit it to add:

"arn:aws:secretsmanager:*:*:secret:/agent-import-astra-data-lake/*"

to the Resources array of the policy.
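
For illustration, the edited policy statement might look like the following sketch; any existing entries in the Resource array stay in place alongside the new ARN:

{
  "Effect": "Allow",
  "Action": "secretsmanager:GetSecretValue",
  "Resource": [
    "arn:aws:secretsmanager:*:*:secret:/agent-import-astra-data-lake/*"
  ]
}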

Batch Job Queue

Creation of the Batch Job Queue and an SSM parameter that saves the job queue name must be done once per account and stage (so twice in the prod account if you run separate staging and prod stages), and is handled by the project in the pipeline/job-queue folder. Consumers that need the job queue name can look it up via the /platform-agent-import-adl/job-queue-name-$STAGE_DATA_LAKE SSM parameter, as sketched after the deployment steps below.

For Dev account deployment:

pushd pipeline/job-queue
npm install
export AWS_PROFILE=dev
export AWS_REGION=us-east-1
export STAGE_DATA_LAKE=dev
./node_modules/.bin/serverless deploy
popd

For Prod account 'staging' stage deployment:

pushd pipeline/job-queue
npm install
export AWS_PROFILE=prod
export AWS_REGION=us-east-1
export STAGE_DATA_LAKE=staging
./node_modules/.bin/serverless deploy
popd

For Prod account 'prod' stage deployment:

pushd pipeline/job-queue
npm install
export AWS_PROFILE=prod
export AWS_REGION=us-east-1
export STAGE_DATA_LAKE=prod
./node_modules/.bin/serverless deploy
popd
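
Once a queue is deployed, consumers can resolve its name from SSM. A sketch for the dev stage, using the AWS CLI:

aws ssm get-parameter --name /platform-agent-import-adl/job-queue-name-dev --query Parameter.Value --output text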

Execution of the batch

The batch job requires the following parameters (see the submission sketch after this list):

  • TenantId: The tenant ID of the customer for which you are executing the batch.
  • QueryExecutionIds: A JSON string identifying the query executions to import (for example, a JSON array of IDs).
  • AstraDataLakeOutputLocation: The output location for the Astra data, in the form <S3Bucket>/<S3Key>, e.g. astra-data-lake-dev/dev/AstraData. To write to the root of the bucket, pass only the bucket name.
  • HealthCheck: If set to true, only a check that the batch is operational is performed; no further processing is done. To generate the astra-data-lake, set this value to false.
  • TreatWarningsAsErrors: Optional and not currently used; pass true for now.
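
The following sketch shows what a submission might look like with the AWS CLI; the job name, job definition name, and parameter values are illustrative placeholders, not values defined by this project:

# The job definition name and all parameter values below are illustrative placeholders
JOB_QUEUE=$(aws ssm get-parameter --name /platform-agent-import-adl/job-queue-name-dev --query Parameter.Value --output text)
aws batch submit-job \
  --job-name agent-import-astra-data-lake-example \
  --job-queue "$JOB_QUEUE" \
  --job-definition agent-import-astra-data-lake \
  --parameters '{"TenantId":"example-tenant-id","QueryExecutionIds":"[\"example-query-execution-id\"]","AstraDataLakeOutputLocation":"astra-data-lake-dev/dev/AstraData","HealthCheck":"false","TreatWarningsAsErrors":"true"}'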

Dependencies

mocha

Mocha is a simple and flexible JavaScript test framework. It can be used for BDD, TDD, and other testing types and provides many features for synchronous and asynchronous testing.

chai

Chai is a BDD/TDD assertion library. It provides the standard asserts as well as "should" and "expect" style asserts for BDD language.

aws-sdk

aws-sdk provides an interface for Amazon Web Services such as Step Functions and Batch.
