platform-agent-import-astra-data-lake

An AWS Batch agent to create the Glue Catalog for the Astra Data Lake and populate the lake with data.

Prerequisites

  1. Install Node.js and npm from https://nodejs.org/.
  2. Install your favorite text/code editor. I highly recommend Visual Studio Code as a lightweight, flexible, and extensible code editor; you can download it from https://code.visualstudio.com/.
  3. Set the required environment variables. This can be done by running the setup-dev-environment.sh script in the pipeline/ directory, as sketched below.
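
A minimal sketch of step 3; this assumes setup-dev-environment.sh exports its variables, so source it rather than running it in a subshell:

# Source the script so the exported variables persist in your current shell
. ./pipeline/setup-dev-environment.sh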

Getting Started

npm install
npm test

For dev deployments, you can set up the necessary environment variables using the following script:

# Note: requires aws cli to be installed on your machine
. ./pipeline/lookup-ecr-uri.sh

Deployment Guide

Creation of the ECR repository for this agent is a once-per-account deployment. The steps are documented here by the serverless/CloudFormation YAML, but were pushed up to AWS via serverless deploy from a developer machine rather than building a pipeline for a one-time action. Once this configuration is complete, deployment occurs normally via the CI pipeline, which uses the /agent-import-astra-data-lake/ECR-URI SSM parameter.

pushd pipeline/ecr
npm install
export AWS_PROFILE=prod
export AWS_REGION=us-east-1
npm run deploy
# Note that a serverless plugin handles looking up the ECR URI and storing it in SSM
popd
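
To confirm that the plugin stored the URI, you can read the parameter back; a sketch using the AWS CLI:

aws ssm get-parameter --name /agent-import-astra-data-lake/ECR-URI --query Parameter.Value --output text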

Creation of the Agent User for this agent is a once-per-account deployment. The steps are documented here by the serverless/CloudFormation YAML, but were pushed up to AWS via serverless deploy from a developer machine rather than building a pipeline for a one-time action. Once this configuration is complete, deployment occurs normally via the CI pipeline.

pushd pipeline/user
npm install
export AWS_PROFILE=prod
export AWS_REGION=us-east-1
npm run deploy
popd

Also once-per-account, you must alter the ecsInstanceRole permissions to grant the secretsmanager:GetSecretValue permission on the agent-import-astra-data-lake resource. To do this, go into the IAM Management Console, open the 'Roles' page, search for 'ecsInstanceRole' and open the Role. Under 'Permissions', expand the 'data-ingestion-secrets-read' Policy and edit it to add:

"arn:aws:secretsmanager:*:*:secret:/agent-import-astra-data-lake/*"

to the Resources array of the policy.
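
For illustration, the edited policy statement might look like the following sketch; any existing entries in the Resource array stay in place alongside the new ARN:

{
  "Effect": "Allow",
  "Action": "secretsmanager:GetSecretValue",
  "Resource": [
    "arn:aws:secretsmanager:*:*:secret:/agent-import-astra-data-lake/*"
  ]
}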

Batch Job Queue

Creation of the Batch Job Queue and an SSM parameter that saves the job queue name must be done once per account and stage (so twice in the prod account if you run separate staging and prod stages), and is handled by the project in the pipeline/job-queue folder. Consumers that need the job queue name can look it up via the /platform-agent-import-adl/job-queue-name-$STAGE_DATA_LAKE SSM parameter, as sketched after the deployment steps below.

For Dev account deployment:

pushd pipeline/job-queue
npm install
export AWS_PROFILE=dev
export AWS_REGION=us-east-1
export STAGE_DATA_LAKE=dev
./node_modules/.bin/serverless deploy
popd

For Prod account 'staging' stage deployment:

pushd pipeline/job-queue
npm install
export AWS_PROFILE=prod
export AWS_REGION=us-east-1
export STAGE_DATA_LAKE=staging
./node_modules/.bin/serverless deploy
popd

For Prod account 'prod' stage deployment:

pushd pipeline/job-queue
npm install
export AWS_PROFILE=prod
export AWS_REGION=us-east-1
export STAGE_DATA_LAKE=prod
./node_modules/.bin/serverless deploy
popd
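
Once a queue is deployed, consumers can resolve its name from SSM. A sketch for the dev stage, using the AWS CLI:

aws ssm get-parameter --name /platform-agent-import-adl/job-queue-name-dev --query Parameter.Value --output text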

Execution of the batch

The batch job requires the following parameters (see the submission sketch after this list):

  • TenantId: The tenant ID of the customer for which you are executing the batch.
  • QueryExecutionIds: A JSON string identifying the query executions to import (for example, a JSON array of IDs).
  • AstraDataLakeOutputLocation: The output location for the Astra data, in the form <S3Bucket>/<S3Key>, e.g. astra-data-lake-dev/dev/AstraData. To write to the root of the bucket, pass only the bucket name.
  • HealthCheck: If set to true, only a check that the batch is operational is performed; no further processing is done. To generate the astra-data-lake, set this value to false.
  • TreatWarningsAsErrors: Optional and not currently used; pass true for now.
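
The following sketch shows what a submission might look like with the AWS CLI; the job name, job definition name, and parameter values are illustrative placeholders, not values defined by this project:

# The job definition name and all parameter values below are illustrative placeholders
JOB_QUEUE=$(aws ssm get-parameter --name /platform-agent-import-adl/job-queue-name-dev --query Parameter.Value --output text)
aws batch submit-job \
  --job-name agent-import-astra-data-lake-example \
  --job-queue "$JOB_QUEUE" \
  --job-definition agent-import-astra-data-lake \
  --parameters '{"TenantId":"example-tenant-id","QueryExecutionIds":"[\"example-query-execution-id\"]","AstraDataLakeOutputLocation":"astra-data-lake-dev/dev/AstraData","HealthCheck":"false","TreatWarningsAsErrors":"true"}'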

Dependencies

mocha

Mocha is a simple and flexible JavaScript test framework. It can be used for BDD, TDD, and other testing types and provides many features for synchronous and asynchronous testing.

chai

Chai is a BDD/TDD assertion library. It provides the standard asserts as well as "should" and "expect" style asserts for BDD language.

aws-sdk

aws-sdk provides an interface for Amazon Web Services such as Step Functions and Batch.
