MCA CLI
CLI to help automate MCA work
Installation
- Clone repository
- Run
npm link
Local development
- Watch and compile on change
npm run start
- Compile TypeScript to JavaScript with
npm run build
- Run the built mca CLI with
./dist/bin/mca.js
Folder organization
- src (Contains source files)
  - bin (Starting point for the CLI app)
  - cmd (Command line configs using yargs commandDir)
  - lib (Code for the commands)
  - assets (Assets required for the command line)
- dist (Build folder, same as src but with js files)
Linting
- Lint code with
npm run lint
- Fix linter errors with
npm run lint:fix
Testing
- Run tests with
npm run test
Unit tests should be placed next to the code they test, with a .spec.ts
extension. Larger integration tests should be kept separately in the test folder.
Release
Run npm run release to bump the version, add tags and update the CHANGELOG automatically.
What is mca-cli?
mca-cli and mca-monitoring are used together to set up monitoring for resources in an AWS environment.
The mca monitoring command searches for resources in an AWS environment. It generates a Node project inside your main project folder and creates a config.yml that lists all the resources, along with some default alarms.
How does monitoring work in AWS?
It is a combination of CloudWatch metrics and CloudWatch alarms.
A metric is a time-ordered series of data points. For example, AWS Lambda has an Invocations metric, which counts the number of times a function is invoked.
An alarm observes a single metric and initiates actions when a specified condition is met. The action could be sending a notification to an SNS topic.
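The relationship between a metric and an alarm can be sketched in TypeScript. This is a simplified model, not CloudWatch's exact algorithm: it assumes an alarm fires only when every one of the last evaluationPeriods datapoints breaches the threshold, and it ignores the "datapoints to alarm" setting and missing-data handling. The names evaluateAlarm and COMPARATORS are hypothetical; only the comparison operator names are the ones used in config.yml.

```typescript
// Simplified, hypothetical model of a CloudWatch alarm with one threshold.
// It ignores "datapoints to alarm" (M out of N) and missing-data handling;
// it only shows how threshold and evaluationPeriods interact.
type ComparisonOperator =
  | "GREATER_THAN_OR_EQUAL_TO_THRESHOLD"
  | "GREATER_THAN_THRESHOLD"
  | "LESS_THAN_THRESHOLD"
  | "LESS_THAN_OR_EQUAL_TO_THRESHOLD";

type AlarmState = "OK" | "ALARM" | "INSUFFICIENT_DATA";

const COMPARATORS: Record<ComparisonOperator, (value: number, threshold: number) => boolean> = {
  GREATER_THAN_OR_EQUAL_TO_THRESHOLD: (v, t) => v >= t,
  GREATER_THAN_THRESHOLD: (v, t) => v > t,
  LESS_THAN_THRESHOLD: (v, t) => v < t,
  LESS_THAN_OR_EQUAL_TO_THRESHOLD: (v, t) => v <= t,
};

// `datapoints` holds one aggregated value (the configured statistic) per period,
// oldest first, e.g. the Sum of Lambda Errors for each 5-minute period.
function evaluateAlarm(
  datapoints: number[],
  op: ComparisonOperator,
  threshold: number,
  evaluationPeriods: number
): AlarmState {
  if (datapoints.length < evaluationPeriods) return "INSUFFICIENT_DATA";
  // Only the most recent `evaluationPeriods` values are compared to the threshold;
  // all of them must breach for the alarm to fire.
  const recent = datapoints.slice(-evaluationPeriods);
  const breaching = recent.every((v) => COMPARATORS[op](v, threshold));
  return breaching ? "ALARM" : "OK";
}

console.log(evaluateAlarm([0, 0, 2], "GREATER_THAN_OR_EQUAL_TO_THRESHOLD", 1, 1)); // ALARM
console.log(evaluateAlarm([2, 2, 0], "GREATER_THAN_OR_EQUAL_TO_THRESHOLD", 1, 2)); // OK
console.log(evaluateAlarm([2], "GREATER_THAN_OR_EQUAL_TO_THRESHOLD", 1, 3)); // INSUFFICIENT_DATA
```

With evaluationPeriods: 1, a single breaching period is enough to trigger the alarm; raising it makes the alarm less sensitive to short spikes.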
Setting up default monitoring
1. Generate the monitoring folder
In the root project folder run
npx mca monitoring init -p <aws profile> -r <aws region> -o monitoring
Optional flags
--service: A space-separated list of service names to include in the search for resources. By default all of the following services are included:
- lambda
- dynamodb
- ecs
- apigateway
- cloudfront
- rds
- eks
- loggroup
- appsync
- sqs
--include: A list of regex patterns of resource names (or ids) to include in the monitoring. By default all resources are included. Resources are identified by:
- (lambda) function name
- (dynamodb) table name
- (ecs) cluster name
- (apigateway) api name
- (cloudfront) distribution id or alias
- (rds) db instance identifier
- (eks) cluster name
- (appsync) api name
- (sqs) queue name
--exclude: Same as --include, but matching resources are excluded.
--help: See all options.
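The --include and --exclude flags are described as regex patterns matched against resource names. A minimal sketch of that filtering, assuming each pattern is compiled as an unanchored JavaScript RegExp and that an empty include list keeps every resource. The function filterResources is hypothetical, not mca-cli's internal API; note that the config.yml example in this README lists glob-style patterns such as '*ee*', which are not valid JavaScript regexes, so the actual matching rules may differ from this sketch.

```typescript
// Hypothetical sketch of --include / --exclude filtering; not mca-cli's actual code.
// Assumption: patterns are unanchored JavaScript regular expressions.
function filterResources(names: string[], includes: string[], excludes: string[]): string[] {
  const inc = includes.map((p) => new RegExp(p));
  const exc = excludes.map((p) => new RegExp(p));
  return names.filter((resourceName) => {
    // With no include patterns, every resource is a candidate (the documented default).
    const included = inc.length === 0 || inc.some((re) => re.test(resourceName));
    // A resource matching any exclude pattern is dropped, even if it was included.
    const excluded = exc.some((re) => re.test(resourceName));
    return included && !excluded;
  });
}

const lambdas = ["orders-prod-create", "orders-dev-create", "billing-prod-warmup"];
console.log(filterResources(lambdas, ["-prod-"], ["warmup"]));
// [ 'orders-prod-create' ]
```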
2. Install NPM packages
In the monitoring folder run npm install
3. Deploy
In the monitoring folder run npm run deploy
Customizing monitoring
Read more about CloudWatch concepts
Custom configurations should be listed in config.yml, under custom > default.
Configurable properties
- enabled Boolean. Whether to create an alarm for this metric.
- autoResolve Boolean (optional, default: false). Whether the alarm should automatically enter the “OK” state.
- alarm
  - critical
    - comparisonOperator String (optional, default: GREATER_THAN_OR_EQUAL_TO_THRESHOLD). Comparison used to check whether the metric is breaching. (available values)
    - threshold Number (required). The value against which the specified statistic is compared.
    - evaluationPeriods Number (required). The number of periods over which data is compared to the specified threshold.
    - evaluateLowSampleCountPercentile Percentile (optional). Used only for alarms based on percentiles. Specifies whether to evaluate the data and potentially change the alarm state if there are too few data points to be statistically significant.
    - treatMissingData String (optional, default: NOT_BREACHING). How the alarm handles missing data points. (available values)
- metric
  - period (optional, default: 5 minutes). The period over which the specified statistic is applied. Can have one of the following sub-properties:
    - milliseconds Number
    - seconds Number
    - minutes Number
    - hours Number
    - days Number
    - isoString ISO 8601
  - statistic String (required, one of: Minimum, Maximum, Average, Sum, SampleCount, pNN.NN). The function used for aggregating.
  - unit String (optional, default: undefined). Unit used to filter the metric stream. Only useful when datums are emitted to the same metric stream under different units.
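The nesting above can be expressed as TypeScript types. The sketch below is hypothetical (these are not mca-monitoring's own type definitions) and adds two illustrative helpers: periodToSeconds, which applies the documented 5-minute default and converts the unit sub-properties (isoString is left out for brevity), and isValidStatistic, which encodes one reading of the pNN.NN notation as "p" followed by one or two digits and an optional one- or two-digit fraction.

```typescript
// Hypothetical TypeScript shape of one entry under custom > default,
// written from the property list above; not mca-monitoring's own types.
interface PeriodConfig {
  milliseconds?: number;
  seconds?: number;
  minutes?: number;
  hours?: number;
  days?: number;
  isoString?: string; // ISO 8601 duration; not handled by periodToSeconds below
}

interface MetricConfig {
  enabled?: boolean;
  autoResolve?: boolean;
  alarm?: {
    critical?: {
      comparisonOperator?: string;
      threshold: number;
      evaluationPeriods: number;
      evaluateLowSampleCountPercentile?: string;
      treatMissingData?: string;
    };
  };
  metric?: {
    period?: PeriodConfig;
    statistic: string;
    unit?: string;
  };
}

const DEFAULT_PERIOD_SECONDS = 5 * 60; // documented default: 5 minutes

// Converts a period object to seconds, assuming exactly one sub-property is set.
function periodToSeconds(period?: PeriodConfig): number {
  if (!period) return DEFAULT_PERIOD_SECONDS;
  if (period.milliseconds !== undefined) return period.milliseconds / 1000;
  if (period.seconds !== undefined) return period.seconds;
  if (period.minutes !== undefined) return period.minutes * 60;
  if (period.hours !== undefined) return period.hours * 3600;
  if (period.days !== undefined) return period.days * 86400;
  throw new Error("unsupported period: " + JSON.stringify(period));
}

// Accepts the named statistics and percentiles written as pNN or pNN.NN.
const NAMED_STATISTICS = ["Minimum", "Maximum", "Average", "Sum", "SampleCount"];
function isValidStatistic(statistic: string): boolean {
  return NAMED_STATISTICS.indexOf(statistic) !== -1 || /^p\d{1,2}(\.\d{1,2})?$/.test(statistic);
}

// The Lambda "Errors" entry from the config.yml example, as a typed object.
const lambdaErrors: MetricConfig = {
  enabled: true,
  autoResolve: false,
  alarm: { critical: { threshold: 1, evaluationPeriods: 1 } },
  metric: { period: { minutes: 15 }, statistic: "Minimum" },
};

console.log(periodToSeconds(lambdaErrors.metric?.period)); // 900
console.log(isValidStatistic("p99.9"), isValidStatistic("Median")); // true false
```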
Log group specific properties
- [custom metric name]
  - filter
    - pattern String (required). Filter pattern syntax. When using quotes for exact matches (e.g. "[ERROR]"), wrap the pattern in single quotes plus double quotes (e.g. '"[ERROR]"'); otherwise mca-monitoring will end up with a regex (e.g. [ERROR]).
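To illustrate what a pattern such as ERROR -400 -401 -403 -404 -Timeout -DeprecationWarning selects, here is a simplified, hypothetical matcher for unstructured log events: every plain term must appear, every term prefixed with "-" must not, and a double-quoted phrase matches as exact text. In YAML, '"[ERROR]"' is a single-quoted string whose value is "[ERROR]" including the double quotes, which is what turns it into an exact phrase. Real CloudWatch Logs syntax supports more (for example ?term for OR, JSON and space-delimited patterns); parseTerms and matchesFilter are illustrative names, not part of mca-monitoring.

```typescript
// Simplified, hypothetical model of CloudWatch Logs term matching for
// unstructured log events; not mca-monitoring code. Supported syntax:
//   term   -> the event must contain term
//   -term  -> the event must not contain term
//   "a b"  -> quoted phrase, matched exactly (quotes removed)
// Not supported: ?term (OR), JSON patterns, space-delimited patterns.
function parseTerms(pattern: string): string[] {
  // Split on whitespace, but keep double-quoted phrases (optionally negated) together.
  return pattern.match(/-?"[^"]*"|\S+/g) || [];
}

function matchesFilter(pattern: string, message: string): boolean {
  return parseTerms(pattern).every((token) => {
    const negated = token.charAt(0) === "-";
    let term = negated ? token.slice(1) : token;
    if (term.length >= 2 && term.charAt(0) === '"' && term.charAt(term.length - 1) === '"') {
      term = term.slice(1, -1); // exact phrase: strip the surrounding quotes
    }
    // Case-sensitive substring check (a simplification of CloudWatch's term matching).
    const found = message.indexOf(term) !== -1;
    return negated ? !found : found;
  });
}

const runtimeErrorsPattern = "ERROR -400 -401 -403 -404 -Timeout -DeprecationWarning";
console.log(matchesFilter(runtimeErrorsPattern, "ERROR Failed to write order")); // true
console.log(matchesFilter(runtimeErrorsPattern, "ERROR Request failed with status 404")); // false
console.log(matchesFilter('"[ERROR]"', "[ERROR] Database connection lost")); // true
```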
Config.yml example
cli:
  version: 1
  services:
    - lambda
    - dynamodb
    - apigateway
    - cloudfront
    - loggroup
  includes: []
  excludes:
    - '*ee*'
    - '*rapsiapp*'
    - '*dev*'
    - '*marketprice*'
    - '*warmup*'
    - '*error-handler*'
  profile: nc-personal-user
custom:
  default:
    lambda:
      Errors:
        enabled: true
        autoresolve: false
        alarm:
          critical:
            comparisonOperator: GREATER_THAN_OR_EQUAL_TO_THRESHOLD
            threshold: 1
            evaluationPeriods: 1
        metric:
          period:
            minutes: 15
          statistic: Minimum
    cloudfront:
      4XXErrorRate:
        enabled: false
    logGroup:
      RuntimeErrors:
        enabled: true
        alarm:
          critical:
            threshold: 1
            evaluationPeriods: 1
        metric:
          period:
            minutes: 5
          unit: Count
          statistic: Sum
        filter:
          pattern: ERROR -400 -401 -403 -404 -Timeout -DeprecationWarning
  snsTopic:
    critical:
      name: Topic for mca monitoring alarms
      id: avena-alerts-alarm
      endpoints:
        - >-
          https://events.pagerduty.com/integration/58287e69892c4406aa88db8619721142/enqueue
      emails: []
lambdas:
  myTestLambda: {}
distributions:
  E2K3LH1G46OF18: {}
  E3ADB61RBHAPW9: {}
  E35IJ0HST9PMZQ: {}
logGroups:
  /aws/lambda/avenakauppa-fi-analysis-prod-get-analysis: {}
  /aws/lambda/avenakauppa-fi-analysis-prod-post-analysis: {}
Config types and metrics
lambda
- Invocations
- Errors
- DeadLetterErrors
- DestinationDeliveryFailures
- Throttles
- ProvisionedConcurrencyInvocations
- ProvisionedConcurrencySpilloverInvocations
- Duration
- IteratorAge
- ConcurrencyExecutions
- ProvisionedConcurrencyExecutions
- ProvisionedConcurrencyUtilizations
- UnreservedConcurrentExecutions
table
- ConditionalCheckFailedRequests
- ConsumedReadCapacityUnits
- ConsumedWriteCapacityUnits
- MaxProvisionedTableReadCapacityUtilization
- MaxProvisionedTableWriteCapacityUtilization
- OnlineIndexConsumedWriteCapacity
- OnlineIndexPercentageProgress
- OnlineIndexThrottleEvents
- PendingReplicationCount
- ProvisionedReadCapacity
- ProvisionedWriteCapacity
- ReadThrottleEvents
- ReplicationLatency
- ReturnedBytes
- ReturnedItemCount
- ReturnedRecordsCount
- SystemErrors
- TimeToLiveDeletedItemCount
- ThrottledRequests
- TransactionConflict
- WriteThrottleEvents
account
This is part of the AWS/DynamoDB namespace
clusters
- CPUReservation
- CPUUtilization
- MemoryReservation
- MemoryUtilization
- GPUReservation
apiGateway
- 4XXError
- 5XXError
- CacheHitCount
- CacheMissCount
- Count
- IntegrationLatency
- Latency
cloudfront
- 4XXErrorRate
- 5XXErrorRate
- 401ErrorRate
- 403ErrorRate
- 404ErrorRate
- 502ErrorRate
- 503ErrorRate
- 504ErrorRate
- BytesDownloaded
- BytesUploaded
- CacheHitRate
- OriginLatency
- Requests
- TotalErrorRate
rds
- BinLogDiskUsage
- BurstBalance
- CPUUtilization
- CPUCreditUsage
- CPUCreditBalance
- DatabaseConnections
- DiskQueueDepth
- FailedSQLServerAgentJobsCount
- FreeableMemory
- FreeStorageSpace
- MaximumUsedTransactionIDs
- NetworkReceiveThroughput
- NetworkTransmitThroughput
- OldestReplicationSlotLag
- ReadIOPS
- ReadLatency
- ReadThroughput
- ReplicaLag
- ReplicationSlotDiskUsage
- SwapUsage
- TransactionLogsDiskUsage
- TransactionLogsGeneration
- WriteIOPS
- WriteLatency
- WriteThroughput
eks
- cluster_failed_node_count
- cluster_node_count
- namespace_number_of_running_pods
- node_cpu_limit
- node_cpu_reserved_capacity
- node_cpu_usage_total
- node_cpu_utilization
- node_filesystem_utilization
- node_memory_limit
- node_memory_reserved_capacity
- node_memory_utilization
- node_memory_working_set
- node_network_total_bytes
- node_number_of_running_containers
- node_number_of_running_pods
- pod_cpu_reserved_capacity
- pod_cpu_utilization
- pod_cpu_utilization_over_pod_limit
- pod_memory_reserved_capacity
- pod_memory_utilization
- pod_memory_utilization_over_pod_limit
- pod_number_of_container_restarts
- pod_network_rx_bytes
- pod_network_tx_bytes
- service_number_of_running_pods
appSyncApi
- 4XXError
- 5XXError
- Latency
- ConnectSuccess
- ConnectClientError
- ConnectServerError
- DisconnectSuccess
- DisconnectClientError
- DisconnectServerError
- SubscribeSuccess
- SubscribeClientError
- SubscribeServerError
- UnsubscribeSuccess
- UnsubscribeClientError
- UnsubscribeServerError
- PublishDataMessageSuccess
- PublishDataMessageClientError
- PublishDataMessageServerError
- PublishDataMessageSize
- ActiveConnection
- ActiveSubscription
- ConnectionDuration
sqs
- ApproximateAgeOfOldestMessage
- ApproximateNumberOfMessagesDelayed
- ApproximateNumberOfMessagesNotVisible
- ApproximateNumberOfMessagesVisible
- NumberOfEmptyReceives
- NumberOfMessagesDeleted
- NumberOfMessagesReceived
- NumberOfMessagesSent
- SentMessageSize
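The keys under custom > default > [config type] are metric names from the lists above, so a misspelled key is easy to introduce. A hypothetical sketch of checking a custom section against these lists; only two config types are copied in for brevity, types not present in KNOWN_METRICS are skipped, and the names findUnknownMetrics and KNOWN_METRICS are illustrative, not part of mca-cli or mca-monitoring.

```typescript
// Hypothetical validator for custom > default; not part of mca-cli or mca-monitoring.
// Only two config types are included; extend KNOWN_METRICS from the lists above.
const KNOWN_METRICS: Record<string, string[]> = {
  cloudfront: [
    "4XXErrorRate", "5XXErrorRate", "401ErrorRate", "403ErrorRate", "404ErrorRate",
    "502ErrorRate", "503ErrorRate", "504ErrorRate", "BytesDownloaded", "BytesUploaded",
    "CacheHitRate", "OriginLatency", "Requests", "TotalErrorRate",
  ],
  sqs: [
    "ApproximateAgeOfOldestMessage", "ApproximateNumberOfMessagesDelayed",
    "ApproximateNumberOfMessagesNotVisible", "ApproximateNumberOfMessagesVisible",
    "NumberOfEmptyReceives", "NumberOfMessagesDeleted", "NumberOfMessagesReceived",
    "NumberOfMessagesSent", "SentMessageSize",
  ],
};

// Returns one message per metric key that is not in the known list for its type.
function findUnknownMetrics(custom: Record<string, Record<string, unknown>>): string[] {
  const problems: string[] = [];
  for (const configType of Object.keys(custom)) {
    const known = KNOWN_METRICS[configType];
    if (!known) continue; // config types without a copied list are skipped
    for (const metric of Object.keys(custom[configType])) {
      if (known.indexOf(metric) === -1) {
        problems.push(`${configType}: unknown metric ${metric}`);
      }
    }
  }
  return problems;
}

console.log(findUnknownMetrics({ cloudfront: { "4XXErrorRate": { enabled: false }, "4xxErrorRate": {} } }));
// [ 'cloudfront: unknown metric 4xxErrorRate' ]
```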