tg-wrap
This app simply wraps terragrunt (which is a wrapper around terraform, which is a wrapper around cloud APIs, which is...).
Wait, why on earth do we need a wrapper for a wrapper (for a wrapper)?
Well, first of all, it is pretty opinionated, so what works for us doesn't necessarily work for you.
But our reasoning for creating this is as follows:
1. Less typing
terraform is great, and in combination with terragrunt even greater! But let's face it, terragrunt does not excel in conciseness! The options are pretty long, which leads to lots of typing. We don't like typing!
2. Testing modules locally
However, more importantly, we rely heavily on TERRAGRUNT_SOURCE when developing.
The thing is that as long as you use run-all you can use one setting for that variable (and conveniently set it as an environment variable), while if you run a regular command, you need to specify the full path, which is obviously different for each project.
This leads to (even) more typing, and worse: a higher chance of errors.
Luckily you can use run-all and add the appropriate flags to ensure it behaves like a regular plan|apply|destroy etc. But again, more typing.
Nothing a bunch of aliases can't solve though!
3. But the original reason was: Errors when using run-all are challenging
One of the main boons of terragrunt is the ability to break up large projects into smaller steps while still retaining the inter-dependencies. However, when you work on such a large project and something goes wrong somewhere in the middle, figuring out what happened is pretty challenging.
terragrunt's error messages are pretty massive, and this is multiplied by every individual project in your dependency chain.
And if it fails somewhere at the front, it keeps on trying until the last one, blowing up your terminal in the process.
So we wanted a possibility to run the projects step by step, using the dependency graph of terragrunt and have a bit more control over it.
And when you can run it step by step, you can make the process also re-startable, which is also pretty handy!
And this was not something a bunch of aliases could solve, hence this wrapper was born. And while we were at it, replacing the aliases with this was then a pretty straightforward next step as well.
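The step-by-step, restartable idea described above can be sketched as follows. This is a minimal illustration and not tgwrap's actual implementation: the `run_step_by_step` function, the `run_module` callback and the example module names are all hypothetical, and the dependency graph is assumed to be already available as a plain dictionary.

```python
from graphlib import TopologicalSorter


def run_step_by_step(dependencies, run_module, completed=None):
    """Run modules in dependency order, skipping those already completed.

    dependencies: maps each module to the set of modules it depends on.
    run_module:   hypothetical callback, returns True when the module succeeded.
    completed:    modules that finished in an earlier run (enables a restart).
    Returns a tuple (completed_modules, failed_module_or_None).
    """
    done = set(completed or ())
    for module in TopologicalSorter(dependencies).static_order():
        if module in done:
            continue  # already handled in a previous run
        if not run_module(module):
            return done, module  # stop at the first failure
        done.add(module)
    return done, None


# Hypothetical dependency graph: app needs database, database needs network.
deps = {"network": set(), "database": {"network"}, "app": {"database", "network"}}

first_run = []
done, failed = run_step_by_step(
    deps, lambda m: first_run.append(m) or m != "database"
)

# Restart: pass in what already succeeded, so only the rest is executed.
second_run = []
done_after_restart, failed_after_restart = run_step_by_step(
    deps, lambda m: second_run.append(m) or True, completed=done
)
```

Stopping at the first failure keeps the terminal output limited to the one module that actually broke, and the returned set of completed modules is all that is needed to pick up where the previous run stopped.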
4. Analyzing plan files
When using run-all, analyzing what is about to change does not get any easier. Hence we created the tgwrap analyze function that lists all the planned changes and (if a config file is available) calculates a drift score and runs a terrasafe style validation check.
You can ignore minor changes, such as tag updates, with tgwrap analyze -i tags
The drift score and validation check need a config file as follows:
---
low:
  azuread_application.: {}
  azuread_app_role_assignment:
    drift_impact:
      delete: minor
medium:
  azurerm_data_factory_linked_service_key_vault.: {}
high:
  azuread_group.:
    drift_impact:
      create: minor
      update: minor
  azurerm_application_insights.: {}
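To illustrate how a config like the one above could be read, here is a small sketch that looks up the drift impact for a planned change. It is not tgwrap's actual logic. Two things are assumptions made for the example: that the keys are matched as prefixes of the resource address, and that the levels low, medium and high default to the impacts minor, medium and major when no drift_impact override is given.

```python
# Assumption: default impact per level when no drift_impact override exists.
DEFAULT_IMPACT = {"low": "minor", "medium": "medium", "high": "major"}


def drift_impact(config, address, action):
    """Return the drift impact of `action` (create/update/delete) on `address`.

    Keys in the config are treated as prefixes of the resource address
    (an assumption for this sketch). Unmatched resources are "unknown".
    """
    for level, patterns in config.items():
        for pattern, settings in patterns.items():
            if address.startswith(pattern):
                overrides = (settings or {}).get("drift_impact", {})
                return overrides.get(action, DEFAULT_IMPACT[level])
    return "unknown"


# The example config from above, loaded as a Python dictionary.
config = {
    "low": {
        "azuread_application.": {},
        "azuread_app_role_assignment": {"drift_impact": {"delete": "minor"}},
    },
    "medium": {"azurerm_data_factory_linked_service_key_vault.": {}},
    "high": {
        "azuread_group.": {"drift_impact": {"create": "minor", "update": "minor"}},
        "azurerm_application_insights.": {},
    },
}

update_group = drift_impact(config, "azuread_group.this", "update")
delete_group = drift_impact(config, "azuread_group.this", "delete")
not_listed = drift_impact(config, "azurerm_storage_account.this", "create")
```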
Speeding up the performance of analyze
This analyze function turned out to be pretty slow, with most of the time going into the terragrunt show function that is executed for each individual module.
This was a bit surprising as the plan file is already available on the file system, but it turns out that terragrunt takes quite a bit of time managing the dependencies, even when you exclude the external dependencies and are located in a particular module.
So, if you add the following to your root terragrunt.hcl:
terraform {
  after_hook "link_to_current_module" {
    commands = ["init", "plan", "apply", "validate", "destroy"]
    execute  = ["bash", "-c", "ln -sf $(pwd) ${get_terragrunt_dir()}/.terragrunt-cache/current"]
  }
}
The directory where the plan file is stored (including the other resources that terraform needs) becomes predictable, and it becomes possible to run a native terraform show (instead of terragrunt show), which dramatically speeds things up.
Just set the proper value as an environment variable:
export TGWRAP_PLANFILE_DIR=".terragrunt-cache/current"
Or pass it along with the --planfile-dir|-P option and it will use that.
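The choice between the two show commands can be sketched like this. The function name `show_command` and the plan file name `planfile` are assumptions for the example; only the TGWRAP_PLANFILE_DIR variable and the terraform show versus terragrunt show distinction come from the text above.

```python
import os


def show_command(module_dir, planfile_dir=None, planfile="planfile"):
    """Return (command, working_directory) for showing a plan as JSON.

    With a known plan file directory, native terraform can be used directly;
    otherwise fall back to the slower terragrunt show in the module directory.
    The plan file name is an assumption for this sketch.
    """
    planfile_dir = planfile_dir or os.environ.get("TGWRAP_PLANFILE_DIR")
    if planfile_dir:
        cwd = os.path.join(module_dir, planfile_dir)
        return ["terraform", "show", "-json", planfile], cwd
    return ["terragrunt", "show", "-json", planfile], module_dir


command, cwd = show_command("platform/rbac", planfile_dir=".terragrunt-cache/current")
```

The returned command and directory could then be handed to something like `subprocess.run(command, cwd=cwd, capture_output=True)`.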
Logging the results
tgwrap supports logging the analyze results to an Azure Log Analytics custom table.
For that, the custom table needs to be present, including a data collection endpoint and associated data collection rule.
When you want to activate this, just pass --data-collection-endpoint (or, more conveniently, set the TGWRAP_ANALYZE_DATA_COLLECTION_ENDPOINT environment variable) with the URL to which the data can be posted.
Note that for this to work, tgwrap assumes that there is a functioning Azure CLI available on the system.
A payload as below will be posted, and the log analytics table should be able to accommodate it:
[
  {
    "scope": "terragrunt/dlzs/data-platform/global/platform/rbac/",
    "principal": "myself",
    "repo": "https://gitlab.com/my-git-repo.git",
    "creations": 0,
    "updates": 0,
    "deletions": 0,
    "minor": 0,
    "medium": 0,
    "major": 0,
    "unknown": 0,
    "total": 0,
    "score": 0.0,
    "details": [
      {
        "drifts": {
          "minor": 0,
          "medium": 0,
          "major": 0,
          "unknown": 0,
          "total": 0,
          "score": 0.0
        },
        "all": [],
        "creations": [],
        "updates": [],
        "deletions": [],
        "unauthorized": [],
        "unknowns": [],
        "module": ""
      }
    ]
  }
]
The log analytics (custom) table should have a schema that is able to cope with the message above:
| Field | Type |
|---|---|
| creations | Int |
| deletions | Int |
| details | Dynamic |
| major | Int |
| medium | Int |
| minor | Int |
| principal | String |
| repo | String |
| scope | String |
| score | Int |
| TimeGenerated | Datetime |
| total | Int |
| unknown | Int |
| updates | Int |
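Before posting, it can be handy to check that the payload and the table agree on their fields. The sketch below compares the top-level keys of a record with the column names from the table above. The helper name `unmapped_fields` is hypothetical, and it assumes that TimeGenerated is filled in on the Azure side (for instance by the data collection rule) rather than sent in the payload.

```python
# Column names as listed in the table schema above.
TABLE_COLUMNS = {
    "creations", "deletions", "details", "major", "medium", "minor",
    "principal", "repo", "scope", "score", "TimeGenerated", "total",
    "unknown", "updates",
}


def unmapped_fields(record, columns=TABLE_COLUMNS):
    """Return (payload keys without a column, columns without a payload key)."""
    keys = set(record)
    # Assumption: TimeGenerated is set during ingestion, not by the sender.
    expected = set(columns) - {"TimeGenerated"}
    return sorted(keys - expected), sorted(expected - keys)


# One record with the same top-level keys as the example payload.
record = {
    "scope": "terragrunt/dlzs/data-platform/global/platform/rbac/",
    "principal": "myself",
    "repo": "https://gitlab.com/my-git-repo.git",
    "creations": 0, "updates": 0, "deletions": 0,
    "minor": 0, "medium": 0, "major": 0, "unknown": 0,
    "total": 0, "score": 0.0,
    "details": [],
}

extra_keys, missing_keys = unmapped_fields(record)
```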
More than a wrapper
Over time, tgwrap became more than a wrapper, blatantly violating #1 of the unix philosophy: 'Make each program do one thing well'.
The 'analyze' functionality is already an example, but more features, such as deploying a landing zone, have crept into the application. It makes sense for how we're using it, but we're fully aware this makes it less generically applicable.
Usage
# general help
tgwrap --help
tgwrap run -h
tgwrap run-all -h
# run a plan
tgwrap plan # which is the same as tgwrap run plan
# run-all a plan
tgwrap run-all plan
# run-all with excluding a particular directory
tgwrap run-all plan -E 'excluded-dir/*'
# or a directory somewhere further down in the path
tgwrap run-all plan -E '**/excluded-dir/**'
# or do the same in step-by-step mode
tgwrap run-all plan -s
# or excluding (aka ignoring) external dependencies
tgwrap run-all plan -sx
# if you want to add additional arguments it is recommended to use -- as separator (although it *might* work without)
tgwrap output -- -json
Note: special precautions are needed when passing on parameters that contain quotes. For instance, if you want to move state like below, escape the double quotes in the state address:
tgwrap state mv 'azuread_group.this[\"viewers\"]' 'azuread_group.this[\"readers\"]'
A word about escaping inputs
Your shell interprets special characters such as * and " before passing them to the program (tgwrap in this case). So some inputs need to be escaped in order to function properly.
For example:
# to exclude certain modules from an action (such as analyze):
tgwrap analyze -E 'integrations/\*/\*'
# to import a resource that has a " in its address:
tgwrap import -a 'azuread_group.this[\"my_group\"]' -i ${GROUP_ID}
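To see what the program actually receives, Python's `shlex` module can be used to mimic POSIX shell word splitting. This only illustrates quote handling: `shlex` does not expand wildcards such as *, and the helper name `as_seen_by_program` is made up for this example.

```python
import shlex


def as_seen_by_program(command_line):
    """Split a command line the way a POSIX shell splits it into arguments."""
    return shlex.split(command_line)


# Without quoting, the shell consumes the double quotes itself.
unquoted = as_seen_by_program('tgwrap import -a azuread_group.this["my_group"]')

# Inside single quotes everything is literal, so the backslashes and the
# double quotes both reach the program unchanged.
quoted = as_seen_by_program(r"""tgwrap import -a 'azuread_group.this[\"my_group\"]'""")

print(unquoted[-1])  # azuread_group.this[my_group]
print(quoted[-1])    # azuread_group.this[\"my_group\"]
```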
Deploy manifests
In order to easily deploy a new version of the terraform (and associated terragrunt) modules, we include a manifest file in the root of the landing zone:
---
git_repository: ssh://git@gitlab.com/my-org/my-terraform-modules-repo.git
base_path: terragrunt/my-platform
config_path: terragrunt/config/platform-dev
deploy:
dtap:
applies_to_stages:
- dev
- tst
- acc
- prd
source_stage: dev
source_dir: platform
base_dir: platform
config_dir: ../../../config
configs:
- my-config.hcl
- ../my-ss-config-dir
exclude_modules:
- my-specific-module
include_modules: {}
substacks:
is01:
source: shared-integration/intsvc01
target: integration/is01
exclude_modules:
- my-specific-module
configs:
- my-ss-config.hcl
- ../my-ss-config-dir
is02:
applies_to_stages:
- dev
source: shared-integration/intsvc01
target: integration/is02
global_config_files:
root-terragrunt:
source: ../../terragrunt.hcl
target: ../../terragrunt.hcl
terrasafe-config:
source: ../../terrasafe-config.json
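As an illustration of how the applies_to_stages setting in such a manifest might be evaluated, the sketch below selects the substacks for a given stage. This is not tgwrap's actual behaviour. In particular, it assumes that a substack without applies_to_stages applies to every stage, and the function name `substacks_for_stage` is hypothetical.

```python
def substacks_for_stage(substacks, stage):
    """Select the substacks that should be deployed to `stage`.

    Assumption: a substack without applies_to_stages applies to all stages.
    """
    return {
        name: settings
        for name, settings in substacks.items()
        if stage in settings.get("applies_to_stages", [stage])
    }


# The substacks from the manifest above, loaded as a Python dictionary.
substacks = {
    "is01": {
        "source": "shared-integration/intsvc01",
        "target": "integration/is01",
    },
    "is02": {
        "applies_to_stages": ["dev"],
        "source": "shared-integration/intsvc01",
        "target": "integration/is02",
    },
}

dev_substacks = sorted(substacks_for_stage(substacks, "dev"))
prd_substacks = sorted(substacks_for_stage(substacks, "prd"))
```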
Inspecting deployed infrastructure
Testing infra-as-code is hard, even though test frameworks are becoming more common these days. But the standard test approaches typically work with temporary infrastructures, while it is often also useful to test a deployed infrastructure.
Frameworks like Chef's InSpec aim to solve that, but InSpec is pretty config management heavy (although there are add-ons for AWS and Azure infra). It has a steep learning curve, we only need a tiny part of it, and it also comes with a commercial license.
For what we need ('is infra deployed and are the main role assignments still in place') it was pretty easy to implement in Python.
For this, you can now run the inspect command, which inspects real infrastructure and role assignments, and reports back whether they meet the expectations (as declared in a config file):
---
location:
  code: westeurope
  full: West Europe
entra_id_groups:
  platform_admins: '{domain}-platform-admins'
  cost_admins: '{domain}-cost-admins'
  data_admins: '{domain}-data-admins'
  just_testing: group-does-not-exist
resources:
  - identifier: 'kv-{domain}-euw-{stage}-base'
    alternative_ids:
      - 'kv-{domain}-euw-{stage}-bs'
      - 'kv{domain}euw{stage}bs'
      - 'kv{domain}euw{stage}base'
    type: key_vault
    resource_group: 'rg-{domain}-euw-{stage}-base'
    role_assignments:
      - platform_admins: Owner
      - platform_admins: Key Vault Secrets Officer
      - data_admins: Key Vault Secrets Officer
After which you can run the following:
tgwrap inspect -d domain -s sbx -a 886d4e58-a178-4c50-ae65-xxxxxxxxxx -c ./inspect-config.yml
......
Inspection status:
entra_id_group: dps-platform-admins
-> Resource: OK (Resource dps-platform-admins of type entra_id_group OK)
entra_id_group: dps-cost-admins
-> Resource: OK (Resource dps-cost-admins of type entra_id_group OK)
entra_id_group: dps-data-admins
-> Resource: OK (Resource dps-data-admins of type entra_id_group OK)
entra_id_group: group-does-not-exist
-> Resource: NEX (Resource group-does-not-exist of type entra_id_group not found)
key_vault: kv-dps-euw-sbx-base
-> Resource: OK (Resource kv-dps-euw-sbx-base of type key_vault OK)
-> RBAC: NOK (Principal platform_admins has NOT role Owner assigned; )
subscription: 886d4e58-a178-4c50-ae65-xxxxxxxxxx
-> Resource: OK (Resource 886d4e58-a178-4c50-ae65-xxxxxxxxxx of type subscription OK)
-> RBAC: NC (Role assignments not checked)
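The {domain} and {stage} placeholders in the config resolve to names like the ones in the output above (dps and sbx in this run). A minimal sketch of that expansion, assuming plain Python string formatting and a hypothetical helper named `candidate_names`, could look like this:

```python
def candidate_names(resource, domain, stage):
    """Expand the identifier and its alternative ids for a domain and stage."""
    templates = [resource["identifier"], *resource.get("alternative_ids", [])]
    return [t.format(domain=domain, stage=stage) for t in templates]


# The key vault entry from the inspect config above.
key_vault = {
    "identifier": "kv-{domain}-euw-{stage}-base",
    "alternative_ids": [
        "kv-{domain}-euw-{stage}-bs",
        "kv{domain}euw{stage}bs",
        "kv{domain}euw{stage}base",
    ],
    "type": "key_vault",
}

names = candidate_names(key_vault, domain="dps", stage="sbx")
group = "{domain}-platform-admins".format(domain="dps", stage="sbx")
```

Names without placeholders, such as group-does-not-exist, pass through unchanged.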
You can also send the results to a data collection endpoint (see also Logging the results).
For that, a custom table should exist with the following structure:
| Field | Type |
|---|---|
| domain | String |
| substack | String |
| stage | String |
| subscription_id | String |
| resource_type | String |
| inspect_status_code | String |
| inspect_status | String |
| inspect_message | String |
| rbac_assignment_status_code | String |
| rbac_assignment_status | String |
| rbac_assignment_message | String |
| resource | String |
Generating change logs
tgwrap can generate a change log by running:
tgwrap change-log [--changelog-file ./CHANGELOG.md]
The (optional) change log file will be updated, if passed along. tgwrap then checks the file for the existence of a start and an end marker, in the following format:
start_marker = '<!-- BEGINNING OF OF TGWRAP CHANGELOG SECTION -->'
end_marker = '<!-- END OF TGWRAP CHANGELOG SECTION -->'
If they exist, everything between these lines will be replaced by the new change log.
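The marker-based replacement can be sketched as follows. The marker strings are copied from above. The function name `replace_changelog_section` and the choice to leave a file without markers untouched are assumptions for this example, not a description of tgwrap's code.

```python
START_MARKER = "<!-- BEGINNING OF OF TGWRAP CHANGELOG SECTION -->"
END_MARKER = "<!-- END OF TGWRAP CHANGELOG SECTION -->"


def replace_changelog_section(document, new_changelog):
    """Replace everything between the two markers with `new_changelog`."""
    start = document.find(START_MARKER)
    end = document.find(END_MARKER)
    if start == -1 or end == -1 or end < start:
        # Assumption: without both markers the document is left as it is.
        return document
    head = document[: start + len(START_MARKER)]
    tail = document[end:]
    return f"{head}\n{new_changelog}\n{tail}"


original = f"# My project\n{START_MARKER}\nold entries\n{END_MARKER}\nFooter\n"
updated = replace_changelog_section(original, "new entries")
```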
Development
To develop tgwrap, you need to run it against your terragrunt projects. For that you can use the --terragrunt-working-dir option and just run it from the poetry directory. Alternatively you can use the tgwrap-dev script and invoke that from your terragrunt directories. Either put it in your PATH or create an alias for convenience.