Shepherd
Shepherd is a utility for applying code changes across many repositories.
- Powerful: You can write migration scripts using your favorite Unix commands, tools like
jscodeshift
, or scripts in your preferred programming language. - Easy: With just a few commands, you can checkout dozens of repositories, apply changes, commit those changes, and open pull requests with detailed messages.
- Flexible: Ships with support for Git/GitHub, but can easily be extended to work with other version control products like Bitbucket, GitLab, or SVN.
For more high level context, this blog post covers the basics.
Getting started
Install the Shepherd CLI:
npm install -g @nerdwallet/shepherd
If using GitHub Enterprise, ensure the following environment variables are exported:
export SHEPHERD_GITHUB_ENTERPRISE_BASE_URL={company_github_enterprise_base_url} # e.g., github.com
export SHEPHERD_GITHUB_ENTERPRISE_URL={company_github_enterprise_url} # e.g., api.github.com/api/v3
If using ssh, ensure that your GITHUB_TOKEN is exported:
export GITHUB_TOKEN=<PAT>
Shepherd will now be available as the shepherd
command in your shell:
shepherd --help
Usage: shepherd [options] [command]
...
Take a look at the tutorial for a detailed walkthrough of what Shepherd does and how it works, or read on for a higher-level and more brief look!
Go to tutorial →
Motivation for using Shepherd
Moving away from monorepos and monolithic applications has generally been a good thing for developers because it allows them to move quickly and independently from each other. However, it's easy to run into problems, especially if your code relies on shared libraries. Specifically, making a change to shared code and then trying to roll that shared code out to all consumers of that code becomes difficult:
- The person updating that library must communicate the change to consumers of the library
- The consumer must understand the change and how they have to update their own code
- The consumer must make the necessary changes in their own code
- The consumer must test, merge, and deploy those changes
Shepherd aims to help shift responsibility for the first three steps to the person actually making the change to the library. Since they have the best understanding of their change, they can write a code migration to automate that change and then user Shepherd to automate the process of applying that change to all relevant repos. Then the owners of the affected repos (who have the best understanding of their own code) can review and merge the changes. This process is especially efficient for teams who rely on continuous integration: automated tests can help repository owners have confidence that the code changes are working as expected.
Migration Configuration Schema
Example
A migration is declaratively specified with a shepherd.yml
file called a spec. Here's an example of a migration spec that renames .eslintrc
to .eslintrc.json
in all NerdWallet repositories that have been modified in 2018:
id: 2018.07.16-eslintrc-json
title: Rename all .eslintrc files to .eslintrc.json
adapter:
type: github
search_type: code
search_query: org:NerdWallet path:/ filename:.eslintrc
hooks:
should_migrate:
- ls .eslintrc
- git log -1 --format=%cd | grep 2018 --silent
post_checkout: npm install
apply: mv .eslintrc .eslintrc.json
pr_message: echo 'Hey! This PR renames `.eslintrc` to `.eslintrc.json`'
Fields
-
id
:
- Description: Specifies a unique identifier for this migration.
- Usage: Used as a branch name for the migration and internally by Shepherd to track migration state.
-
title
:
- Description: A title for the migration.
- Usage: Used as the commit message.
-
adapter
:
- Description: Specifies the version control adapter for repo operations.
- Details: Currently supports only GitHub, but can be extended for Bitbucket or GitLab. Configuration differs based on the adapter.
- GitHub Specific:
search_query
: Utilizes GitHub's code search qualifiers to identify candidate repositories.org
: Specify YOURGITHUBORGANIZATION
to consider every visible repo in the organization.search_type
(optional): Either 'code' or 'repositories'. Defaults to code search. For repositories, use Github repository search.
Hooks
Hooks define the core functionality of a migration in Shepherd.
-
should_migrate
:
- Description: Commands to determine if a repo requires migration.
- Behavior: Non-zero exit values indicate the repo should not be migrated.
-
post_checkout
:
- Description: Commands executed after a repo passes
should_migrate
checks. - Usage: Ideal for one-time setup actions per repo, like installing dependencies.
-
apply
:
- Description: Commands that perform the actual migration.
- Note: This can range from simple to complex sequences, depending on migration needs.
-
pr_message
:
- Description: Commands to generate a pull request message.
- Output: Anything written to
stdout
is used for the message. Multiple commands will have their outputs concatenated.
Requirements
- Optional:
should_migrate
, post_checkout
- Required:
apply
, pr_message
Environment Variables
Shepherd exposes some context to each command via specific environment variables. Some additional enviornment variables are exposed when using the git
or github
adapters.
Environment Variable | Default | Description |
---|
SHEPHERD_REPO_DIR | ~/.shepherd | the absolute path to the repository being operated on. This will be the working directory when commands are executed. |
SHEPHERD_DATA_DIR | ~/.shepherd | the absolute path to a special directory that can be used to persist state between steps. This would be useful if, for instance, a jscodeshift codemod in your apply hook generates a list of files that need human attention and you want to use that list in your pr_message hook. |
SHEPHERD_BASE_BRANCH | default branch | the name of the branch Shepherd will set up a pull-request against. This will often, but not always, be main. Only available for apply and later steps. |
SHEPHERD_MIGRATION_DIR | path to migration spec | the absolute path to the directory containing your migration's shepherd.yml file. This is useful if you want to include a script with your migration spec and need to reference that command in a hook. For instance, if you have a script pr.sh that will generate a PR message: my pr_message hook might look something like this: pr_message: $SHEPHERD_MIGRATION_DIR/pr.sh |
SHEPHERD_GIT_REVISION | | (git and github adapters) is the current revision of the repository being operated on. |
SHEPHERD_GITHUB_REPO_OWNER | | (github adapter) is the owner of the repository being operated on. For example, if operating on the repository https://github.com/NerdWalletOSS/shepherd , this would be NerdWalletOSS . |
SHEPHERD_GITHUB_REPO_NAME | | (github adapter) is the name of the repository being operated on. For example, if operating on the repository https://github.com/NerdWalletOSS/shepherd , this would be shepherd . |
SHEPHERD_GITHUB_ENTERPRISE_URL | api.github.com | For GitHub Enterprise, export this variable containing the company's GitHub Enterprise url (e.g., api.github.com/api/v3 ). |
SHEPHERD_GITHUB_ENTERPRISE_BASE_URL | api.github.com | For GitHub Enterprise, export this variable contraining the company's GitHub Enterprise base url. |
SHEPHERD_GITHUB_PROTOCOL | ssh | (github adapter) is the protocol to use when cloning repos. Can be https or ssh. https is useful when ssh is firewalled. |
Additional Context:
When working with GitHub, the API endpoints differ based on whether you are using GitHub's public cloud service, GitHub Enterprise Cloud, or GitHub Enterprise Server (on-premises). Here’s how the endpoints vary:
In each case, the API functionality and how you interact with it are largely the same, but the base URL changes based on where your GitHub instance is hosted.
As a result, the value of SHEPHERD_GITHUB_ENTERPRISE_URL
is a function of the type of GitHub service and used to support git APIs except cloning which is configurable via SHEPHERD_GITHUB_ENTERPRISE_BASE_URL
. For SHEPHERD_GITHUB_ENTERPRISE_BASE_URL
, while github.com
works across GitHub service types, for backwards compatibility, we default to api.github.com
.
Usage
Shepherd is run as follows:
shepherd <command> <migration> [options]
<migration>
is the path to your migration directory containing a shepherd.yml
file.
There are a number of commands that must be run to execute a migration:
checkout
: Determines which repositories are candidates for migration and clones or updates the repositories on your machine. Clones are "shallow", containing no git history. Uses should_migrate
to decide if a repository should be kept after it's checked out.apply
: Performs the migration using the apply
hook discussed above.commit
: Makes a commit with any changes that were made during the apply
step, including adding newly-created files. The migration's title
will be prepended with [shepherd]
and used as the commit message.push
: Pushes all commits to their respective repositories.pr-preview
: Prints the commit message that would be used for each repository without actually creating a PR; uses the pr_message
hook.pr
: Creates a PR for each repo with the message generated from the pr_message
hook.version
: Prints Shepherd version
By default, checkout
will use the adapter to figure out which repositories to check out, and the remaining commands will operate on all checked-out repos. To only checkout a specific repo or to operate on only a subset of the checked-out repos, you can use the --repos
flag, which specifies a comma-separated list of repos:
shepherd checkout path/to/migration --repos facebook/react,google/protobuf
Run shepherd --help
to see all available commands and descriptions for each one.
Developing
Run npm install
to install dependencies.
Shepherd is written in TypeScript, which requires compilation to JavaScript. When developing Shepherd, it's recommended to run npm run build:watch
in a separate terminal. This will incrementally compile the source code as you edit it. You can then invoke the Shepherd CLI by referencing the absolute path to the compiled cli.js
file:
cd ../my-other-project
../shepherd/lib/cli.js checkout path/to/migration
Shepherd currently has minimal test coverage, but we're aiming to improve that with each new PR. Tests are written with Jest and should be named in a *.test.ts
alongside the file under test. To run the test suite, run npm run test
.
We use ESLint to ensure a consistent coding style and to help prevent certain classes of problems. Run npm run lint
to run the linter, and npm run fix-lint
to automatically fix applicable problems.
Credits
- Logo designed by Christopher Wharton.