github-stats-pages
Retrieve statistics for a user's repositories and populate the information onto a GitHub static page
Overview
This software is both a GitHub Docker container action and a Python
package. The former runs within GitHub Actions to generate GitHub
Pages, while the latter gives flexibility to deploy on a variety of
compute resources (e.g., cloud, dev).
Some key features of this software:
- Flexible - Designed to be deployed in a number of ways
- Python - Most of the code is Python (excluding static assets) with static type hints
- Open source - Built on open-source tools and released under the permissive MIT License!
- Continuous Integration - We currently have 100% code coverage of the Python codebase and the Docker action
- Environmentally friendly - Websitecarbon.com reported that a GitHub Pages deployment of this code has a lower carbon footprint than 90% of web pages tested
Requirements
Traffic data for a repository is only available to users with write (push) or owner access.
Thus, regardless of how you choose to deploy, you will need a token.
This codebase uses a GitHub Personal Access Token (PAT).
To create one, follow these instructions or go here.
For scopes, select repo. Save your PAT in a safe place, as you will need it later.
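To check that a PAT can actually read traffic data before deploying, you could call GitHub's REST traffic endpoint directly. The sketch below only builds the request with Python's standard library; the helper name traffic_views_request is hypothetical and is not part of github-stats-pages.

```python
import urllib.request

API_ROOT = "https://api.github.com"


def traffic_views_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request for a repository's traffic views.

    Hypothetical helper. GitHub only returns traffic data when the token
    belongs to a user with push (write) or owner access to the repository.
    """
    url = f"{API_ROOT}/repos/{owner}/{repo}/traffic/views"
    headers = {
        "Accept": "application/vnd.github+json",
        # A PAT with the `repo` scope is sent as a bearer token.
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers)
```

Passing the request to urllib.request.urlopen should return JSON with count, uniques, and a per-day views list; an HTTP 403 most likely means the token lacks write access to that repository.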
Deployment
This code is intended to be deployed in a number of ways for the greatest flexibility.
First, this repository is a GitHub Docker container action (see below).
Second, this code is packaged on PyPI.
Third, the source code can be forked or cloned.
Finally, a Dockerfile is included for containerization.
GitHub Actions Deployment
TL;DR
For easy deployment, try this GitHub template. Simply:
- Use it!
- Add your Personal Access Token as a repository secret named GH_TOKEN (Settings > Secrets). See Requirements above.
- If not already enabled, enable GitHub Actions (Settings > Actions)
- Sit back and enjoy that ☕️!
Note: After the first Action run, you may need to enable GitHub Pages through
the settings page and select the gh-pages branch (Settings > Pages).
The Nitty Gritty
GitHub Pages deployment is simple with the following GitHub Actions workflow,
which runs as a daily cron job at 03:00 UTC:
name: Deploy GitHub pages with traffic stats
on:
  schedule:
    - cron: "0 3 * * *"
jobs:
  build-n-publish:
    runs-on: ubuntu-latest
    env:
      BOT_NAME: 'github-actions[bot]'
      BOT_EMAIL: '41898282+github-actions[bot]@users.noreply.github.com'
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Get current date
        id: date
        run: echo "date=$(date +'%Y-%m-%d')" >> "$GITHUB_OUTPUT"
      - name: Build GitHub stats pages
        uses: astrochun/github-stats-pages@latest
        with:
          username: ${{ github.repository_owner }}
          token: ${{ secrets.GH_TOKEN }}
      - name: Upload data to main branch
        uses: EndBug/add-and-commit@v7.0.0
        with:
          add: 'data'
          branch: main
          message: "Update data: ${{ steps.date.outputs.date }}"
          author_name: ${{ env.BOT_NAME }}
          author_email: ${{ env.BOT_EMAIL }}
      - name: Upload static files to gh-pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          personal_token: ${{ secrets.GH_TOKEN }}
          publish_dir: ./public
          keep_files: false
          user_name: ${{ env.BOT_NAME }}
          user_email: ${{ env.BOT_EMAIL }}
          publish_branch: gh-pages
          commit_message: "Update static pages: ${{ steps.date.outputs.date }}"
This workflow will run for all public repositories.
Inputs
Variable | Description | Required? | Type | Default | Examples |
---|---|---|---|---|---|
username | GitHub username or organization | Yes | str | N/A | astrochun |
token | GitHub Personal Access Token (PAT) | Yes | str | N/A | abcdef12345678 |
include-repos | Comma-separated list of repositories. This overrides the full list of public repositories | No | str | '' | 'github-stats-pages,astrochun.github.io' |
exclude-repos | Comma-separated list of repositories to exclude from the default public repository list | No | str | '' | 'repo1' |
Other GitHub Action deployment examples:
To override the full list of public repositories and limit it to a subset,
pass a comma-separated list (no spaces after commas) to the include-repos
argument.
- name: Build GitHub stats pages
  uses: astrochun/github-stats-pages@latest
  with:
    username: ${{ github.repository_owner }}
    token: ${{ secrets.GH_TOKEN }}
    include-repos: "github-stats-pages"
Alternatively, to exclude specific repositories from the list of public repositories,
use the exclude-repos argument with a comma-separated list (no spaces after commas).
- name: Build GitHub stats pages
  uses: astrochun/github-stats-pages@latest
  with:
    username: ${{ github.repository_owner }}
    token: ${{ secrets.GH_TOKEN }}
    exclude-repos: "repo1,repo2"
Note that you can only specify include-repos or exclude-repos.
Specifying both will fail!
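The include/exclude rules above can be summarized as a small filter. select_repos is a hypothetical helper written to illustrate the documented behavior; it is not the action's actual code.

```python
def select_repos(public_repos, include_repos="", exclude_repos=""):
    """Apply include-repos / exclude-repos semantics (hypothetical helper).

    Both arguments are comma-separated strings with no spaces, e.g. "repo1,repo2".
    """
    if include_repos and exclude_repos:
        # Mirrors the documented rule: specifying both fails.
        raise ValueError("Specify include-repos or exclude-repos, not both")
    if include_repos:
        # include-repos overrides the full list of public repositories.
        return include_repos.split(",")
    excluded = set(exclude_repos.split(",")) if exclude_repos else set()
    return [name for name in public_repos if name not in excluded]
```

Splitting on a bare comma is also why spaces matter: "repo1, repo2" would produce the name " repo2" with a leading space.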
Docker Deployment
This repository includes a Dockerfile.
More details and instructions will be provided later.
From source
To run this code from original source, you will need to install it.
Installation
Use our PyPI package to
get the most stable release:
(venv) $ pip install github-stats-pages
Or, if you want the latest version:
(venv) $ git clone https://github.com/astrochun/github-stats-pages
(venv) $ cd github-stats-pages
(venv) $ python setup.py install
Execution from source
TL;DR: If you decide to run this code from source, there are a few things you should know.
First, this repository includes an entrypoint.sh script.
You can simply execute it with the following:
(venv) laptop:github_data $ username="<username>"
(venv) laptop:github_data $ token="<personal_access_token>"
(venv) laptop:github_data $ /path/to/github-stats-pages/entrypoint.sh $username $token
Second, it is recommended to create a folder (e.g., github_data) and run from
inside it, as its contents will ultimately include multiple files.
More details
Here's an overview providing more details on how this codebase works.
There are four primary scripts accompanying github-stats-pages:
- get_repo_list
- gts_run_all_repos
- make_stats_plots
- merge_csv
get_repo_list generates a CSV file containing a list of public repositories
for a GitHub user/organization. This list allows the code to aggregate
statistics for all repositories. To run, simply use the following command:
(venv) laptop:github_data $ get_repo_list -u <username/organization>
This will generate a CSV file called "<username/organization>.csv".
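A rough sketch of what get_repo_list has to do: page through GitHub's public repository listing and write one row per repository. The helper names and the two-column layout (name, fork) are hypothetical; the real CSV may contain different columns. Keeping the fork flag makes it easy for a later step to skip forks.

```python
import csv
from pathlib import Path


def repos_page_url(owner: str, page: int, per_page: int = 100) -> str:
    """URL for one page of GET /users/{owner}/repos (public repositories)."""
    return f"https://api.github.com/users/{owner}/repos?per_page={per_page}&page={page}"


def write_repo_list(repos: list[dict], csv_path: str) -> int:
    """Write name/fork columns from GitHub API repository objects to a CSV.

    Hypothetical layout; returns the number of repositories written.
    """
    with Path(csv_path).open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["name", "fork"])
        for repo in repos:
            # "name" and "fork" are standard fields of the API's repository objects.
            writer.writerow([repo["name"], repo["fork"]])
    return len(repos)
```

For an organization, the listing endpoint is /orgs/{org}/repos instead; in either case, keep requesting pages until one comes back empty.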
Next, let's gather the statistics for all public repositories that are not
forks. For this, we use another Python library, github-traffic-stats, which
is accompanied by a Python script called gts.
Accessing traffic data requires a PAT (see Requirements above). Then you can
execute the next script:
(venv) laptop:github_data $ token='abcdef12345678'
(venv) laptop:github_data $ gts_run_all_repos -u <username/organization> -t $token -c <username/organization>.csv
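For orientation, the data behind this step comes from GitHub's REST traffic endpoints, which only cover the last 14 days. The sketch assumes github-traffic-stats queries endpoints like these (not confirmed here) and shows how a timestamp prefix matching the example file names in this README could be produced; traffic_urls and stats_filename are hypothetical helpers.

```python
from datetime import datetime

# Real GitHub REST traffic endpoints; whether github-traffic-stats calls
# exactly these is an assumption.
TRAFFIC_ENDPOINTS = {
    "views": "traffic/views",
    "clones": "traffic/clones",
    "referrers": "traffic/popular/referrers",
    "paths": "traffic/popular/paths",
}


def traffic_urls(owner: str, repo: str) -> dict:
    """Map each traffic kind to its full API URL for one repository."""
    base = f"https://api.github.com/repos/{owner}/{repo}"
    return {kind: f"{base}/{path}" for kind, path in TRAFFIC_ENDPOINTS.items()}


def stats_filename(kind: str, when: datetime) -> str:
    """Timestamp-prefixed name, e.g. 2021-01-17-00h-46m-clone-stats.csv."""
    return f"{when:%Y-%m-%d-%Hh-%Mm}-{kind}-stats.csv"
```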
Running gts_run_all_repos generates CSV files with date and time stamp prefixes
for clones, traffic, and referrers. Because GitHub's traffic data only covers
the last 14 days, running this code routinely generates additional CSV files
that let you extend beyond that two-week window. The data can be merged with
the merge_csv script:
(venv) laptop:github_data $ merge_csv
This generates four files: merged_clone.csv, merged_paths.csv, merged_referrer.csv,
and merged_traffic.csv. These files are used in the final step to generate the
plots.
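The merge step can be pictured as a de-duplicating concatenation of the dated snapshots. The sketch below is an assumption about how such a merge could work, not merge_csv's actual implementation; merge_stats is hypothetical, and the key columns repository_name and date are placeholder names.

```python
import csv
from pathlib import Path


def merge_stats(data_dir: str, kind: str, key_fields=("repository_name", "date")) -> Path:
    """Merge all dated '<timestamp>-<kind>-stats.csv' files into 'merged_<kind>.csv'.

    Files are read in lexicographic order, which is chronological for the
    YYYY-MM-DD-HHh-MMm prefix, so a newer snapshot overwrites an older row
    with the same key (e.g. a day still in progress in the older snapshot).
    """
    data = Path(data_dir)
    merged = {}
    fieldnames = None
    for path in sorted(data.glob(f"????-??-??-*-{kind}-stats.csv")):
        with path.open(newline="") as fh:
            reader = csv.DictReader(fh)
            fieldnames = fieldnames or reader.fieldnames
            for row in reader:
                merged[tuple(row[k] for k in key_fields)] = row
    if fieldnames is None:
        raise FileNotFoundError(f"no {kind} stats files in {data}")
    out = data / f"merged_{kind}.csv"
    with out.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        # Sort by key so the merged file is ordered by repository, then date.
        writer.writerows(merged[key] for key in sorted(merged))
    return out
```

Calling merge_stats("data", "clone") would then write data/merged_clone.csv; the glob pattern deliberately ignores existing merged_*.csv files.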
Finally, to generate static pages containing the visualizations, we
use the make_stats_plots script:
(venv) laptop:github_data $ make_stats_plots -u <username> -c <username>.csv -t $token
This will generate all contents in the local path. Note that you can specify
an output directory with the -o/--out-dir option. The default is the current
directory.
The resulting folder structure, for example, will be the following:
github_data/
├── data
│   ├── 2021-01-17-00h-46m-clone-stats.csv
│   ├── 2021-01-17-00h-46m-referrer-stats.csv
│   ├── 2021-01-17-00h-46m-traffic-stats.csv
│   ├── ...
│   ├── merged_clone.csv
│   ├── merged_paths.csv
│   ├── merged_referrer.csv
│   └── merged_traffic.csv
├── repos
│   ├── github-stats-pages.html
│   └── ...
├── styles
│   ├── css
│   │   └── style.css
│   └── js
│       ├── bootstrap.min.js
│       ├── jquery.min.js
│       ├── main.js
│       └── popper.js
├── about.html
├── index.html
├── repositories.html
└── <username>.csv
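The four commands described above can be chained into a single driver, similar in spirit to entrypoint.sh. Whether entrypoint.sh runs exactly these commands in this order is an assumption; pipeline_commands and run_pipeline are hypothetical helpers that only reuse the flags documented in this README.

```python
import subprocess


def pipeline_commands(username: str, token: str) -> list[list[str]]:
    """Return the documented commands in the order this README runs them."""
    repo_csv = f"{username}.csv"
    return [
        ["get_repo_list", "-u", username],
        ["gts_run_all_repos", "-u", username, "-t", token, "-c", repo_csv],
        ["merge_csv"],
        ["make_stats_plots", "-u", username, "-c", repo_csv, "-t", token],
    ]


def run_pipeline(username: str, token: str, workdir: str = "github_data") -> None:
    """Run each step inside workdir, stopping at the first failing step."""
    for cmd in pipeline_commands(username, token):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, cwd=workdir, check=True)
```

Stopping at the first failure avoids, for example, regenerating plots from a partially fetched set of CSV files.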
FAQ
1. How do I add old data?
If you ran this code outside your production deployment (e.g., GitHub Pages),
it is in fact straightforward to include those data.
For GitHub Pages deployment, simply:
- git clone your copy of the github-stats repo
- Move/copy previous CSV files to the data folder in the main branch. These files follow a YYYY-MM-DD prefix
- Then add, commit, and push: git add data/????-??-??*stats.csv, git commit -m "Add old data", git push
On the next GitHub Action scheduled run, the live pages will automatically incorporate these data.
For any other deployments (e.g., cloud), simply:
- Move/copy/rsync/scp the previous CSV files to the data folder in the deployed instance
Upon the next cron job or script run, the old data will automatically be incorporated.
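If the old files sit in a local folder, a short script can copy only those matching the same YYYY-MM-DD pattern used by the git add command above. copy_old_stats is a hypothetical helper; it skips files that already exist in the destination so reruns are safe.

```python
import shutil
from pathlib import Path


def copy_old_stats(src_dir: str, data_dir: str) -> list[str]:
    """Copy dated '????-??-??*stats.csv' files from src_dir into data_dir.

    Hypothetical helper: existing files are never overwritten, and
    merged_*.csv files do not match the pattern, so they are ignored.
    """
    dest = Path(data_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(src_dir).glob("????-??-??*stats.csv")):
        target = dest / src.name
        if not target.exists():
            shutil.copy2(src, target)  # copy2 also preserves timestamps
            copied.append(src.name)
    return copied
```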
2. How do I add content to the home page (index.html)?
The deployed index.html can be customized to provide a biography, cool
graphics, and/or additional statistics. This is possible through a
GitHub profile README that you can create. The link above provides
instructions for setting one up. This software will convert the markdown
content to HTML and include it in index.html.
An example of the outcome can be found here.
Many GitHub users have developed fancy GitHub profile READMEs:
https://github.com/abhisheknaiidu/awesome-github-profile-readme.
By including those in your profile README, they should be included in your deployed version.
If it doesn't work, feel free to reach out.
Note: While a GitHub profile README does not work for an organization in the same manner as
for individual GitHub accounts, this software will still use its content if it is publicly available.
Here's an example.
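This README does not say how the profile README is fetched. One possible approach, sketched here as an assumption, uses GitHub's REST endpoint GET /repos/{owner}/{repo}/readme on the <username>/<username> profile repository, which returns the file base64-encoded. Both helper names are hypothetical, and rendering the Markdown to HTML would need a Markdown library that is not shown.

```python
import base64


def profile_readme_api_url(username: str) -> str:
    """REST endpoint for the README of the <username>/<username> profile repo."""
    return f"https://api.github.com/repos/{username}/{username}/readme"


def decode_readme(payload: dict) -> str:
    """Decode the base64 `content` field of a /readme API response."""
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    # GitHub wraps the base64 text with newlines; b64decode discards them by default.
    return base64.b64decode(payload["content"]).decode("utf-8")
```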
3. What happens when I rename a repository?
This software retrieves the latest list of public repositories. When the
statistics pages are generated, it searches the data/ folder for each
repository's information by name. As such, renaming a repository breaks the
link to its previously collected data. This will be apparent in the logs with
the following warnings:
WARNING: Possible issue with repository name, ...
If you renamed it, you will need to update data/ contents
To rectify this issue, git clone your GitHub repository and replace each
occurrence of the old repository name with the new one using your preferred
IDE or command-line tools (e.g., sed). Then git add, git commit, and
git push these changes. The next scheduled run will then work as intended.
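As an alternative to sed, here is a Python sketch that rewrites whole-name occurrences in the CSV files under data/. rename_repo_in_data is a hypothetical helper; its look-arounds are a heuristic to avoid touching longer names such as repo10, so review the diff before committing.

```python
import re
from pathlib import Path


def rename_repo_in_data(data_dir: str, old: str, new: str) -> list[str]:
    """Replace whole-name occurrences of `old` with `new` in every CSV in data_dir.

    A match must not be preceded or followed by a word character, '.' or '-',
    so "repo1" is left alone inside "repo10", "my-repo1" or "repo1.github.io".
    Returns the names of the files that changed.
    """
    pattern = re.compile(rf"(?<![\w.-]){re.escape(old)}(?![\w.-])")
    changed = []
    for path in sorted(Path(data_dir).glob("*.csv")):
        text = path.read_text()
        # A function replacement avoids backslash interpretation in `new`.
        updated = pattern.sub(lambda _m: new, text)
        if updated != text:
            path.write_text(updated)
            changed.append(path.name)
    return changed
```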
Versioning
Continuous Integration
Authors
See also the list of
contributors who participated in this project.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Used by
A list of repos using github-stats-pages
can be found here.