top-github-scraper

Scrape top GitHub repositories and users based on keyword

  • 0.1.1.6
  • PyPI

Top GitHub Scraper

Scrape top GitHub repositories and users based on keywords.

I used this tool to analyze the top 1k machine learning users in this article.

Setup

Installation

pip install top-github-scraper

Add Credentials

To make sure you can scrape many repositories and users, add your GitHub credentials to a .env file.

touch .env

Add your username and token to the .env file:

GITHUB_USERNAME=yourusername
GITHUB_TOKEN=yourtoken
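
Before a long scrape it can help to confirm that both variables are actually present. The sketch below is a minimal, standard-library check; `parse_env_text` and `missing_credentials` are hypothetical helper names, not part of top-github-scraper, and how the package itself loads the file is not shown here.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_credentials(values):
    """Return the required variable names that are absent or empty."""
    required = ("GITHUB_USERNAME", "GITHUB_TOKEN")
    return [name for name in required if not values.get(name)]


# Same content as the .env example above.
env_text = "GITHUB_USERNAME=yourusername\nGITHUB_TOKEN=yourtoken\n"
print(missing_credentials(parse_env_text(env_text)))
```

To check a real file, pass its contents in, for example `parse_env_text(open(".env").read())`.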

Usage

Get Top GitHub Repositories' URLs

from top_github_scraper import get_top_repo_urls

get_top_repo_urls(keyword="machine learning", stop_page=10)

Output at top_repo_urls_<keyword>_<sort_by>_<start_page>_<end_page>.json:

[
    "/josephmisiti/awesome-machine-learning",
    "/wepe/MachineLearning",
    "/udacity/machine-learning",
    "/Jack-Cherish/Machine-Learning",
    "/ZuzooVn/machine-learning-for-software-engineers",
    "/rasbt/python-machine-learning-book",
    "/lawlite19/MachineLearning_Python",
    "/lazyprogrammer/machine_learning_examples",
    "/trekhleb/homemade-machine-learning",
    "/ujjwalkarn/Machine-Learning-Tutorials"
]
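
The saved entries are site-relative paths rather than full URLs. A small sketch of how they might be post-processed is shown below; `to_full_urls` and `split_owner_and_name` are hypothetical helpers, and the `https://github.com` prefix is an assumption based on the paths being GitHub repository paths.

```python
import json

BASE_URL = "https://github.com"  # assumed prefix for the relative paths


def to_full_urls(repo_paths):
    """Turn "/owner/repo" paths into absolute repository URLs."""
    return [BASE_URL + path for path in repo_paths]


def split_owner_and_name(repo_path):
    """Split "/owner/repo" into an (owner, repo) tuple."""
    owner, name = repo_path.strip("/").split("/", 1)
    return owner, name


# Two entries copied from the output above; a real run would load the JSON file instead.
paths = json.loads('["/josephmisiti/awesome-machine-learning", "/wepe/MachineLearning"]')
for url in to_full_urls(paths):
    print(url)
print(split_owner_and_name(paths[0]))
```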

Get Top GitHub Repositories' Information

from top_github_scraper import get_top_repos

get_top_repos("machine learning", stop_page=10)

Output for one repository at top_repo_info_<keyword>_<sort_by>_<start_page>_<end_page>.json:

{
    "stargazers_count": 48620,
    "forks_count": 12155,
    "contributors": {
        "login": [
            "josephmisiti",
            "josephmmisiti",
            "hslatman",
            "0asa",
            "ajkl",
            "ipcenas",
            "cogmission",
            "spekulatius",
            "basickarl",
            "NathanEpstein"
        ],
        "url": [
            "https://api.github.com/users/josephmisiti",
            "https://api.github.com/users/josephmmisiti",
            "https://api.github.com/users/hslatman",
            "https://api.github.com/users/0asa",
            "https://api.github.com/users/ajkl",
            "https://api.github.com/users/ipcenas",
            "https://api.github.com/users/cogmission",
            "https://api.github.com/users/spekulatius",
            "https://api.github.com/users/basickarl",
            "https://api.github.com/users/NathanEpstein"
        ],
        "contributions": [
            671,
            105,
            21,
            12,
            11,
            9,
            8,
            7,
            7,
            7
        ]
    }
}
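
The "contributors" value is column-oriented: three parallel lists where index i in each list describes the same person. If one record per contributor is easier to work with, the lists can be zipped together. This is only a sketch; `contributors_to_records` is a hypothetical helper, and the sample uses the first three entries from the output above.

```python
def contributors_to_records(contributors):
    """Convert parallel "login"/"url"/"contributions" lists into one dict per contributor."""
    keys = ("login", "url", "contributions")
    columns = [contributors[key] for key in keys]
    # zip(*columns) walks the three lists in lockstep, one contributor at a time.
    return [dict(zip(keys, row)) for row in zip(*columns)]


sample = {
    "login": ["josephmisiti", "josephmmisiti", "hslatman"],
    "url": [
        "https://api.github.com/users/josephmisiti",
        "https://api.github.com/users/josephmmisiti",
        "https://api.github.com/users/hslatman",
    ],
    "contributions": [671, 105, 21],
}

for record in contributors_to_records(sample):
    print(record["login"], record["contributions"])
```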

Get Top GitHub Contributors' Profiles

from top_github_scraper import get_top_contributors

get_top_contributors("machine learning", stop_page=10)

Output at top_contributor_info_<keyword>_<sort_by>_<start_page>_<end_page>.csv:

login,url,type,name,company,location,email,hireable,bio,public_repos,public_gists,followers,following
0josephmisitihttps://api.github.com/users/josephmisitiUserJoseph MisitiMath & Pencil"Brooklyn, NY"TrueMathematician & Co-founder of Math & Pencil2291422705275
1josephmmisitihttps://api.github.com/users/josephmmisitiUser0020
2hslatmanhttps://api.github.com/users/hslatmanUserHerman SlatmanDistributIT1332046967
30asahttps://api.github.com/users/0asaUserVincent BottaBelgium"Innovation Engineer @evs-broadcast, previously Data Scientist @kensuio, E-Marketing Tools Manager @Diagenode, cofounder @Antibody-Adviser and photographer"35152516
4ajklhttps://api.github.com/users/ajklUserAjinkya Kalekaleajinkya@gmail.com581294
5ipcenashttps://api.github.com/users/ipcenasUser79010
6cogmissionhttps://api.github.com/users/cogmissionUserDavid RayThird planet from the sun...cognitionmission@gmail.comHumanity's freedom and abundance through the pursuit of technological innovation in the area of cognitive applications - Cognition Mission30195444
7spekulatiushttps://api.github.com/users/spekulatiusUserPeter Thaleikis@bringyourownideas127.0.0.1TrueSoftware engineer focused on solutions using open source and simply filling in the gaps to fulfill the requirements.421232920
8basickarlhttps://api.github.com/users/basickarlUserKarl Morrison"Malmö, Sweden"karl@basickarl.ioThe question is: Will you take me seriously51126
9NathanEpsteinhttps://api.github.com/users/NathanEpsteinUserNathan Epstein"New York, NY"nathanepst@gmail.comTrue23122080

Get Top GitHub Users' Profiles

from top_github_scraper import get_top_users

get_top_users("machine learning", stop_page=10)

Output at top_user_info_<keyword>_<start_page>_<end_page>.csv:

login,url,type,name,company,location,email,hireable,bio,public_repos,public_gists,followers,following
0rasbthttps://api.github.com/users/rasbtUserSebastian RaschkaUW-Madison"Madison, WI""Machine Learning researcher & open source contributor. Author of ""Python Machine Learning."" Asst. Prof. of Statistics @ UW-Madison."7151388835
1tqchenhttps://api.github.com/users/tqchenUserTianqi Chen"CMU, OctoML"Large scale Machine Learning2818611126
2halfrosthttps://api.github.com/users/halfrostUserhalfrost@AlibabaShanghai Chinai@halfrost.com💪天道酬勤,勤能补拙。博观而约取,厚积而薄发。Gopher / Rustacean / iOS Dev. / Machine Learning / Retired acmer / Math / Philosophy / Technical Writer.2208566314
3ageronhttps://api.github.com/users/ageronUserAurélien GeronParisAuthor of the book Hands-On Machine Learning with Scikit-Learn and TensorFlow. Former PM of YouTube video classification and founder & CTO of a telco operator.431683832
4chiphuyenhttps://api.github.com/users/chiphuyenUserChip Huyenhttps://snorkel.ai"Mountain View, CA"TrueDeveloping tools and best practices for machine learning production.191783915
5rhieverhttps://api.github.com/users/rhieverUserRandy OlsonFOXO BioScience"Vancouver, WA"rso@randalolson.com"Chief Data Scientist, @FOXOBioScience. AI, Machine Learning, and Data Visualization specialist. Community leader for /r/DataIsBeautiful."7717536313
6lexfridmanhttps://api.github.com/users/lexfridmanUserLex FridmanMIT"Cambridge, MA""AI researcher working on autonomous vehicles, human-robot interaction, and machine learning at MIT and beyond."2050310
7eriklindernorenhttps://api.github.com/users/eriklindernorenUserErik Linder-Norén"Stockholm, Sweden"eriklindernoren@gmail.com"ML engineer at Apple. Excited about machine learning, basketball and building things."240376411
8roboticcamhttps://api.github.com/users/roboticcamUserA/Prof Richard Xu 徐亦达教授University of Technology SydneySydney Australia"I am an A/Professor in Machine Learning at UTS. manage a large research team of postdoc, PhD students close to 30 people"10035610
9ogriselhttps://api.github.com/users/ogriselUserOlivier GriselInria"Paris, France"olivier.grisel@ensta.orgMachine Learning Engineer a Inria Saclay (Parietal team).174933237116
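
Because the user and contributor profiles are written as CSV, they can be ranked with the standard library alone. The sketch below sorts rows by follower count. It assumes the file starts with an unnamed index column followed by the columns shown above; `top_by_followers` is a hypothetical helper, and the rows in `SAMPLE_CSV` are made-up placeholders, not real scraper output.

```python
import csv
import io

# Placeholder rows (NOT real data): an unnamed index column, then the profile columns.
SAMPLE_CSV = (
    ",login,url,type,name,company,location,email,hireable,bio,"
    "public_repos,public_gists,followers,following\n"
    "0,user_a,https://api.github.com/users/user_a,User,,,,,,,3,0,10,1\n"
    "1,user_b,https://api.github.com/users/user_b,User,,,,,,,5,1,,2\n"
    "2,user_c,https://api.github.com/users/user_c,User,,,,,,,8,2,25,4\n"
)


def top_by_followers(csv_text, n=2):
    """Return the n (login, followers) pairs with the most followers."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Treat an empty followers cell as 0 so incomplete rows do not raise.
    rows.sort(key=lambda row: int(row["followers"] or 0), reverse=True)
    return [(row["login"], int(row["followers"] or 0)) for row in rows[:n]]


print(top_by_followers(SAMPLE_CSV))
```

For a real run, read the generated file and pass its text in place of `SAMPLE_CSV`.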

Parameters

View a full list of parameters here.

How the Data is Scraped

top-github-scraper scrapes the owners and contributors of the top repositories that GitHub returns when you search for a specific keyword.

For each user, top-github-scraper scrapes 13 data points:

  • login: username
  • url: URL of the user
  • type: Whether this account is a user or an organization
  • name: Name of the user
  • company: User's company
  • location: User's location
  • email: User's email
  • hireable: Whether the user is hireable
  • bio: Short description of the user
  • public_repos: Number of public repositories the user has (including forked repositories)
  • public_gists: Number of public gists the user has (including forked gists)
  • followers: Number of followers the user has
  • following: Number of people the user is following
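
The fields listed above match keys in the JSON that GitHub's REST API returns for `GET https://api.github.com/users/{login}`, which is also the URL format in the `url` column. The sketch below shows one way to fetch a profile and keep only those fields. It is an illustration, not the package's actual implementation; `fetch_user`, `pick_profile_fields` and `PROFILE_FIELDS` are hypothetical names.

```python
import json
import urllib.request

PROFILE_FIELDS = (
    "login", "url", "type", "name", "company", "location", "email",
    "hireable", "bio", "public_repos", "public_gists", "followers", "following",
)


def pick_profile_fields(payload):
    """Keep only the listed profile fields; absent keys become None."""
    return {field: payload.get(field) for field in PROFILE_FIELDS}


def fetch_user(login, token=None):
    """Fetch one user's public profile from the GitHub REST API (needs network access)."""
    request = urllib.request.Request(
        f"https://api.github.com/users/{login}",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        # A personal access token raises the rate limit compared with anonymous calls.
        request.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Offline demonstration with a stand-in payload; "id" is dropped, missing fields are None.
print(pick_profile_fields({"login": "example", "id": 1, "followers": 0}))
```

A live call would look like `pick_profile_fields(fetch_user("rasbt", token))`, where `token` is the value stored as GITHUB_TOKEN.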
