The number of GitHub stars is probably the first metric we look at when evaluating an open-source project or package. This habit puts a lot of weight on GitHub stars as a signal in software supply chain security. However, that weight corrupts GitHub stars as a popularity metric, echoing two laws proposed by social scientists:
- “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” (Campbell’s law)
- “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” (Goodhart’s law)
Recently, people have been buying fake GitHub stars, either to game the popularity contest or to spread malicious content, at prices as low as $0.10 per star. This happens even though the GitHub Acceptable Use Policy prohibits “automated excessive bulk activity and coordinated inauthentic activity” and “inauthentic interactions, such as fake accounts and automated inauthentic activity”.
Although GitHub “has been aware of the presence of fake starrers for years, and actively works to remove these from the platform,” we still do not understand this phenomenon well, and there are no publicly available estimates of its frequency and impact. Last year, Dagster proposed an open-source detector, which is a good starting point, but our experiments found that it does not scale well enough to scan all of GitHub in practice. This motivated us to start a new exploration of fake stars on GitHub.
Why Fake Stars Matter: The Real Risks Behind Inflated GitHub Stars
Fake stars are used for scamming, defrauding, and even spreading malicious content.
First, fake stars trick people into installing malicious software.
zigzagmoot/TapSwapAuto
zigzagklatton/APEXLEGENDSbyklatton
zigzag869/utrorrent-activation-by-gaij
zhengyanlin18/PlayDoge-Auto-Farm-and-Bot-Setup
zhengkaifor/adobe-lightroom-ai-activation
zhaowuling/Zhaowulling-Sea-Main
zhangdapao9523/Flash-USDT-Sender
zhangdapao9523/ETH-HUNTER
zhangdapao9523/DoxCoinAuto
1cyres/Albion-Radar-Main
1Xitz1/eg54yyg5e4
1Xitz1/d5y4ggy5d4
1905mali/League-Of-Legends-Hack
1842JakUCY/h7ixgmze47ykfk4
zhangdapao9523/DotcoinAuto
...
This is a snapshot of GitHub repositories with a very high percentage of fake stars (all of the ones in this list have been taken down by GitHub). From the repository names alone, we can guess that many of them spread malware that steals cryptocurrency, or distribute pirated, copyright-violating software (which may also contain hidden malware).
Here is another malicious repository we found, Solmonster/PhantomSniper-Solana-Sniper-Bot, which is still on GitHub at the time of writing (mid-August). It had 109 suspected fake stars when we detected it (early July) and a fancy README. However, it secretly steals cryptocurrency through a hidden spawn() call.
Second, fake stars trick VCs into spending money on fake companies with bad products and low traction.
The motivation for luring VCs with fake stars is growth hacking (“fake it until you make it”). However, our early statistical modeling shows that buying fake stars does not really help a project gain traction. It may bring in more real stars in the first month, but the presence of fake stars has a negative effect in the long term. Plus, it has no statistically significant effect on attracting downloads.
Finally, fake stars promote low-quality GitHub repositories, notably low-quality “listicles” and tutorials, creating spam and information pollution on GitHub. For example, we detected a large number of repositories with fake stars that have “awesome”, “template”, “demo”, “example”, etc. in their names. These seemingly popular but low-quality listicles and tutorials add more noise to GitHub and may mislead programming newcomers.
Recognito-Vision/Face-SDK-iOS-Demo 93 stars (93 suspected fakes)
dsnbey/MVVM-Layered-Architecture-Example 237 stars (236 suspected fakes)
1321928757/Concurrent-MulThread-Demo 81 stars (80 suspected fakes)
dnbmagic/farcaster-examples 68 stars (67 suspected fakes)
dnbmagic/awesome-frames 64 stars (63 suspected fakes)
solidglue/tensorflow2_examples_jupyter 61 stars (60 suspected fakes)
andeug/code-examples 490 stars (478 suspected fakes)
Recognito-Vision/Face-SDK-Android-Demo 98 stars (95 suspected fakes)
StrawHat1Luffy/farcaster-examples 69 stars (64 suspected fakes)
solidglue/sklearn_examples_jupyter 69 stars (61 suspected fakes)
scayle/demo-add-on-vite 70 stars (61 suspected fakes)
Foblex/f-flow-example 77 stars (67 suspected fakes)
Recognito-Vision/Face-SDK-Linux-Demos 109 stars (89 suspected fakes)
52jing/wang-template-backend 126 stars (100 suspected fakes)
ai-boost/awesome-prompts 3892 stars (3015 suspected fakes)
CerberusChaos/Starknet-Dapp-Template 85 stars (59 suspected fakes)
jiawanlong/cesium-three-demos 96 stars (64 suspected fakes)
...
Fake star campaigns are rapidly growing.
Our algorithm identified 3,746,538 suspected fake stars over the last five years (July 2019 to July 2024) and 10,155 repositories that have seemingly run a fake star campaign. The number of suspected fake stars has grown rapidly over the last six months.
GitHub is actively addressing fake stars but the risks remain.
According to our estimate, ~89% of repositories with suspected fake star campaigns have been deleted. It is unclear whether GitHub removed these repositories because they bought fake stars or because they spread malware, or whether their authors deleted them.
However, ~11% (1,136) of these repositories are still present on GitHub despite a suspected fake star campaign. Notably, among the 41 npm and 47 PyPI packages we identified, only three (3.4%) have had their GitHub repositories deleted.
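The shares quoted above can be recomputed from the raw counts as a quick consistency check. This small Python sketch assumes the 1,136 remaining repositories are counted out of the 10,155 campaign repositories reported earlier, and that the 3.4% figure means three out of the 41 + 47 = 88 packages.

```python
# Raw counts quoted in the text
campaign_repos = 10_155          # repositories with a suspected fake star campaign
remaining_repos = 1_136          # of those, still on GitHub
packages = 41 + 47               # npm + PyPI packages
deleted_package_repos = 3        # packages whose GitHub repository was deleted

remaining_share = remaining_repos / campaign_repos
deleted_share = deleted_package_repos / packages

print(f"still on GitHub: {remaining_share:.1%}")          # still on GitHub: 11.2%
print(f"package repos deleted: {deleted_share:.1%}")      # package repos deleted: 3.4%
```

Since 1 - 0.112 is about 0.89, the remaining share agrees with the ~89% deletion estimate.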
More importantly, VirusTotal reports malware on 28 repositories that are still on GitHub at the time of writing, suggesting a strong link between fake stars and malicious activity on GitHub.
Even among the repositories that were eventually taken down (which means they were probably malicious!), 7.86% lived for more than one month, leaving a long window in which they could exploit users.
Only 18.12% of the users participating in those suspected fake star campaigns have been deleted on GitHub. Most of the deletions happened recently.
How Fake Stars Were Discovered
This project is led by one of our summer interns, Hao He, a Ph.D. student in Software Engineering at Carnegie Mellon University co-advised by Dr. Bogdan Vasilescu and Dr. Christian Kästner. The initial idea came from the observation that star counts are often used blindly by both researchers and practitioners, without careful consideration of what they actually mean. Further exploration showed that there are multiple GitHub star black markets and that these fake stars may be linked to other malicious activities in the software supply chain. This aligned with Socket’s interests and became Hao’s internship project: detecting fake star activity on GitHub.
To find fake stars, Hao builds on prior research from social media fraud detection and open-source software. The detector runs on the GHArchive dataset, a mirror of all GitHub events stored in Google BigQuery and updated daily. It employs two heuristics:
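The detector itself queries GHArchive in BigQuery. As a local illustration only, the Python sketch below reads one of GH Archive's hourly dumps (gzipped JSON lines, one event per line) and keeps the star events; in GitHub's event stream, starring a repository is recorded as a WatchEvent. The Star record and the parse_stars and stars_from_file functions are hypothetical names, not part of Socket's detector.

```python
import gzip
import json
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class Star(NamedTuple):
    user: str
    repo: str
    at: datetime


def parse_stars(lines: Iterable[str]) -> Iterator[Star]:
    """Yield one Star per WatchEvent found in a stream of GH Archive JSON lines."""
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        # GitHub reports starring a repository as a "WatchEvent".
        if event.get("type") != "WatchEvent":
            continue
        at = datetime.fromisoformat(event["created_at"].replace("Z", "+00:00"))
        yield Star(event["actor"]["login"], event["repo"]["name"], at)


def stars_from_file(path: str) -> list[Star]:
    # Hourly archives (e.g. 2024-07-01-0.json.gz) are gzipped JSON lines.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return list(parse_stars(fh))
```

At the scale described here (years of events), the same filter is far more practical as a SQL query over the BigQuery tables than as a local loop.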
- Low Activity Heuristic: Some fake star merchants use scripts to mass-register one-time, throwaway accounts to deliver fake stars. Inspired by Dagster’s detector, we designed a similar heuristic that finds users who register, star a repository, and then become inactive, all on the same day.
- Clustering Heuristic: Fake star merchants usually assure their clients that purchased stars will be delivered within a very short time, and some of them reuse the accounts they already have to star many repositories. This business model leaves “cluster-like” patterns that are extremely hard to hide and extremely rare among real users. Mathematically, this corresponds to a heuristic that finds clusters of N users and M repositories in which each repository received stars from at least P% of the N users within a short time period ∆t. Facebook uses this heuristic to detect fake likes; it is equivalent to the maximal biclique enumeration problem, which is NP-complete. Facebook’s algorithm, CopyCatch, is designed to find local optima on extremely large datasets in a distributed system. Its original implementation uses the MapReduce framework and is not open source, but we replicated the same algorithm on the GHArchive dataset stored in Google BigQuery. This let us run the detector on the last five years of GitHub events, roughly 20 TiB of data in total.
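The low activity heuristic can be sketched in a few lines of Python. GHArchive events do not carry an account's creation date, so this sketch assumes it comes from another source (for example, the created_at field returned by the GitHub users API). The UserActivity record and the is_low_activity function are hypothetical illustrations, not Dagster's or Socket's code.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class UserActivity:
    login: str
    created_on: date                       # account creation date (assumed to come from outside GHArchive)
    events: list[tuple[str, datetime]]     # (event type, timestamp) for every event seen for this user


def is_low_activity(user: UserActivity) -> bool:
    """Flag accounts that register, star something, and go silent on the same day."""
    if not any(kind == "WatchEvent" for kind, _ in user.events):
        return False  # never starred anything, so irrelevant to fake stars
    active_days = {ts.date() for _, ts in user.events}
    # Every observed event falls on the account's creation day.
    return active_days == {user.created_on}
```

Comparing dates in UTC (as GHArchive timestamps are) keeps "the same day" well defined; a real pipeline would also need to decide how long an account must stay silent before it counts as inactive.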
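For the clustering heuristic, the Python sketch below only checks whether a given candidate group of N users and M repositories satisfies the condition stated above: each repository starred by at least P% of the users within some window of length ∆t (p is expressed as a fraction). It does not search for candidate groups, which is the hard part that CopyCatch's iterative, distributed search addresses. The function names and the (user, repo, timestamp) input format are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable


def max_in_window(times: list[datetime], window: timedelta) -> int:
    """Largest number of timestamps that fit in any interval of length `window`."""
    times = sorted(times)
    best, left = 0, 0
    for right, t in enumerate(times):
        # Shrink from the left until the interval [times[left], t] fits in the window.
        while t - times[left] > window:
            left += 1
        best = max(best, right - left + 1)
    return best


def is_star_cluster(
    stars: Iterable[tuple[str, str, datetime]],
    users: set[str],
    repos: set[str],
    p: float,
    window: timedelta,
) -> bool:
    """True if every repo in `repos` got stars from at least p * len(users)
    of `users` within some single window of length `window`."""
    # Keep each user's earliest star per repo so re-stars are not double counted.
    first_star: dict[str, dict[str, datetime]] = defaultdict(dict)
    for user, repo, at in stars:
        if user in users and repo in repos:
            prev = first_star[repo].get(user)
            if prev is None or at < prev:
                first_star[repo][user] = at
    needed = p * len(users)
    return all(
        max_in_window(list(first_star[r].values()), window) >= needed for r in repos
    )
```

The sliding window is valid because, once the timestamps are sorted, the densest interval of length ∆t can always be taken to start at one of the star times.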
Note that both heuristics produce false positives; notably, fake star accounts may star legitimate repositories to avoid detection. To make our output more trustworthy, we added a post-processing step that only labels repositories with noticeable bursts of fake stars as having probably bought fake stars.
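The exact definition of a “noticeable” burst is not given here. One plausible rule, sketched in Python with hypothetical thresholds (min_fake and min_share are illustrative values, not the ones Socket uses), flags a repository only if a single day contains many heuristic-flagged stars that also make up most of that day's stars:

```python
from collections import Counter
from datetime import datetime


def has_fake_star_burst(
    star_times: list[tuple[datetime, bool]],
    min_fake: int = 50,        # hypothetical: minimum flagged stars in one day
    min_share: float = 0.5,    # hypothetical: minimum flagged share of that day's stars
) -> bool:
    """star_times holds (timestamp, flagged_by_a_heuristic) for every star on one repo."""
    total_per_day: Counter = Counter()
    fake_per_day: Counter = Counter()
    for at, flagged in star_times:
        day = at.date()
        total_per_day[day] += 1
        if flagged:
            fake_per_day[day] += 1
    # A burst needs both volume and dominance on the same day.
    return any(
        fake >= min_fake and fake / total_per_day[day] >= min_share
        for day, fake in fake_per_day.items()
    )
```

A legitimate repository that picks up a few flagged stars on scattered days never reaches min_fake, so this rule filters out that kind of false positive, though not every one.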
- An example repository that probably bought fake stars
- Example repositories that are probably victims of fake star campaigns (their shared spikes are possibly due to a large swarm of fake star accounts starring all of them at around the same time; note the sudden bump in fake stars since 2024)
Introducing a New “Suspicious Stars on GitHub” Alert
Based on this research, Socket is launching a new “Suspicious Stars on GitHub” alert that utilizes the low activity and clustering heuristics to detect packages associated with repositories that have fake stars.
This alert gives users more visibility into the legitimacy of a package’s star count and flags packages whose stars may have been artificially inflated by bots, crowdsourcing, or other means. It is set as a High Severity alert because of the potential for spam, fraud, or even a supply chain attack. These packages should be carefully reviewed before installation.
What You Can Do About Suspicious Stars
First, you should look carefully at the open-source packages or projects you want to use. Don’t take stars at face value! If the star count seems fishy (e.g., the projects have lots of stars but very little actual activity, such as open issues and PRs), it could be fraudulent.
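One rough way to put “lots of stars but very little actual activity” into numbers is to compare the star count with forks and open issues/PRs. The GitHub REST API returns all three for GET /repos/{owner}/{repo} as stargazers_count, forks_count, and open_issues_count (open pull requests are included in the issue count). The ratio and the threshold of 100 below are hypothetical triage aids, not Socket's detection logic.

```python
import json
import urllib.request


def fetch_repo(full_name: str) -> dict:
    """Fetch repository metadata from the public GitHub REST API (unauthenticated, rate-limited)."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{full_name}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def stars_per_activity(repo: dict) -> float:
    """Stars divided by forks plus open issues/PRs (at least 1 to avoid division by zero)."""
    activity = repo.get("forks_count", 0) + repo.get("open_issues_count", 0)
    return repo.get("stargazers_count", 0) / max(activity, 1)


def looks_fishy(repo: dict, ratio_threshold: float = 100.0) -> bool:
    # Hypothetical threshold: a prompt to look closer, not a verdict.
    return stars_per_activity(repo) >= ratio_threshold
```

For example, looks_fishy(fetch_repo("owner/name")) fetches one repository and applies the check. A high ratio only means the project deserves a closer look: forks can be faked too, and some genuinely popular projects have few open issues.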
Second, if you are suspicious about certain packages, you can check Socket’s package pages for free: we publish all of this data on our website, so anyone can view package information alongside our detections.
Finally, if you want to get proactive alerts and check your entire organization for suspicious star packages (and 70+ indicators of supply chain risk), install the free Socket for GitHub app in just 2 clicks. Whenever a new dependency is added or updated in a pull request, Socket analyzes the package's behavior and security risk, alerting you before any malicious code has the chance to land in your project.