GitHub’s Malicious Repo Explosion & How to Avoid It

GitHub Fork Bomb

This week, Apiiro’s security researchers revealed the alarming finding that more than 100,000 repositories on GitHub are infected with malicious code. That is 1,000 times more compromised GitHub repos than had previously been found.

But that figure shouldn’t be surprising given last year’s sharp increase in software supply chain attacks, which resulted in twice as many attacks as the previous three years combined. And when it comes to GitHub, creating a new vector for a supply chain attack is simple:

  1. Clone an existing (typically popular) repository
  2. Infect the code with malware
  3. Upload the compromised code back to GitHub using the original repo’s identical name 

Hackers then fork the new repos thousands of times, proliferating them across the site. While GitHub automatically identifies and cleans up so-called “fork bombs”, many still escape into the wild, which is only to be expected given that GitHub hosts more than 100 million developers and over 420 million repositories. The implication is that, at any point in time, there are hundreds to thousands of malicious-but-benign-looking code repositories on GitHub.

So what can you do to avoid becoming a victim of one of these malicious forks?

How to Avoid GitHub Malware

In the latest attack, hackers compromised Python repos with an obfuscated version of a credential stealer (BlackCap-Grabber). Once login passwords, browser cookies and other sensitive information have been collected, they are sent to a command and control (C&C) server.

Here at ActiveState, we recommend that security-conscious organizations only work with source code and never import prebuilt packages. Of course, pulling source code directly from GitHub (where most Python projects are hosted) means you’d be exposing your organization to just this kind of fork bomb attack. 

However, if you used the ActiveState Platform, you could benefit from a number of features that catch and filter out these kinds of attacks. This is because ActiveState maintains its own local repository of open source code pulled from each language’s ecosystem, from which we automatically build packages on demand using a hardened build service.

On code import, ActiveState:

  • Scans the source code for known malware exploits and suspicious patterns (for example, shellcode embedded as base64 data).
  • Performs source code analysis to surface key threats, such as when a package may call out to external sources (e.g., a command and control server).
  • Quarantines source code that doesn’t meet security criteria for further investigation.
  • Performs egress blocking during build steps to ensure packages don’t pull in external resources.
Figure 1: ActiveState Ingestion & Build Pipeline
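
To make the first check in the list above more concrete, here is a minimal sketch of a scan for suspicious patterns in Python source: long string literals that decode cleanly as base64, and calls to dynamic-execution built-ins. It is not ActiveState's scanner. The function names, the 40-character threshold and the list of flagged built-ins are all assumptions chosen for illustration.

```python
import ast
import base64
import binascii
import re

# Assumption: a literal of 40+ base64 characters is "long enough" to be worth a look.
BASE64_RE = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")
# Assumption: these built-ins are treated as dynamic-execution sinks.
DYNAMIC_CALLS = {"exec", "eval", "compile", "__import__"}


def looks_like_base64(text: str) -> bool:
    """Return True if text is a long string that decodes cleanly as base64."""
    if not BASE64_RE.fullmatch(text):
        return False
    try:
        base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def scan_source(source: str) -> list[str]:
    """Return human-readable findings for one Python source string."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant):
            value = node.value
            if isinstance(value, bytes):
                value = value.decode("ascii", errors="ignore")
            if isinstance(value, str) and looks_like_base64(value):
                findings.append(f"line {node.lineno}: long base64 string literal")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DYNAMIC_CALLS:
                findings.append(f"line {node.lineno}: call to {node.func.id}()")
    return findings


if __name__ == "__main__":
    # Build a harmless sample that mimics the "decode then exec" pattern.
    blob = base64.b64encode(b"print('hello from a hidden payload')").decode()
    sample = "import base64\nexec(base64.b64decode('" + blob + "'))\n"
    for finding in scan_source(sample):
        print(finding)
```

A pattern scan like this only raises candidates for review. Legitimate code also embeds base64 data and occasionally calls exec(), so findings would feed a quarantine step rather than an automatic verdict.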

The upshot is a defence-in-depth approach. In this specific case, the obfuscated code, once unpacked, first calls out to pull in malicious Python code and subsequently pulls in a binary executable as well, all issues that would be flagged by the ActiveState code ingestion pipeline.
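
The two behaviours described here, calling out to the network and shipping a binary executable, can each be approximated with a simple static check. The sketch below is illustrative only and is not how the ActiveState pipeline is implemented. The module list is an incomplete assumption, the function names are hypothetical, and the two-byte "MZ" check can misfire on text files that happen to start with those letters.

```python
import ast
from pathlib import Path

# Assumption: an illustrative, incomplete list of modules that can open network connections.
NETWORK_MODULES = {"socket", "urllib", "http", "ftplib", "requests", "httpx", "aiohttp"}
# Leading bytes of common executable formats: ELF, Windows PE ("MZ"), 64-bit Mach-O.
EXECUTABLE_MAGIC = (b"\x7fELF", b"MZ", b"\xcf\xfa\xed\xfe")


def network_imports(source: str) -> set[str]:
    """Return the top-level network-capable modules imported by a source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in NETWORK_MODULES:
                found.add(top_level)
    return found


def is_executable_blob(data: bytes) -> bool:
    """Return True if data starts with a known executable magic number."""
    return data.startswith(EXECUTABLE_MAGIC)


def scan_tree(root: Path) -> list[str]:
    """Walk a source tree and report binaries and network-capable imports."""
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if is_executable_blob(data):
            findings.append(f"{path}: executable binary")
        elif path.suffix == ".py":
            try:
                modules = network_imports(data.decode("utf-8"))
            except (SyntaxError, UnicodeDecodeError, ValueError):
                findings.append(f"{path}: could not be parsed")
                continue
            if modules:
                findings.append(f"{path}: imports {sorted(modules)}")
    return findings
```

Calling scan_tree(Path("some_checkout")) on a cloned repository returns a list of findings to review. Static checks like these complement egress blocking at build time rather than replace it, since obfuscated code can hide its imports until it runs.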

Conclusions: Malware in Source Code vs Binary Packages 

Hackers are moving upstream from poisoning open source repositories like the Python Package Index (PyPI) to poisoning source code repositories like GitHub. The problem for hackers lies in the fact that open source ecosystems enforce unique project names, meaning they need to rely on typosquatting: uploading a compromised version of a project like “NumPy” under the name “NimPy” and hoping someone fat-fingers their install command. Not so with GitHub, which allows multiple repos with the same name, distinguished only at the organization level. This allows hackers to promote their version of a popular project to unsuspecting developers. 
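
Both naming tricks described above can be checked before anything is installed or cloned. The following is a minimal sketch under stated assumptions: KNOWN_PACKAGES and TRUSTED_OWNERS are tiny hypothetical allowlists that an organization would maintain itself, the function names are made up for this example, and the 0.75 similarity threshold is arbitrary.

```python
from difflib import SequenceMatcher
from typing import Optional
from urllib.parse import urlparse

# Assumption: a tiny illustrative allowlist of package names you actually use.
KNOWN_PACKAGES = {"numpy", "pandas", "requests"}
# Assumption: a hypothetical mapping of repo name -> the GitHub owner you trust for it.
TRUSTED_OWNERS = {"numpy": "numpy", "requests": "psf"}


def possible_typosquat(name: str, known=KNOWN_PACKAGES, threshold: float = 0.75) -> Optional[str]:
    """Return the known package that name closely resembles, or None.

    An exact match is considered fine; only near misses are reported.
    """
    candidate = name.lower()
    if candidate in known:
        return None
    for real in sorted(known):
        if SequenceMatcher(None, candidate, real).ratio() >= threshold:
            return real
    return None


def repo_owner_is_trusted(clone_url: str, trusted=TRUSTED_OWNERS) -> bool:
    """Return True only if an HTTPS GitHub URL points at the trusted owner's repo.

    SSH-style URLs (git@github.com:owner/repo.git) are not handled in this sketch
    and are rejected.
    """
    parsed = urlparse(clone_url)
    if parsed.hostname != "github.com":
        return False
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) != 2:
        return False
    owner, repo = parts
    repo = repo.removesuffix(".git")  # requires Python 3.9+
    return trusted.get(repo.lower()) == owner.lower()


if __name__ == "__main__":
    print(possible_typosquat("NimPy"))
    print(repo_owner_is_trusted("https://github.com/numpy/numpy.git"))
    print(repo_owner_is_trusted("https://github.com/some-other-account/numpy"))
```

The first check addresses typosquatting on a package index. The second addresses same-name repositories on GitHub, where the owner segment of the URL is the only part that differs.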

While GitHub provides hackers with the advantage of being able to use the same name as a popular project, hiding malware in Python source code is much harder than in a prebuilt/binary Python package. Clearly, hackers are hoping that organizations’ existing tools are tuned for scanning prebuilt packages rather than source code. That is a fair assumption, given that the vast majority of organizations continue to import and work with prebuilt packages despite the risks.

At ActiveState, we recognize this trend, and ensure that our secure supply chain offering has the controls and processes in place to catch compromised source code before it ends up in your runtime environment. 

Next Steps

Watch our webinar on How to Make Open Source Suck Less by using ActiveState to improve your open source security and dependency management.
