Risky Business Podcast: Why Open Source Software Needs Better Malware Tracking
In this segment of the Risky Business podcast, Feross Aboukhadijeh and Patrick Gray discuss the challenges of tracking malware discovered in open source software.
Sarah Gooding
November 20, 2024
Socket founder and CEO Feross Aboukhadijeh was recently a guest on the Risky Business podcast, where he joined cybersecurity journalist Patrick Gray to discuss the threat of malicious packages on open source software registries.
Feross highlighted the alarming frequency of attacks, with roughly 100 new supply chain threats identified each week across ecosystems like npm, RubyGems, PyPI, and Maven Central. These attacks often exploit developers’ trust, either through package hijacking, where attackers take over or impersonate a legitimate maintainer’s package to insert malicious code, or through typosquatting, where they publish packages whose names mimic popular ones. Once published, some of these packages can remain live for years, exposing users to risks like stolen environment variables, unauthorized command execution, and data exfiltration.
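Typosquatting depends on names that sit one slip of the keyboard away from a popular package. A minimal sketch of that idea flags any candidate name within one edit of a name on a list of popular packages. It assumes a small, hand-picked list (the names in the demo are purely illustrative); a real detector would draw on registry download data and many other signals, and this is not how Socket or any registry actually does it.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def likely_typosquat(candidate: str, popular: list[str], max_distance: int = 1) -> list[str]:
    """Return popular names that `candidate` is suspiciously close to, but not equal to."""
    name = candidate.lower()
    return [p for p in popular
            if p.lower() != name and levenshtein(name, p.lower()) <= max_distance]


if __name__ == "__main__":
    # Hypothetical list of popular package names, used only for this illustration.
    popular = ["lodash", "express", "react", "event-stream"]
    for candidate in ["1odash", "expresss", "react", "event-strem"]:
        print(candidate, "->", likely_typosquat(candidate, popular))
```

Plain Levenshtein distance counts a swapped pair of letters (`lodahs` for `lodash`) as two edits, so this version misses transpositions; a Damerau-Levenshtein variant would catch them. Any fixed threshold will also flag some legitimately similar names, so a hit is a prompt for review rather than a verdict.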
The Challenge of Centralized Threat Tracking
One of the central issues Feross addressed is the absence of a standardized repository for cataloging malicious packages. Unlike the National Vulnerability Database (NVD) for CVEs (Common Vulnerabilities and Exposures), no analogous system exists to track malware in software dependencies. Current practices often rely on private vendors like Socket to provide threat intelligence. However, these solutions are limited by the responsiveness of registries, many of which lack the resources to act swiftly.
Feross suggested a more unified approach: expanding the existing CVE infrastructure to include malicious package tracking. Because CVE scanners are already built into many organizations’ compliance processes, this would reach far more users than a new, standalone system and could help organizations find affected dependencies faster.
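To make the proposal concrete: if malicious packages were published as machine-readable advisories, the matching step that vulnerability scanners already perform could flag them, including in older lockfiles. The sketch below is hypothetical. It assumes an npm `package-lock.json` with lockfileVersion 2 or 3, whose `packages` map is keyed by `node_modules/...` paths, and an advisory list of exact (name, version) pairs; the advisory entries are placeholders, not real packages.

```python
import json
import sys

# Placeholder advisory data: exact (name, version) pairs flagged as malicious.
# A real feed would also carry advisory IDs, version ranges and references.
KNOWN_MALICIOUS = {("evil-helper", "1.0.1"), ("@fake-scope/logger", "2.3.0")}


def installed_packages(lock: dict) -> set[tuple[str, str]]:
    """Collect (name, version) pairs from an npm lockfile (lockfileVersion 2 or 3)."""
    found = set()
    for path, meta in lock.get("packages", {}).items():
        if "node_modules/" not in path:
            continue  # skip the root project ("") and workspace folders
        # Nested installs look like "node_modules/a/node_modules/b"; keep the last name.
        name = path.rsplit("node_modules/", 1)[-1]
        version = meta.get("version")
        if version:
            found.add((name, version))
    return found


def flag_malicious(lock: dict, advisories: set[tuple[str, str]]) -> list[str]:
    """Return sorted 'name@version' strings for installed packages that match an advisory."""
    hits = installed_packages(lock) & advisories
    return sorted(f"{name}@{version}" for name, version in hits)


if __name__ == "__main__":
    # Usage: python check_lock.py path/to/package-lock.json
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            lock = json.load(f)
        for hit in flag_malicious(lock, KNOWN_MALICIOUS):
            print("known-malicious dependency:", hit)
```

Running the same check over lockfiles from earlier commits is one way to ask whether a since-removed package was ever installed, although it only sees what the lockfile recorded, not what a build machine or internal mirror actually served.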
Evolving Attacks Exploit Weaknesses in Open Source
The episode also explored the evolving tactics of attackers. From leveraging Ethereum smart contracts for command-and-control operations to targeting crypto wallets built on JavaScript-heavy frameworks like Electron, the sophistication of attacks is growing. Yet, Feross noted that many attacks are surprisingly unsophisticated, often succeeding due to a lack of developer scrutiny.
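Many of the unsophisticated attacks described in the interview, such as stealing environment variables as soon as a package is installed, rely on code that runs at install time. One low-effort form of scrutiny is to list which dependencies declare install-time scripts. The sketch below assumes an npm lockfile of version 2 or 3, where npm marks entries that have a `preinstall`, `install`, or `postinstall` script with `hasInstallScript`. It is a review aid, not a malware detector: plenty of legitimate packages (for example, ones that compile native code) use install scripts too.

```python
import json
import sys


def packages_with_install_scripts(lock: dict) -> list[str]:
    """List 'name@version' for lockfile entries that npm marks with hasInstallScript."""
    flagged = []
    for path, meta in lock.get("packages", {}).items():
        if "node_modules/" not in path:
            continue  # skip the root project ("") and workspace folders
        if meta.get("hasInstallScript"):
            name = path.rsplit("node_modules/", 1)[-1]
            flagged.append(f"{name}@{meta.get('version', '?')}")
    return sorted(flagged)


if __name__ == "__main__":
    # Usage: python install_scripts.py path/to/package-lock.json
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            lock = json.load(f)
        for entry in packages_with_install_scripts(lock):
            print("runs code at install time:", entry)
```

A new or unexpected entry in this list is worth a closer look before installing. npm's `--ignore-scripts` option can also stop lifecycle scripts from running, at the cost of breaking packages that genuinely need them.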
One notable example discussed was the infamous event-stream attack, which bears many similarities to the recent xz-utils attack. A malicious actor gained access to a popular package by offering to maintain it, only to later insert a backdoor. This incident underscored the systemic vulnerabilities in open source software management and inspired Feross to create Socket.
For more insights from this episode, read the full transcript below:
It is time for this week's sponsor interview now with Feross Aboukhadijeh from Socket. Socket is a software supply chain security company, which can basically flag bad packages that you might be bringing into your projects.
So if someone's hijacked a package and put a bunch of malware in there, it'll let you know. If a package is trying to do stuff like send environment information off to some random server in Russia, or if it's trying to download and run executables, it'll let you know that.
Right. So it's just a good idea in this age where we're constructing software out of so many prebuilt packages. But Feross has made a good point, and he's here to make the case that we need to start tracking bad packages the same way that we track vulnerabilities, and that there needs to be some sort of central repository for this information.
So here's Feross Aboukhadijeh to make that case.
We're detecting about a hundred supply chain attacks per week in npm, RubyGems, Maven, and some of the other popular ecosystems. The big problem is that when we find these threats, our options are very limited. We can obviously protect our customers.
We can give them that data; we have a lot of ways to do that, and a lot of folks are already using it. But to protect the broader community, our option is to contact the registry and let them know that the package is malicious, and we get varying levels of responsiveness from the different registries.
A lot of these registries are volunteer-run; PyPI, for instance, is volunteer-run. So the folks maintaining them are under a lot of load, and there's usually a pretty long period where these packages remain live before they're taken down, if they're taken down at all.
We're tracking some packages that have been up for years and still haven't been taken down; they just got lost in the mess. And the problem is that once a package is taken down, there's no way for a company to figure out whether or not it ever installed that package in the past. You don't get a CVE issued. The NVD and that whole system very rarely issue a CVE for one of these types of findings.
They just consider it out of scope, not to mention the other problems they have with the backlog and their inability to keep up with their current purpose today. So it's not a good situation. The only way for people to find out whether they might have installed one of these packages is to come to a vendor like us, and we can help them look through their Artifactory, or whatever they might be using internally to mirror packages, to see if they're still mirroring packages that have already been removed from the public registry for being malicious but are still being served to developers inside the company. So, yeah, it's a huge problem.
But you currently publish this information on your website, though, right? Stuff that you find, not just for customers. But I guess what you're arguing is that it shouldn't just be up to a private vendor to catalog this stuff.
I think so. Yeah. It feels like this data is analogous to what the National Vulnerability Database does, right?
The NVD is cataloging vulnerabilities, and we need something analogous for malicious packages. If they don't want to do it, someone needs to. We put them on our website today for folks to access. We're not publishing them in a consumable format today, but if folks go and search for a package, they can get all the information that we have.
Yeah. So you're not publishing standardized data, like some sort of XML feed that people can ingest and pass around. I guess that makes sense, because that's kind of valuable IP at that point, right?
That's right, especially because of the time delay. We find stuff within a second or two of it being published, since we're replicating the feed in real time.
We have basically every package and every new version of every package. So part of the value of what we can do for folks is to give them that coverage while they're waiting for the takedown to happen. But we do push to get it taken down. We want to make sure that the community is protected, so we share our information with the registries right away when we find stuff.
It's just that they're the ones who take time to actually get it removed.
Yeah, I think it's probably worth pointing out at this point that NVD is having trouble. Like you alluded to earlier, they're having trouble even keeping up with their current workload. There have been new contracts issued and whatever.
But at some point earlier this year they just stopped enriching vulnerability data, right? Have you been tracking that much?
I've been following it somewhat. I know that at one point more than 50 percent of the vulnerabilities on the KEV list, the Known Exploited Vulnerabilities catalog, were missing that enrichment data, so all the valuable details and context that it would provide. And the KEV list isn't even that many vulnerabilities in the grand scheme of things. So I just think that backlog really undermines the reliability of CVEs as the primary means of assessing software security.
And that's something that's always frankly bothered me: people throw a package into a vuln scanner, and if it says there's no CVE, or nothing matches, they think it's safe to run that package. But it's always been a bigger problem than that, right?
Yeah. Now, speaking of the problem, you said earlier that you track something like 100 malicious packages every week. Are there any places where they're popping up more than others? Are there particular types of malicious packages that are more likely to be doing the rounds at the moment? What's a rough breakdown of what that threat environment looks like?
Yeah, there's a bunch of campaigns that we've posted about recently. There was recently a massive malware campaign that was using Ethereum smart contracts to evade detection; they were using them as the command and control.
It was a huge spam campaign that published a bunch of packages and squatted a bunch of names. There was another recent one we found that's quite interesting, something we're calling an author typosquatting attack, where the attackers were able to impersonate a popular maintainer on npm by faking important metadata in the package that ended up showing up on the official package website.
So it's always evolving; there's always new stuff. In terms of volume, a lot of it is going to be pretty silly and not the most eye-popping stuff. You get a lot of people just stealing all the environment variables as soon as the package is installed.
That type of thing we catch all the time. What mystifies us is how little effort is put into even attempting to obfuscate what they're doing. It's like they know no one's looking, you know what I mean?
Well, you are the one who's looking, right? Which is why it's mystifying to you, because you can see it. But as you point out, most people don't look, so they're not going to see it. You just mentioned that most of it's pretty dumb: not eye-popping, no obfuscation. What's some of the more advanced stuff that you've seen? Can you talk a little about that?
Yeah. We've seen heavily obfuscated code that targets a single organization by checking different facts about the environment and only activating in those scenarios.
And what sort of organizations are they targeting there? I'm guessing crypto exchanges are going to feature pretty heavily.
They often target crypto wallets so that they can get built into one of those wallets. And a lot of those tend to be built with Electron, right?
So you've got a lot of JavaScript dependencies in those wallets. It's such a target, right? You get into the wallet and you can just wreak havoc and steal the keys. Yeah, it's such an incentive to go after, right, when you have all that juicy crypto sitting right there.
Can you think of examples where that's been successful?
Yeah, for sure. There was an incident not too long ago in a package called... let's see, which story do I want to tell? Because there have actually been multiple of these. My favorite one, to be honest with you, is the one that actually caused me to start the company.
So there's a package called event-stream. It got about 6 million weekly downloads, a very popular package, made by a very prolific maintainer who has published over 500 packages: one of these mega-maintainers. As you can imagine, with one person trying to manage that many projects, some of the packages weren't very well maintained.
But one of his projects was very widely depended upon by the ecosystem; it was in the dependency trees of a lot of Node.js users. So someone approached him and said, "Hey, you're not really maintaining this package. There hasn't been an update in two years. Could I have commit rights to help maintain it? We use it at my company." And this maintainer was like, "Yeah, sure, of course, whatever. I'm not even using this anymore. I've already moved on and made a replacement for this library that I like better." So he gave the person access, and even removed himself and was fully like, "I'm done. You have the package."
That person proceeded to make good publishes for about a month, and then they used the access they had to put an obfuscated backdoor into the package. If this is sounding familiar, it's what happened this year with the XZ Utils compromise.
It's almost the exact same pattern, and this happened back in 2018. So you can see how little we've improved as a security community when it comes to these things. The best part of all is the way the backdoor triggered: it looked at the context in which it was executing.
If it was running inside a particular Electron app, it would decrypt the code successfully and then execute it; otherwise, the decryption would fail. And the way the community caught it is going to sound a lot like what happened with XZ: you had a nerdy programmer just looking at things.
What happened was that the Node.js runtime deprecated a function the attacker used in their attack code.
And it broke?
No, it didn't break; just a warning was printed. But the deprecation happened a couple of days after the backdoor was added, so the attacker didn't know that this deprecation was going to happen.
So folks running the bleeding-edge version of the Node runtime were getting this deprecation warning, traced it back to this chunk of obfuscated code, and were like, "What the heck is this?" It looked super out of place, right?
Yeah, yeah.
Doesn't that sound so similar to XZ? You've got this totally accidental discovery.
It makes you realize: if we keep finding these things by accident so often, how much more work is there to do as an industry to improve this?
Now, I just want to go back to the idea of an NVD-like body doing some tracking here. As much as we do this for CVEs, we've never really done it for malware, right?
We've never really had one central repository for malware signatures, hashes, whatever. So aren't these supply chain infiltrations a little more akin to malware than to CVEs? I guess it's complicated, isn't it? Because you're talking about a building block of software, and that's often what CVEs are used for: you want to track those issues as you're importing stuff into your code.
So it seems like this straddles the line a bit between being more like a CVE and more like a malware signature.
Yeah. It's certainly not a vulnerability that we're talking about here; it is different. But the CVE system is actually one area where we've done pretty well as an industry.
We've widely deployed CVE scanners, and they're oftentimes part of the compliance requirements we have to cover as security practitioners. So given that we have this system and that it's already widely deployed, it might be the case that we should just use it for more things, because everyone's already hooked into the system in some way.
So that's the argument for using it for more than just vulnerabilities.
Yeah, I think that's a pretty good one. All right, Feross Aboukhadijeh, thank you so much for joining us this week to talk through all things software supply chain security. A pleasure to chat to you, as always.
Cool. Yeah, thanks, Pat.
That was Feross Aboukhadijeh there from Socket, and you can find them at socket.dev. And that is it for this week's show. I do hope you enjoyed it. I'll be back soon with more Risky Business for you all. But until then, I've been Patrick Gray. Thanks for listening.