Risky Biz Podcast: How Socket Goes Beyond Vulnerabilities to Tackle Modern Supply Chain Attacks in Open Source Software
In the latest Risky Biz Podcast episode, Socket CEO Feross Aboukhadijeh discussed the limitations of the National Vulnerability Database (NVD) in addressing the modern risks associated with using open source package registries.
Sarah Gooding
July 22, 2024
In the latest episode of the Risky Business podcast, host Tom Uren chatted with Socket CEO Feross Aboukhadijeh about the limitations of the National Vulnerability Database (NVD) in addressing the modern risks associated with open-source software. While the NVD is effective for tracking vulnerabilities, it often fails to account for backdoors, malware, and other malicious code found in open-source packages. This gap in coverage leaves organizations vulnerable, as many of these threats are not officially documented and, therefore, not detected by traditional vulnerability scanners.
Feross emphasized the diverse range of threats posed by malicious packages, from political "protestware" to sophisticated state-sponsored backdoors. Socket addresses these issues by continuously analyzing and monitoring all major open-source ecosystems in real time. Using static analysis and an LLM, with review by a human team of security researchers, Socket can detect malicious behaviors, such as data exfiltration and obfuscated code, that might not be flagged by conventional tools. This proactive approach identifies approximately 100 supply chain attacks each week, significantly enhancing security for organizations that rely on open-source software.
They also discussed the challenges of maintaining internal package mirrors, which can inadvertently harbor and distribute malicious packages even after they have been removed from public registries. Feross explained how Socket provides solutions to mitigate these risks by integrating with internal package hosts and offering real-time alerts and remediation guidance.
Check out the full episode in the video or read the transcript below.
Tom: Hello, everyone. Welcome to another Risky Business News sponsor interview. Today, I have with me Feross Aboukhadijeh from Socket. G'day Feross, how are you?
Feross: Doing good, Tom. Glad to be here.
Tom: Socket makes a system for inspecting security of the open source packages that you might be using or might be interested in using.
And in the wake of some of the problems with the National Vulnerability Database in recent months, I thought it'd be interesting to talk about how in, I guess, air quotes, traditional software packages, there's this whole system of tracking vulnerabilities through CVEs. That information gets enriched, vendors pick up that information, what version is affected, how important it is, and use that to try and keep a handle on the whole mess that is the world of the internet and software.
But my understanding is that none of those mechanisms really exist in open source package registries. Is that right?
Feross: So I would say that we care about more than vulnerabilities when we talk about whether our open source software and our open source components are secure. We care about whether the packages have backdoors, whether they're malicious, whether there's unwanted code. You know, we've seen all kinds of different threats over the last few months and years, everything from political protest, what people have dubbed protestware, to malicious install scripts.
We've seen cryptocurrency miners. We've seen ransomware. We've seen XZ utils, which was a very sophisticated state sponsored backdoor in an open source project. So there's been all kinds of different threats and, and unfortunately, not all of those make it into the NVD. And therefore they don't all make it into the different vulnerability scanning software that most companies are using.
And so while the NVD is still useful, and while, you know, vulnerability scanning tools are still useful, they're just not equal to the challenge of managing the risk from software supply chain attacks. So I think what we're seeing is that, rather than focusing on just remediating vulnerabilities, to really manage the modern risk of software supply chain attacks that come from open source, we actually have to shift our focus to malware and all the different modern attack techniques that attackers are using in open source package registries.
Tom: Right, so are you saying that something like a backdoor would not get a CVE number?
Feross: So we, we see sometimes those do get CVE numbers, but it's not consistent.
It's not repeatable. In fact, the vast majority of the time, I would estimate 90 to 95 percent of the time, we don't see a CVE number being issued for malicious packages. And I think it's, it's shockingly high.
Tom: Do you know why that is?
Feross: I'd love to get like an actual definitive reason from, from the, the folks that, that run it.
But my, my understanding is that the NVD, it's right there in the name, right? National Vulnerability Database. And it's, so it's really focused on vulnerabilities, accidental security bugs in software. It's not really meant for malware and for malicious code. It's not meant for viruses. You know, it's just that it's, it's not, that's not what they track.
And so the risk that creates for companies and security teams is that when you have a package that is affected by a backdoor or has malware in it, there isn't really a central repository where you can go to find out if you're using such a package. And we'll see open source registries like NPM and PyPI and Maven Central just remove packages, and there's no CVE published.
There's no way to find out, you know, whether you're using one of those packages or whether you might have already replicated a package that's malicious into your own internal mirror, you know, your own Artifactory instance or whatever you might be using to host packages internally.
Tom: Right. So they just remove it. You've got no idea even why they removed it. Do they remove them for other reasons than badness? So it could be that there's a benign reason that it's been removed and you've got no idea whether you're fine, there's just some minor problem, or you're in deep doo doo.
Feross: Yeah, that's exactly right. So when packages are removed it can be for a number of reasons. We see obviously malicious code is a big reason, but there's also accidental secrets getting published inside packages. That'll be a reason. There's a ton of copyright infringement that goes on these registries.
People literally host movies, like full length movies on NPM. So that's a, that's a big source of removals. It's, you know, it's literally it's a free file host, right? You can, I mean, you can run one command and they'll just host, you know, your packages at multi gigabyte file size and it's on a CDN, so you're obviously going to see a ton of abuse there.
We see SEO spam. So there's lots of different reasons. So it's not necessarily the case that something is removed and it's malware, but usually if it's removed, you probably don't want to use it. And it's something you want to know about.
Tom: Right, right. So once it's removed, there's this uncertainty that you can't eliminate currently.
And so the best practice would be to get rid of it. Like why carry risk, I guess, is what you're saying.
Feross: Yeah. Although I'll say the flip side is right. Like a lot of the reason why companies run internal package mirrors is because they want to build robustness. You know, they don't want their builds and their, their software release processes to be dependent upon the uptime of PyPI, which is a volunteer run project.
And so that's why they do that replication. And so the whole point of those mirrors is that they want to keep packages around if they're ever used in the company, even after they're potentially removed from the public registry. And so you have this inherent conflict: the whole purpose of these mirrors is to keep your build and your processes running and to have a record of those packages.
But you also don't want to continue hosting a malicious package in your mirror, and then providing that to your developers when it's already been removed publicly. Then the public is actually safer than your team is, because you continue to host the malware. You know, that's where Socket can help.
I mean, we have lists of these things and you can connect it up to your internal package host, whether it's Artifactory or whatever you use. And we can help you see if you're using any of that type of stuff.
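The mirror problem described here comes down to a set intersection: compare what the internal mirror still hosts against a list of packages that were pulled from the public registry. The following is a minimal sketch of that idea only. The `Pkg` record, the `find_removed_in_mirror` helper, and the sample entries are hypothetical and do not reflect Socket's data feed or Artifactory's API.

```python
from typing import Iterable, List, NamedTuple


class Pkg(NamedTuple):
    """One published artifact, identified by ecosystem, name and version."""
    ecosystem: str
    name: str
    version: str


def find_removed_in_mirror(mirror: Iterable[Pkg], removed: Iterable[Pkg]) -> List[Pkg]:
    """Return mirror entries that also appear on the removed-from-registry list."""
    removed_set = set(removed)
    return sorted(pkg for pkg in mirror if pkg in removed_set)


# Hypothetical inventory of an internal mirror and a hypothetical removal list.
mirror_inventory = [
    Pkg("npm", "left-pad-utils", "1.0.2"),
    Pkg("pypi", "reqests", "0.0.1"),
    Pkg("pypi", "requests", "2.32.3"),
]
removed_upstream = [
    Pkg("pypi", "reqests", "0.0.1"),
    Pkg("npm", "some-other-package", "9.9.9"),
]

for hit in find_removed_in_mirror(mirror_inventory, removed_upstream):
    print(f"still hosted internally but removed upstream: {hit}")
```

A match here does not prove the package is malware, since removals also happen for leaked secrets, copyright infringement, or spam, but it is the kind of entry a team would want to review.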
Tom: But how do you help? I mean, if you're not getting a sort of official vouched for, you know, here's a piece of bad, malicious stuff through a database or through a number, is, are you just, I don't know, how are you doing that?
Feross: Yeah, so that's a great question. At Socket, we clone every single open source package that exists in all the major ecosystems: NPM, PyPI, Go, Maven Central, RubyGems, et cetera. And we replicate those packages in real time. So that means whenever there's a new package published in any of those ecosystems, within seconds we already have a copy of it.
And then we do a full source code analysis of those packages. Starting with the static analysis, we look at the maintainer behavior and history. So like, what's their track record? Is this a new author? Have they ever published code before? We look at the dependency graphs and the trustworthiness of that graph.
And then we also look at the source code itself using an LLM. And that actually is pretty great at identifying threats that we don't have static rules for. So we'll often catch data exfiltration. We'll catch obfuscated code, hidden behavior that isn't meant to be inside those packages. And then we put all that into a feed, and we have a human team of security researchers and experts that review that and produce a clean data signal for our customers.
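To make "static rules" a little more concrete: one of the simplest static signals for an npm package is whether its `package.json` declares install-time lifecycle scripts (`preinstall`, `install`, `postinstall`), since those run code on the developer's machine during `npm install`. The sketch below is an illustrative heuristic only, not Socket's actual rule set; the `install_scripts` helper and the sample manifest are hypothetical.

```python
import json
from typing import Dict

# npm lifecycle scripts that execute automatically while a package is installed.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_scripts(package_json_text: str) -> Dict[str, str]:
    """Return the install-time scripts declared in a package.json document."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts")
    if not isinstance(scripts, dict):
        return {}
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


# Hypothetical manifest for illustration.
sample_manifest = """
{
  "name": "example-package",
  "version": "1.0.0",
  "scripts": {
    "test": "node test.js",
    "postinstall": "node collect.js"
  }
}
"""

found = install_scripts(sample_manifest)
if found:
    print("runs code at install time:", found)
```

Many legitimate packages use install scripts, so a hit like this is a reason to look closer rather than a verdict, which is where the other signals described above (maintainer history, dependency graph, source review) come in.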
With that whole system, we're actually able to identify right now around 100 supply chain attacks every week across the ecosystems that we monitor. And since we started the company, that's already added up now to, I think, crossing the 16,000 malicious package threshold. So there's a ton. I mean, this stuff is being published constantly to the registries.
I think that the numbers boggle the mind. I don't think there's a general awareness of just how much, frankly, garbage code there is on these registries. I mean, it's straight up a public wiki where anyone can post stuff, right? It's basically what it is.
Tom: So presumably most of these packages affect a small number of people.
Is that right? Because I do hear occasionally of large ones, but if it's that many, there's got to be a lot of, I guess, diffuse harm.
Feross: That's correct. Yeah. So, I mean, if you just take the sheer numbers, like most of them have probably under 50 downloads, though you're, you're absolutely correct.
That doesn't mean that they're not a potential threat, even the ones that aren't downloaded, because they'll often typo squat, meaning they're registering names that are one letter off from another package. And so, if you ever make a mistake, they're just sitting there waiting for you to, you know, typo your package installation.
And so one of the cool things that we have at Socket is what we call SafeNPM or SafePyPI, which is basically a wrapper that wraps the CLI that the developer uses. And when they go to type NPM install, if they do typo that installation, we can actually step in and, just like you would get on a Google search, we can say, did you really mean to install
this package that has 50 downloads when there's one that's one letter different that actually has 20 million downloads? You know, and we can kind of save them from themselves there.
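The "did you mean" check described here can be sketched as an edit-distance comparison against well-known package names plus a download-count sanity check. This is a minimal sketch under stated assumptions: the `likely_typosquat` helper, the `ratio` threshold, and the download figures are hypothetical, and it is not how Socket's wrapper is implemented.

```python
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def likely_typosquat(name: str, downloads: int,
                     popular: Dict[str, int], ratio: int = 1000) -> Optional[str]:
    """Return the popular package that `name` may be imitating, or None.

    A name is flagged when it is exactly one edit away from a popular
    package that has at least `ratio` times as many downloads.
    """
    if name in popular:
        return None
    best: Optional[str] = None
    for candidate, candidate_downloads in popular.items():
        if edit_distance(name, candidate) != 1:
            continue
        if candidate_downloads < ratio * max(downloads, 1):
            continue
        if best is None or candidate_downloads > popular[best]:
            best = candidate
    return best


# Hypothetical download counts for illustration.
popular_packages = {"requests": 20_000_000, "flask": 5_000_000}

suggestion = likely_typosquat("reqests", 50, popular_packages)
if suggestion is not None:
    print(f"Did you mean '{suggestion}'?")
```

Requiring both a near-identical name and a large popularity gap keeps the prompt from firing on legitimate packages that merely happen to have short, similar names.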
Tom: Right. So you're sort of intercepting the typo squatting. You're typo squatting on typo squatting.
Feross: Exactly. Yeah, exactly.
We're sort of, yeah, we're watching the install, and we can also do that for any of the other threats that we detect. So, you know, obviously if it's a malicious package or if it violates company policy in one way or another, whether it's anything from license restrictions to, hey, this package actually collects telemetry off the system.
We see a lot of packages now collecting telemetry for maintainer purposes, like, you know, they'll just collect the IP of who installed their code, the host name of the system. It's not necessarily malicious, but they're just pulling pieces of system information, so they can identify who's using their code.
Pretty much none of our customers want that in their codebases. It's just risk. And so being able to call that out, and then especially if there's instructions that they provide on how to disable that functionality, we can provide that too during that installation process and help the developer.
Tom: Yeah. Yeah. Right. So when we started this interview, I thought you were going to argue that package registries should have some sort of, you know, reporting mechanism for vulnerable or malicious packages. But now that I've heard what you say Socket does, surely you'd argue they should just stay the same and you should all just use Socket.
Feross: Well, I think, I think that we're, we're actually happy to collaborate with registries. And if anyone who works at NPM which, you know, is now owned by GitHub or, you know, the volunteers that run PyPI and, and, you know, the other registries want to work with us, I would be happy to work with them and provide our data feeds to them.
We do actually report everything that we identify to the registries and get that code taken down to protect the community. But we usually see a bit of a lag time there, anything from days to weeks, just because they're, they're often run by volunteers. And so, you know, today the best way to protect yourself is to use a tool like Socket, which can, you know, provide that added layer of security across all the different registries.
And then I'll just add one other note too, which is kind of interesting, I think, which is that there's cases where even if the registry were to partner with Socket and we help them take down everything we've identified as malicious right away and provide that data to people. There is this kind of gray area of packages that we're starting to see more of where they're effectively malicious from a company's perspective, but for whatever reason, they're still live on the registries.
I can give you a really concrete example. This one's from about a year ago, where a maintainer wanted to protest the war in Ukraine. So they added code into their package that would embed itself into the front end of your website. And that client-side JavaScript, when it ran, would check your time zone, and if you were in a certain range of time zones, mostly in Eastern Europe, then it would pop open a new tab in your browser,
taking you to their website to basically sign a petition. That doesn't sound that bad, right? But the thing is, this package was really widely used: 600,000 weekly downloads. It was a popular library used to basically backfill functionality for older browsers. And so this was used in a lot of apps that need to support old browsers, which, you can imagine, that's banks and those types of organizations.
And so we saw this get bundled into people's front ends. And little do they know, because they might not have developers in those time zones, they're actually running this code on their customers', you know, on their users' devices. And it looks like they're hacked when you come to the website, because, you know, you're getting redirected.
But, so it's, it's sort of an image issue. And for whatever reason, you know, in that instance, GitHub decided that that isn't actually something that they want to take down, that they don't consider to be malicious. Even though I haven't talked to a single company that's, that's okay with that in their code base.
And so that package remains live on npm today. And so you need, you know, third parties like Socket to come in and sort of say, okay, yes, this is technically not considered malicious by GitHub or by npm, but actually, nobody wants this. So, you know, we flag that as, we call it protestware, or potentially unwanted behavior present in the package.
And we can stop that from getting in the first place, or identify it if it's already, you know, used in, in any of your repositories.
Tom: Yeah, yeah, that's an interesting example. I do remember hearing about it, but I didn't realize that it remained live.
Feross: Yeah, the package is called EventSource Polyfill, if anyone wants to take a look at it.
It's one of the older versions that they have up there today and still hosted there. So that's the kind of thing, I think, where you ask yourself, okay, you know, what should we do about this as an industry, right? And I would argue there should be disclosure, right?
I mean, the CVE system is nice because everyone is using it in some ways, right? We're all hooked into it. We've done a pretty good job as an industry of getting adoption of that data source, so the path forward here might be to put this type of data into the NVD and sort of expand its scope.
Although, you know, I don't know all the, you know, there may be other considerations that, you know, we need to think about there, but I just think it's such a widely deployed system. So yeah, it might be the, it might be the path of least resistance to get this information that we have into more people's hands.
Tom: Feross Aboukhadijeh, CEO and founder of Socket, thanks a lot for another fascinating interview.
Feross: Yeah. Thanks Tom. Glad to be here.