Employee Spotlight
Philipp Burckhardt
August 1, 2024
When I was first asked to pen a post for the Socket blog about my journey, I faced an uncomfortable question: Where to begin? How to avoid boring the dear reader? With no grand career plan of my own or sage advice in store, I found myself pondering instead: How much do small childhood experiences and distinct decisions, like the butterfly effect, ultimately shape our careers?
As a kid, computers fascinated me. I was glued to my parents' personal computer playing "The Settlers," a classic PC game. Germans have a penchant for strategy games, and I fit the stereotype, although beer, cars, and soccer never interested me. But I loved building kingdoms, optimizing resources, and conquering lands.
For a few years, I attended a Waldorf school, a type of school known for its disdain for technology. Yearning for a more structured education, I later felt more at home at a classical school with an education rooted in Greek and Latin. If these wildly diverging poles taught me one thing, it is to dream with open eyes, as T.E. Lawrence might say, and dare to make those dreams possible.
Those early experiences, which taught me that there is more than the status quo, laid the foundation for my future path. It's perhaps no surprise, then, that Socket’s mission resonated on a deeper level: it offers the opportunity to work on cutting-edge security solutions that protect people and organizations around the world, and to be part of a company that strives to go beyond the old ways of doing things in the infosec space.
But my first exposure to open-source software and writing code on the daily wouldn’t come for many years. While I dabbled in video game development and basic HTML at school, I had no idea what career to pursue. Excelling in most school subjects (save for sports) and faced with having to enroll in a specific subject when applying to university, I ended up picking economics and business. It was the time of the financial crisis of 2007-2008, and I imagined that this would be an exciting time to study the field.
I didn’t enjoy it much, at least not the overly unrealistic toy models that are presented as gospel (there is a reason the field gets mockingly derided as suffering from physics envy). But I got to take linear algebra and real analysis, and to make my first foray into academic research by writing a thesis with Melanie Schienle on using quantile regression for risk modeling of the financial sector. Whereas ordinary linear regression models predict the conditional mean of a distribution given a set of inputs, quantile regression focuses on the quantiles of the distribution, including those far out in its tails where the extreme values live. This experience rekindled my interest in coding, and I learned the statistical programming language R.
Studying economics taught me other things, too, such as thinking in marginals rather than absolutes, a perspective shift I highly recommend. And I fondly recall reading about Christopher Latham Sholes, the inventor of the typewriter, in the fabulous book “Information Rules” by Hal Varian and Carl Shapiro. Sholes’s QWERTY typewriter layout, designed to prevent mechanical jams (and, at least according to folklore, so that his salespeople could type the word “typewriter” using only the top row of letters), has endured for over a century and is still used in modern keyboards, despite more efficient alternatives like Dvorak.
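To make the contrast between the mean and the quantiles concrete, here is a minimal sketch in plain Python. It is not the code from the thesis, and it makes simplifying assumptions: the inputs are dropped, so a single constant is fitted rather than a regression line, the numbers are made up, and the function names are mine. The idea carries over, though: minimizing squared error recovers the mean, while minimizing the so-called pinball loss recovers a chosen quantile.

```python
def squared_loss(prediction, observations):
    """Average squared error; the constant that minimizes it is the mean."""
    return sum((y - prediction) ** 2 for y in observations) / len(observations)


def pinball_loss(prediction, observations, tau):
    """Average pinball (quantile) loss for the quantile level tau in (0, 1)."""
    total = 0.0
    for y in observations:
        residual = y - prediction
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * residual if residual >= 0 else (tau - 1) * residual
    return total / len(observations)


def best_constant(loss, candidates):
    """Brute-force search for the candidate with the smallest loss."""
    return min(candidates, key=loss)


# Hypothetical, made-up losses with a single extreme value (assumption).
losses = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 50.0]

# Candidate predictions on a grid from 0.0 to 50.0 in steps of 0.1.
candidates = [step / 10 for step in range(0, 501)]

mean_fit = best_constant(lambda c: squared_loss(c, losses), candidates)
median_fit = best_constant(lambda c: pinball_loss(c, losses, 0.5), candidates)
tail_fit = best_constant(lambda c: pinball_loss(c, losses, 0.95), candidates)

print(mean_fit, median_fit, tail_fit)
```

With these numbers, the squared-error fit lands near the sample mean and gets pulled upward by the outlier, the fit for `tau = 0.5` sits at the median, and the fit for `tau = 0.95` moves all the way out to the extreme observation. That last behavior is what makes the approach interesting for risk modeling.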
This anecdote isn't just about a keyboard layout; it’s a metaphor for technological inertia and the enduring influence of the status quo. The technical term is "switching costs," a reality I've faced multiple times in my career so far, as you’ll see shortly.
My foray into econometrics (a fancy word for statistics applied to economic data) led me to pursue a Master’s degree in Statistics at the University of Oxford. There, I ended up working on probabilistic topic modeling to model the emergence of topics in newspaper articles and corresponding discussions on social media, specifically the platform formerly known as Twitter. A research project with political scientist Ray Duch involved collecting lots of Twitter data via their API (back then there were generous rate limits, so I wasn’t constrained in doing so) and dumping it into MongoDB, which as a document store provided a suitable tool for saving lots of nested data without a strict schema.
It was at this time, in 2012/2013, that I discovered Node.js. Confronted with the task of scraping publicly available newspaper articles from their websites, I found the R ecosystem severely lacking in this area (things have changed, mainly through the tireless work of the folks at Posit, formerly RStudio, in pushing the R ecosystem forward).
At the same time, I was intrigued by YQL (Yahoo Query Language), which treated the entire web as a database that could be queried. YQL was truly ahead of its time, offering a unified interface for accessing diverse web data sources and facilitating data mashups.
However, for my specific needs, there were no good libraries for DOM manipulation in R at the time, whereas Node.js had jsdom and cheerio. Fun fact: Eli Insua, the creator of jsdom, is now my colleague at Socket, and I am regularly amazed by his exceptional 3D visualization skills.
Node.js really opened a new door for me: No longer was JavaScript restricted to the confines of the browser, but now it was possible to use it like I had been using R. But soon I found myself in a quagmire: Wanting to calculate various statistics on the scraped data, I found that the Node.js ecosystem didn’t have the advanced statistical libraries I had grown accustomed to. So I started writing my own and putting them on GitHub.
And here another chance encounter happened, which sparked my now twelve-year run of contributing to the open-source ecosystem: Athan Reines, then a data scientist at the server infrastructure startup NodePrime, a Node.js shop where he had been writing dashboards and statistical tools in JavaScript, reached out, and we ended up teaming up. After a few attempts, this joint effort culminated in the creation of the stdlib project in 2016, which aims to provide the fundamental numerical infrastructure for the web.
Primarily written in JS and C, the library these days consists of 150+ special math functions, probability distributions, and 200+ general utilities. Its 3,800+ packages are all individually consumable due to its fully decomposable architecture.
Looking back, we underestimated the scale of the endeavor to build out infrastructure similar to Python's NumPy and SciPy. These language ecosystems have evolved over a long period of time and in turn rely on numerical libraries, primarily in Fortran and C/C++, that were written over the course of decades.
At this point in time, I also crossed paths for the first time with my current co-worker Mikola Lysenko, who would have been my mentor for a Google Summer of Code (GSoC) project on compiling linear algebra libraries from C to JavaScript with Mozilla’s Emscripten. The project would have been run under the umbrella of the jQuery Foundation, which had graciously offered to host it. The foundation wasn’t awarded enough slots, so that particular project didn’t come to fruition, but I ended up working with Mik and other smart folks for some time on various numerical libraries for Node.js under the scijs umbrella.
It's a cliche at this point, but in open source it is true that we are all standing on the shoulders of giants, building on all the work that has been done in the past. This is an amazing superpower, and it has made it possible for developers and organizations to be more ambitious in their projects and accomplish much more with limited resources!
Today, stdlib has an amazing set of contributors. This summer, we were accepted to the GSoC program for the first time, and currently have four students finishing up their work on projects that push the project forward, be it by porting BLAS/LAPACK to JS or by super-charging the stdlib REPL environment.
This has been a long journey, and we are nowhere near completion of our mission. Social media so often gives a skewed perspective: Of course it is possible to achieve overnight success by being in the right place at the right time or through sheer luck, but I firmly believe that most great things take years of hard work and grit. Switching costs and the inertia of current solutions mean that new entrants have to offer good reasons for potential users to give them a try. Many people give up prematurely.
After my time at Oxford, I pursued a PhD in Statistics & Data Science at Carnegie Mellon University. As a continuation of my prior work with natural language processing and statistical modeling of unstructured text, I ended up working on various health care informatics projects revolving around chronic kidney disease with Rema Padman and Daniel Nagin.
In an effort to combine my academic research and my work on open-source Node.js libraries, I ended up developing a new e-learning platform for teaching statistics and data science, called ISLE (Integrated Statistics Learning Environment), under the supervision of Christopher Genovese and Rebecca Nugent. That I could work on rather unusual projects during my PhD is surely due to the special place that is CMU: a small campus community that is used to collaboration across disciplines and open to technological approaches, thanks to the reputation of its computer science department and the legacy of pioneers such as Herbert Simon, who made significant contributions to an eclectic range of fields, including artificial intelligence.
The freedom to explore uncharted territory is often associated with academia, but I am proud to say that we at Socket aim to be at the forefront of both operationalizing existing research and approaches and investing in new research to help secure software supply chains, be it through our groundbreaking AI scanning technology and other ambitious projects, or through actively supporting industry and academic collaboration, including our awesome research interns.
This summer, we have two very talented PhD interns at Socket, Wenxin Jiang and Hao Heo. In a way, things have come full circle for me here, since Hao is pursuing her graduate studies at CMU under the supervision of Bogdan Vasilescu.
After finishing my PhD, I stayed on at CMU, first as a postdoc, then as Director of E-Learning at the Department of Statistics & Data Science. ISLE combined educational platform development with research on the "science of data science," aiming to enhance the learning experience for students in these fields. The platform delivers instruction through video and interactive learning content. ISLE is designed as a web-based e-learning platform and lesson authoring framework, allowing educators to create and customize content for their students from various building blocks (for which relying on React.js as the front-end library proved a great choice). Most challenging and rewarding was building a fairly complex system for real-time collaboration (think of Google Docs for writing statistical reports with graphs etc.) and monitoring with Socket.IO.
One of the biggest use-cases of ISLE at CMU has been the Moderna AI Academy, a partnership between Moderna and Carnegie Mellon University (CMU) that was launched in December 2021. It aimed to educate Moderna employees on AI capabilities, skills, and best practices. The program was designed to enhance AI knowledge across the company, supporting Moderna's efforts to extend its mRNA technology development beyond COVID-19 vaccines, and was delivered using ISLE.
While I learned a great deal in single-handedly building ISLE from the frontend to the backend and everything in between, and it was exciting to grow the project to being used by thousands of students at a range of different institutions of higher learning, academia can also at times be a lonely place. As I had to learn time and time again, inertia is a powerful force, and any institution is bound to be stuck in its ways. That’s the great promise of start-ups, which are able to reimagine things from the ground up.
I had known of Feross, Socket's CEO, through his work in the Node.js ecosystem (be it StandardJS or WebTorrent), and, being intrigued by the mission of the company and seeing some other familiar faces, I decided to take the plunge from academia to industry. It was a great decision for me personally: the opportunity to work in a talent-dense environment and sharpen my skills through daily learning from my co-workers has been exceptional, and I am so excited for what we will accomplish in the future. There is still lots to do and the security landscape is shifting every day.
What I've found particularly exciting about working at Socket is how it embodies the startup ideal of breaking down traditional role boundaries and keeping a flat hierarchy, which I know Feross strives to maintain as we continue to grow (we are hiring, so check out our careers page!). It has been great for me personally to be able to switch between my data scientist and full-stack engineer hats.
The collapsing of specialized roles allows for a more holistic approach to problem-solving, leading to innovative solutions that might be missed in more siloed environments. This interdisciplinary approach is not just professionally rewarding, but I believe it's crucial for tackling the complex challenges in today's software security landscape.
Today, I'm proud to be a part of the Socket team. I'm excited about the work we're doing to build tools that make it easier for developers to identify and fix security vulnerabilities in open-source software. The amount of software being published every day is staggering, as are all the malware and code anomalies that we see getting flagged by our AI scanner every day. As software supply chains become increasingly complex, Socket's ability to safeguard businesses against vulnerabilities in third-party dependencies could become crucial for maintaining security and trust in the software ecosystem.
Let’s work together to make the open-source ecosystem a more secure place!