TanStack & MANY more packages affected - a deep dive & analysis
Maximilian Schwarzmüller
Transcript
00:00:00We got another big, a really big supply chain attack going on right now and it's still ongoing
00:00:06and it spread from the NPM also to the Python ecosystem so maybe right now don't install any
00:00:12NPM or Python packages. And make sure your system is set up securely in general. I got another video
00:00:19on that, I'll share a link below and I'll get back to it here in this video too, but first I want to
00:00:23give you some details on what is affected and how to find out if you're affected. It started with the
00:00:30TanStack packages, TanStack Query, TanStack Router, TanStack Start and so on. Yesterday, May 11th,
00:00:36in a pretty short time frame a couple of malicious packages or actually all TanStack packages were
00:00:43published with malicious versions and it was contained quickly within 20 minutes. In the end
00:00:50it was detected and contained quickly but all these malicious packages have been published in that
00:00:57time frame or in that short time frame here. And then it continued spreading and it still is
00:01:03spreading. It spread to the Mistral packages which ha ha only have four users but still that was
00:01:09affected because this malware acts as a worm and steals data, steals credentials, also potentially
00:01:16yours if it is installed on your system. I'll get back to how to find out if you're affected in just
00:01:20a second but it continued spreading to more NPM packages because that is the idea behind it and
00:01:26then even into the Python ecosystem and that's happening right now. This is just a few hours old
00:01:32here, two hours at the point of time where I'm recording this. Now how do you find out if you're
00:01:39affected? If you installed any TanStack package yesterday evening in my case here in the German
00:01:45time zone, you must consider yourself affected. If you installed it around that time, keep in mind
00:01:54that is UTC so you have to translate that to your time zone, in that time then you must consider
00:02:00yourself affected. But since it's spreading to the Mistral packages, to many more JavaScript packages,
00:02:06more than I could list here, you also must consider yourself or your machine affected and compromised.
00:02:13And I'll share links to these posts below so that you can dive deeper and see the full list
00:02:18of all these affected packages as they have been published. But as mentioned, it's still ongoing,
00:02:22so maybe don't install anything right now. There also are indicators of compromise. You want to
00:02:31look for certain file hashes, SHA hashes, for the router in a JS file. I'll also link this post below.
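Checking files against published hashes can be sketched roughly like this. It is a minimal sketch, assuming the indicators are SHA-256 hashes of `.js` files; `KNOWN_BAD_SHA256` is a hypothetical placeholder that is intentionally left empty here and would have to be filled with the hashes from the linked post.

```python
import hashlib
from pathlib import Path

# Hypothetical placeholder: fill in the hashes from the linked advisory.
KNOWN_BAD_SHA256 = set()


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_matches(root, bad_hashes):
    """Return all .js files below root whose hash is in bad_hashes."""
    matches = []
    for path in Path(root).rglob("*.js"):
        if path.is_file() and sha256_of(path) in bad_hashes:
            matches.append(path)
    return sorted(matches)
```

Calling `find_matches(Path("node_modules"), KNOWN_BAD_SHA256)` in a project folder would list any file whose hash matches. An empty result only means none of the listed hashes were found, not that the machine is clean.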
00:02:38And if you have a way of monitoring which network request happened on your machine,
00:02:42you want to look for outgoing traffic to this URL, which would be another clear indicator
00:02:48that data has been exfiltrated from your system. What does compromise mean in detail? It means that
00:02:55this malware does two main things. The first important thing it does is it harvests data. It
00:03:03looks for NPM tokens, GitHub tokens, AWS credentials, other secrets. So it scans your system for
00:03:12typical locations where you store credentials and secrets, and it collects them and sends them off to
00:03:18this URL I showed you. So it steals these secrets. But it does not just do that. As I mentioned,
00:03:26it acts as a worm, so it also uses these GitHub tokens that were stolen. For example,
00:03:33it uses them and the NPM tokens to publish more compromised packages. If you're the maintainer
00:03:40of another package, or if maybe you have a CI/CD workflow that ran in that timeframe and that
00:03:46depended on some TanStack packages, then in that CI/CD workflow, the malicious, the compromised
00:03:53TanStack packages have been pulled in. The malicious code may have executed in there. And then in that
00:04:00workflow, so not on your machine, but in that workflow, it also could steal certain credentials
00:04:06to publish a malicious version, a compromised version of the package your CI/CD workflow was
00:04:14trying to build. So that is how it spreads. As I mentioned, it acts as a worm. It's using these
00:04:20stolen credentials and tokens to publish more compromised packages. And that is how it spreads
00:04:26to Mistral and then also other JavaScript packages, and then even into the Python ecosystem. And this
00:04:32is where we are right now. And it still is spreading for all I know. Now, how can you protect against
00:04:39that? I created a video about that on my other channel, Academind. I'll also link it below.
00:04:44The short story is you want to make sure that you run your code or that you do your development,
00:04:51not directly on your root machine if possible, but instead in some virtual machine in a dev container,
00:04:57something like this. You don't want to store raw secrets on your machine. I mean, for AWS,
00:05:03for example, you want to use their single sign on approach instead of storing IAM credentials on
00:05:10your machine, for example, and use similar techniques for other services you may be using. In addition,
00:05:16you also want to consider using services like Infisical or Doppler for storing your secrets
00:05:25in the cloud and not on your hard drive, not in .env files. That is something you may want to do.
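To get a feeling for what such malware could find, you can look for plaintext secret files yourself. The following is a minimal sketch, assuming a handful of common default locations; `CANDIDATE_FILES` is illustrative and not exhaustive, and it is not a list of what this particular malware scans.

```python
from pathlib import Path

# Illustrative default locations, relative to the home directory (an assumption,
# not taken from the malware analysis).
CANDIDATE_FILES = (
    ".npmrc",                # may contain an npm auth token
    ".aws/credentials",      # long-lived AWS access keys
    ".config/gh/hosts.yml",  # GitHub CLI token when no system keychain is used
)


def find_plaintext_secret_files(home, project):
    """List credential files in home plus any .env* files below project."""
    home, project = Path(home), Path(project)
    found = [home / rel for rel in CANDIDATE_FILES if (home / rel).is_file()]
    found += sorted(p for p in project.rglob(".env*") if p.is_file())
    return found
```

Running `find_plaintext_secret_files(Path.home(), Path.cwd())` gives you a list of files worth moving into a secrets manager or replacing with short-lived logins.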
00:05:30And again, I talk about stuff like that in that video. And you also want to use package managers
00:05:38and configurations that allow you to configure stuff like the minimum release age, like Bun allows
00:05:44you to do. In the bunfig.toml file, you can set a minimum release age, which makes sure that even if
00:05:49you do run bun install, you only install packages that are at least X seconds old, X days old in this
00:05:56case here in this example. Now, pnpm has a similar feature. The latest versions of npm have a similar
00:06:02feature. Again, I covered that in that other video. And if you do use something like Bun or if you have
00:06:09the right configuration for npm, but Bun does it as a default, for example, then it also blocks the
00:06:15execution of, for example, post install scripts, so lifecycle scripts of those packages you're
00:06:21installing, which gives you another security mechanism because that malware typically relies
00:06:28on such scripts being executed on your system. So using a secure package manager and/or secure
00:06:36configuration for that package manager, running your code in a virtual machine or a dev container
00:06:41and not storing plain secrets on your system. That is what you want to do in general, but now maybe
00:06:46even more because those attacks, attacks like these here will just become more serious and we'll dive
00:06:52into how that attack worked because it's really interesting. But of course, we're having more of
00:06:58these. I create a video like this almost every month now or maybe even more frequently, because for one,
00:07:04I believe they are easier to pull off. Now in the age of AI, it's easier to analyze the packages or
00:07:12the dependencies you want to affect and analyze their source code or their CI/CD setup for potential
00:07:22attack vectors. That is what happened here for TanStack. It's not as if a maintainer's machine
00:07:28was affected, but instead it was the TanStack CI/CD workflow that has been attacked. And I'll
00:07:34get back to that. So it's easier to look for vulnerabilities with AI. It's easier to write code,
00:07:40malicious code included, of course. And at the same time, we got that explosion of software. We got more
00:07:45software being written than ever before. So there are more targets out there, including many targets
00:07:51that maybe don't care too much about security. So that makes those attacks more interesting too.
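The minimum release age idea mentioned earlier boils down to a simple time comparison. Here is a minimal sketch of that logic, assuming you already know when each version was published; `newest_allowed_version` is a hypothetical helper for illustration and not how Bun, pnpm or npm implement the feature.

```python
from datetime import datetime, timedelta, timezone


def newest_allowed_version(published, now, min_age):
    """Return the most recently published version that is at least min_age old.

    published maps a version string to its publish time (timezone-aware).
    Returns None if every version is still too fresh.
    """
    eligible = {
        version: published_at
        for version, published_at in published.items()
        if now - published_at >= min_age
    }
    if not eligible:
        return None
    return max(eligible, key=eligible.get)
```

With a window of a few days, a malicious version that was published and removed again within 20 minutes would never have been eligible for installation.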
00:07:57Now, how did this all start? It's really interesting. As I mentioned, not a novel approach,
00:08:03not one we never saw before, but still quite elaborate. The TanStack team published a post
00:08:09mortem, an article where they explain how the attack happened. And I'll link that below too.
00:08:15But of course, I'll give you the summary here because in the end, this attack here relied on
00:08:22three main steps, which I'll explain in detail. A pull request target "Pwn Request" pattern. I'll
00:08:30explain what that is. Then GitHub actions cache poisoning across the fork based trust boundary
00:08:38and runtime memory extraction of an OIDC token. Okay, what does that all mean? Again,
00:08:45you can read the article for all the details, but let me give you the summary. And let's start with
00:08:50the pull request "Pwn Request" pattern. What is that? In order to understand that we have to understand
00:08:58that GitHub actions is of course the CI/CD solution, the CI/CD product by GitHub. And I do
00:09:05have a course on GitHub actions too, by the way, if you want to learn how to set up GitHub actions,
00:09:10how to use the product for CI/CD tasks, how to publish your packages or your website and so on.
00:09:16Now, like all CI/CD workflow tools, GitHub actions relies on events that trigger workflows because of
00:09:24course CI/CD is all about doing something in an automated way. For example, releasing your website,
00:09:29publishing, deploying your website in an automated way when you push to the main branch, for example.
00:09:34So you have various events that can trigger a workflow and push is one event, for example,
00:09:40so that you can say, okay, if I push to the main branch, for example, you can filter for
00:09:44different branches. Then I want to execute certain tasks. I want to install my dependencies. I want to
00:09:49build the project. I want to upload it to my server. That is what you could do. Now, one other trigger
00:09:56is pull request target. This trigger activates if there has been a pull request opened for your
00:10:05repository. And that of course means anybody can fork your repository, do something in there, push
00:10:14something in their fork, and then open a pull request with your repository. And that would trigger
00:10:19this workflow. Sounds dangerous? Well, it kind of is. And it is what started this attack.
00:10:25There also is the pull request trigger. So I talked about pull request target before,
00:10:31but we also have pull request, which works in the same way, but pull request then runs the CI/CD
00:10:38workflow in the context of the forked repository. So whatever malicious may be going on in there,
00:10:45it happens in the forked repository, not in the base repository. So this is not a problem.
00:10:52Pull request target on the other hand runs in the context of the base repository. And that of course
00:10:58is potentially dangerous. It's potentially dangerous because anybody can open a pull request. And of
00:11:04course, what happened in this case here for the TanStack attack, in that pull request, in that
00:11:10fork, the attacker included the malicious code, the worm code, the malware in the TanStack repository,
00:11:20in the fork of it, but it included it in there. Then the attacker opened the pull request,
00:11:26and that led to pull request target being executed. And then as mentioned, that then spins up a GitHub
00:11:33actions runner, and it then runs in the context of the base repository. What does this mean?
00:11:40This does not mean that the attacker gets access to the base code or can merge the malicious code
00:11:46into the repository, but it means that, for example, the cache that's being used in there
00:11:53will be shared with subsequent GitHub actions executions that stem from the base repository,
00:12:00potentially from totally different hooks or event triggers like the push trigger.
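The difference between the two triggers can be modelled as a toy cache. This is a minimal sketch under simplifying assumptions: `ToyActionsCache` and `scope_for` are hypothetical names, scopes are modelled as repository names to match the explanation above, and real GitHub Actions cache scoping is more fine-grained than this.

```python
def scope_for(trigger, base_repo, fork_repo):
    """Toy rule: only pull_request runs are isolated from the base repository."""
    if trigger == "pull_request":
        return fork_repo
    # push and pull_request_target both run in the context of the base repository
    return base_repo


class ToyActionsCache:
    """In-memory stand-in for a CI cache, keyed by (scope, cache key)."""

    def __init__(self):
        self._entries = {}

    def save(self, scope, key, value):
        # Entries are never overwritten: the first writer for a key wins.
        self._entries.setdefault((scope, key), value)

    def restore(self, scope, key):
        return self._entries.get((scope, key))
```

In this model, an entry saved by a `pull_request` run stays invisible to a later `push` run, while an entry saved by a `pull_request_target` run is restored by it. That second case is the trust boundary the attacker crossed.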
00:12:05The next thing that happened was the cache poisoning. But what does this mean? Well,
00:12:11the attacker added code to their fork that would make sure that when the GitHub action
00:12:17ran for the pull request target trigger, it would run a command, the hashFiles command,
00:12:23which is supported by GitHub actions, to store something in the GitHub actions cache. Now,
00:12:28what is that cache about? The idea behind the GitHub actions cache simply is to speed up
00:12:33those GitHub action workflows. So you can, for example, hash dependencies. The idea being that
00:12:39if a dependency your package depends on hasn't changed, why would you go through the entire
00:12:46installation process again? That just takes time, and time is money because you're billed for the
00:12:52runtime of your GitHub action workflow. And of course, you don't want to have workflows that
00:12:56take forever. So in most workflows, of course, for example, when building the TanStack packages,
00:13:00you install the dependencies of the TanStack packages, and then you do the build step and build
00:13:06your TanStack package. Again, if those dependencies of TanStack haven't changed,
00:13:12why reinstall them? That's the idea behind caching. And that makes sense. Of course, the problem is,
00:13:18since that pull request target GitHub actions execution and other GitHub action executions,
00:13:24like the ones for the push trigger, share the same context, they share the same cache. And that is
00:13:31where cache poisoning comes in, because the attacker was able to cache a malicious version or to put
00:13:39that malicious code into a dependency of TanStack, so to say, and cache that. So then the attacker
00:13:46just had to wait for a normal GitHub actions workflow to run for the TanStack packages.
00:13:53So for some maintainer to push some code, and then that other GitHub actions execution would reuse
00:14:01the same cache that was set up by the malicious execution before, and would now pull in the
00:14:08prepared poisoned cache, which included the malicious code. So that is how the malicious code
00:14:13got from the fork into the normal GitHub actions execution for a normal push by a normal maintainer
00:14:21who has not been affected by any malicious code. That is how the cache was used as a transport
00:14:28vehicle between these two GitHub action executions in the end. And then as a third step, once the
00:14:35malicious code made it into a regular execution of a TanStack CI/CD workflow, because of that push
00:14:44event, it stole a short-lived NPM token, an OIDC token in the end, to publish a malicious version
00:14:54of the TanStack package. Now, what am I referring to here? NPM has that feature, which is called
00:15:00trusted publishing, which in theory makes publishing NPM packages more secure, because
00:15:04there are roughly two ways of publishing a package to NPM, you could say. One is that you create a
00:15:11token with your NPM account and you use that to publish new versions of your package. The problem
00:15:19is if that token gets stolen, anybody can publish a new version of that package. To ramp up the
00:15:26security, there is this trusted publishing process where NPM says no, you can't publish packages from
00:15:33your machine, you have to go through one of these trusted providers, GitHub actions being one of
00:15:37them, and there is a trusted publishing integration for GitHub actions, which you can set up. And then
00:15:44as part of that trusted publishing process, a short-lived publishing token will be retrieved
00:15:50or will be requested. And then that short-lived token will be used for signing that new package
00:15:57version that is being published. So in theory, the idea is that the token is hard to steal because
00:16:03it's not on the machine of any maintainer. And in addition, it's short-lived. Even if it were stolen,
00:16:08it's not active for very long. The problem, of course, just is if the code that runs in the CI/CD
00:16:15workflow that requested that trusted token, if that code has been affected, then that malicious
00:16:21code has access to this brand new short-lived trusted publishing token. And that is what happened
00:16:27here. So that malicious code abused this token or used this token to then publish a new version
00:16:36of the TanStack package. Now, interestingly enough, this attack actually failed a bit because it did
00:16:44get that trusted token and it did use it to then reach out to the NPM API to publish a new version
00:16:52of the TanStack package that included this worm, that included the malicious code. But it actually
00:16:58ended up in a GitHub actions workflow that failed to complete because there was something wrong in
00:17:06the code that was pushed to CI/CD. So if the attackers had paid attention to run their
00:17:12attack at a point in time where valid code would be pushed, then of course this workflow would have
00:17:19completed and they would not have had to rely on publishing a malicious package manually by reaching
00:17:26out to the NPM API, but instead they could have injected the malicious code into this workflow as
00:17:32they did, let this workflow finish successfully and then a compromised version of TanStack would have
00:17:38been published whilst looking very valid because it was a normal push by a maintainer and the workflow
00:17:45finished successfully. The way this attack worked, because that workflow did not finish successfully,
00:17:51made it a bit easier for an external contributor to catch what was going on here in the end,
00:18:00because you could see that a new version of the TanStack packages was published even though the
00:18:05GitHub actions workflow failed, so no new version should have been published. So you could see a
00:18:12mismatch there, which made it a bit easier to detect this attack, which is kind of one part where the
00:18:19TanStack maintainers and we all got lucky. Nonetheless, a pretty elaborate attack, as you can
00:18:26probably tell, that worked totally without compromising anybody's machine. And even though it was caught
00:18:32quickly, it did serious damage because, as I mentioned, it is still spreading. And that was a long
00:18:41episode about all that, I know, but I really want to emphasize: you have to work on making your system
00:18:49secure. As I shared before, as I share in this video, you want to make sure that you reduce the
00:18:56danger of being affected. This attack, again, was caught quickly, and yet it is still spreading,
00:19:05so it's not over yet. And it is possible that not all attacks will be caught that quickly
00:19:11in the future. As mentioned, they got a bit lucky here. It could have been harder to detect this
00:19:18attack, so then maybe the damage would be even greater. But it's already pretty large here, and
00:19:24it's not over yet, and we'll see more such attacks, I'm sure, because, as I mentioned, the attack surface
00:19:31is getting bigger and more interesting. There are more people writing code, lots of people that
00:19:36don't know what they're doing, and AI helps with running such attacks. So yeah, this is what's going
00:19:42on right now. If you don't have to, maybe don't install anything. Double check your setup, and you'll
00:19:48find all the links below if you want to dive deeper, if you want to see the full list of affected
00:19:51packages and so on.