SQLite is 138x Slower Than This?! (Testing Stoolap)

Better Stack
Computers/Software · AI/Future Tech

Transcript

00:00:00When you're spinning up a new project and you need a database, what's the first option that
00:00:03comes to mind? SQLite? It's probably SQLite, right? I mean, it's great and it's reliable,
00:00:09it has zero config, it's an industry standard. But as our local data gets heavier and our queries
00:00:14get more complex, we're starting to hit the ceiling of what a single-threaded, file-locking
00:00:20engine can do. But now there's a new player in town that's trying to solve these issues.
00:00:25It's called Stoolap and it's a database engine written entirely in Rust, and it just released
00:00:31a native Node.js driver that is quite frankly showing some really strong performance. The
00:00:36database is claimed to be up to 138 times faster than SQLite. So in this video, we're going to take a look under the
00:00:43hood of Stoolap, see how it works and run a live benchmarking test to see if it's really as powerful
00:00:50as it claims to be. It's going to be a lot of fun, so let's dive into it. So what exactly is Stoolap?
00:01:00Well, at its core, Stoolap is an embedded OLAP or online analytical processing database. Now if
00:01:06you're used to standard databases like SQLite or Postgres, those are typically OLTP or online
00:01:14transaction processing databases, which are, as the name suggests, optimized for transactions. But
00:01:20Stoolap is different. It's designed for analytical workloads and it's built from the ground up in Rust,
00:01:27focusing on high-speed data processing. Think of it as having the portability of a SQLite file,
00:01:33but with the raw analytical power of something like DuckDB or BigQuery. But the coolest thing
00:01:39is that you can now use it with Node.js thanks to its native Node driver. Now when I say it's
00:01:45a native driver, I'm not talking about some standard wrapper. Usually when a database is
00:01:49written in a different language like Rust or C++, your Node.js app has to talk to it over a bridge.
00:01:56Often that means converting your data into JSON or some other format, sending it over a local network
00:02:02socket and then converting it back on the other side. That is called serialization overhead and
00:02:08it's a massive performance killer. But Stoolap Node does things differently. It uses NAPI-RS,
00:02:15which is a framework that allows the Rust engine to be compiled into a native binary that loads
00:02:21directly into your Node.js process. So there's no bridge and no translator in between. When you send
00:02:27a query, Node.js and Rust are essentially sharing the same memory space. And there are three big
00:02:33reasons why Stoolap is blazingly fast. First, it's using MVCC or multi-version concurrency control.
00:02:40Unlike SQLite, where a single writer can lock the entire database, Stoolap allows multiple readers
00:02:47and writers to work at the same time. Secondly, it uses parallel execution. Stoolap uses a scheduler
00:02:53called Rayon. And with Rayon, when you run a massive query, instead of running on one CPU core, it breaks
00:03:00that query apart and utilizes every core your machine has. And thirdly, it uses a cost-based
00:03:06optimizer. So it doesn't just execute your SQL blindly, it actually analyzes your data, estimates
00:03:13the cost of different paths, and picks the fastest way to get to your results. So that's what supposedly
00:03:19makes Stoolap a much faster option than SQLite. But let's test it out to see if that's actually true.
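Before the benchmark, the serialization-overhead point from earlier can be made concrete. This pure-JavaScript sketch (my own illustration, not anything from the video's script) compares a JSON round-trip, which is roughly what a bridge-style driver pays for every result set, against reading the same rows directly from memory, which is what an in-process native binding can do:

```javascript
// Illustrative only: cost of JSON serialize + parse on a result set
// versus touching rows that already live in the process's memory.
const rows = Array.from({ length: 100_000 }, (_, i) => ({
  id: i,
  userId: i % 1000,
  amount: Math.round(Math.random() * 10000) / 100,
}));

let t = process.hrtime.bigint();
const roundTripped = JSON.parse(JSON.stringify(rows)); // the "bridge" path
const bridgeMs = Number(process.hrtime.bigint() - t) / 1e6;

t = process.hrtime.bigint();
let total = 0;
for (const r of rows) total += r.amount;               // no conversion at all
const directMs = Number(process.hrtime.bigint() - t) / 1e6;

console.log(`JSON round-trip: ${bridgeMs.toFixed(2)} ms`);
console.log(`direct access:   ${directMs.toFixed(2)} ms`);
```

On any machine the round-trip path is orders of magnitude slower than direct access, which is the overhead the NAPI-RS approach avoids.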
00:03:25For this test, we'll be using a simple Node.js project and we will install both Stoolap and
00:03:30SQLite as dependencies. So one of the biggest wins where Stoolap really shines is using count distinct.
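That claim can be made concrete with a small sketch: generate the random sales data in plain JavaScript and compute the COUNT(DISTINCT) answer by hand as a cross-check for whatever either database returns. Table and column names here are guesses from the description, not the actual benchmark script:

```javascript
// Generate 10,000 random sales rows: user IDs 0..1000, a few fixed categories.
const CATEGORIES = ["electronics", "clothing", "food", "books", "toys"];
const sales = Array.from({ length: 10_000 }, () => ({
  user_id: Math.floor(Math.random() * 1001),
  category: CATEGORIES[Math.floor(Math.random() * CATEGORIES.length)],
}));

// Rough shape of the benchmark query (names assumed):
const QUERY = "SELECT COUNT(DISTINCT user_id) FROM sales WHERE category = ?";

// Reference answer in plain JS: distinct users who bought in a category.
function countDistinctUsers(rows, category) {
  const seen = new Set();
  for (const r of rows) if (r.category === category) seen.add(r.user_id);
  return seen.size;
}

console.log(QUERY);
console.log("distinct 'electronics' buyers:", countDistinctUsers(sales, "electronics"));
```

The same rows would be batch-inserted into both engines, and only the query itself is timed.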
00:03:37So I'm really curious to know if that really is the case. So I set up this simple script where we spin
00:03:43up an in-memory version of each database and create a simple sales table. We then populate this table
00:03:49with 10,000 rows of random sales data, where each row represents a sale from a user whose ID can vary
00:03:56from 0 to 1,000 and which has one of a few specific categories. We will then batch insert the data into
00:04:03both databases and then run the benchmark where we select a distinct count of sales made by a specific
00:04:10user on a specific category and calculate the performance of each database. Now I do have to
00:04:16note that it is a bit frustrating that, as of now, the installed npm package does not work. If we run the
00:04:22benchmark test now, we see that it's complaining about a missing native binding. So the project's
00:04:28author clearly forgot to add the binaries or link the correct ones to the package. So what I had to
00:04:34do is build it from the source. So I cloned the repo, ran the build inside of it and then hopped
00:04:39back to my benchmark project and linked my source directory as the dependency. This is a bit
00:04:44frustrating at the moment, so I hope the authors of the project will fix this in the future. But
00:04:49nonetheless, after doing so, we're now finally able to run the benchmark. So let's do that now.
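The build-from-source workaround just described amounts to roughly the following. The exact repository URL and script names are not shown in the video, so these are placeholders; check the project's README for the real ones:

```sh
# Clone the driver repo and build the native addon from source
git clone <stoolap-node repo URL>
cd stoolap-node
npm install
npm run build        # compiles the Rust engine into a native binding

# Back in the benchmark project, use the local build as the dependency
cd ../my-benchmark
npm link ../stoolap-node
```

Once the maintainers publish prebuilt binaries with the npm package, none of this should be necessary.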
00:04:54As you can see, the count distinct operation is indeed a lot faster on Stoolap, although not as
00:05:01much as advertised. It's only four times faster. So what if we scale up the number of
00:05:07rows we want to populate and run the test again with 1,000,000 rows? So let's try that now.
00:05:12So even for a million rows, Stoolap is only six times faster, not 138 times. But nevertheless,
00:05:20it's still a great result. So that's the count distinct test. I decided to run another benchmark
00:05:26to test the distinct + order by operation. And in this second test, I did a setup where we
00:05:33ingest some random logs with different IP addresses and status codes, and then try to find distinct
00:05:39logs by IP address and status code pair, and then order them by IP ascending and status code
00:05:47descending. And as you can see, once we run this test, Stoolap still performs better than SQLite,
00:05:53but not by 14 times, just 1 to 1.5 times faster. So the measurements listed here seem a bit inflated,
00:06:01in my opinion. But nevertheless, Stoolap is indeed faster than SQLite, as we just saw in the tests.
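For reference, the second test's operation can be expressed both as SQL (shape assumed from the description) and as an equivalent plain-JavaScript reference implementation, which is handy for verifying either engine's output:

```javascript
// SQL shape of the second benchmark (table/column names assumed):
const QUERY =
  "SELECT DISTINCT ip, status FROM logs ORDER BY ip ASC, status DESC";

// The same operation in plain JS: dedupe (ip, status) pairs, then sort
// by IP ascending and status code descending.
function distinctOrdered(logs) {
  const seen = new Map();
  for (const { ip, status } of logs) seen.set(`${ip}|${status}`, { ip, status });
  return [...seen.values()].sort(
    (a, b) => a.ip.localeCompare(b.ip) || b.status - a.status
  );
}

const sample = [
  { ip: "10.0.0.2", status: 200 },
  { ip: "10.0.0.1", status: 404 },
  { ip: "10.0.0.1", status: 200 },
  { ip: "10.0.0.1", status: 404 }, // duplicate pair, removed by DISTINCT
];
console.log(distinctOrdered(sample));
// → 10.0.0.1/404, 10.0.0.1/200, 10.0.0.2/200
```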
00:06:08To be fair, the author also mentions areas where SQLite still performed better than Stoolap. And
00:06:13these are mostly situations where you perform single-row operations. So Stoolap is great for
00:06:19analytical and complex queries. So is Stoolap the SQLite killer? Well, honestly, no. They're built
00:06:26for totally different things. SQLite is still your reliable daily driver for transactions,
00:06:32but Stoolap might be your high-performance race car for data analysis. But the fact that we now
00:06:38have a pure Rust analytical engine that we can drop into a Node.js project with NAPI-RS
00:06:45is pretty great. The project owners just need to make sure they patch their current NPM package,
00:06:50so we don't have to build it from source. So there you have it, folks. That is Stoolap in a nutshell.
00:06:55But what do you think about it? Is the performance boost worth switching to Stoolap, or are you sticking
00:07:01with the reliability of SQLite? Let us know in the comments down below. And folks, if you found this
00:07:06video useful, please let me know by smashing that like button underneath the video. And also don't
00:07:11forget to subscribe to our channel. This has been Andris from Better Stack and I will see you in the
00:07:17next videos.

Key Takeaway

Stoolap is a high-performance, Rust-based analytical database for Node.js that outperforms SQLite in complex queries but serves as a specialized tool for OLAP rather than a general-purpose replacement.

Highlights

Stoolap is an embedded OLAP database engine written in Rust with a native Node.js driver.

The engine uses NAPI-RS to share memory space with Node.js, eliminating serialization overhead common in other wrappers.

Key performance features include Multi-Version Concurrency Control (MVCC), parallel execution via the Rayon scheduler, and a cost-based query optimizer.

Real-world benchmarking showed Stoolap is 4x to 6x faster than SQLite for 'count distinct' operations on large datasets.

While faster for analytical queries, the speaker notes that SQLite still outperforms Stoolap in single-row transactional operations.

Current implementation hurdles include a broken NPM package requiring users to build the native bindings from source.

Timeline

Introduction to Stoolap and the Performance Gap

The speaker introduces the limitations of SQLite, specifically its single-threaded nature and file-locking engine which can hinder heavy local data processing. He presents Stoolap as a new Rust-based database engine that claims to be up to 138 times faster than industry standards. This section highlights the growing need for more powerful local database options as project complexity increases. The host promises a deep dive into the architecture and a live benchmarking session to verify these performance claims. This sets the stage for comparing traditional embedded databases with modern analytical engines.

OLAP vs. OLTP and Technical Architecture

This section explains the fundamental difference between OLTP databases like SQLite or Postgres and OLAP engines like Stoolap. While OLTP is optimized for transactions, Stoolap is designed specifically for analytical workloads similar to DuckDB or BigQuery. A major technical highlight is the use of NAPI-RS, which allows the Rust engine to load directly into the Node.js process. This architecture avoids the 'serialization overhead' caused by converting data to JSON or communicating over local sockets. By sharing the same memory space, the database achieves significantly lower latency during query execution.

The Three Pillars of Stoolap Speed

The speaker identifies three core technologies that drive Stoolap's performance: MVCC, parallel execution, and cost-based optimization. MVCC allows multiple simultaneous readers and writers, overcoming the restrictive locking mechanisms found in SQLite. Parallel execution is managed by the Rayon scheduler, which distributes massive queries across all available CPU cores instead of a single core. Furthermore, the cost-based optimizer analyzes data paths to pick the most efficient execution strategy rather than running SQL blindly. These features collectively aim to maximize hardware utilization for complex data sets.
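The MVCC idea described above can be sketched in miniature (purely illustrative; this is not Stoolap's implementation): each write creates a new version rather than taking a global lock, so a reader pinned to an older snapshot keeps seeing consistent data while writes proceed.

```javascript
// Toy MVCC store: writes append new versions; reads resolve against a snapshot.
class MvccStore {
  constructor() {
    this.version = 0;
    this.versions = new Map([[0, new Map()]]); // version -> {key: value}
  }
  snapshot() { return this.version; }          // pin a point-in-time view
  read(snap, key) {
    // Walk backwards from the snapshot to the newest version that has the key.
    for (let v = snap; v >= 0; v--) {
      const layer = this.versions.get(v);
      if (layer && layer.has(key)) return layer.get(key);
    }
    return undefined;
  }
  write(key, value) {                          // never blocks readers
    this.version += 1;
    this.versions.set(this.version, new Map([[key, value]]));
  }
}

const store = new MvccStore();
store.write("balance", 100);
const readerSnap = store.snapshot();           // reader starts here
store.write("balance", 250);                   // writer proceeds concurrently
console.log(store.read(readerSnap, "balance"));        // 100 (old snapshot)
console.log(store.read(store.snapshot(), "balance"));  // 250 (latest)
```

A real engine also garbage-collects old versions and detects write-write conflicts, which this sketch omits.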

Benchmarking Setup and Installation Challenges

To test the claims, the host sets up a Node.js project to compare Stoolap and SQLite using a 'count distinct' operation on a sales table. However, he encounters a significant issue where the official NPM package is missing necessary native bindings, leading to a 'missing native binding' error. To bypass this, the speaker demonstrates cloning the repository and building the source code manually using Rust tools. This segment serves as a practical warning for early adopters about the current state of the project's distribution. Despite these friction points, he successfully links the built library to proceed with the performance tests.

Live Test Results and Comparative Analysis

The live benchmark reveals that for 10,000 rows, Stoolap is 4 times faster than SQLite, and this grows to 6 times faster when testing 1,000,000 rows. A second test involving 'distinct and order by' operations on log data showed a smaller lead of roughly 1 to 1.5 times faster. The speaker notes that while the results are impressive, the '138x faster' marketing claim appears to be somewhat inflated for these specific scenarios. He concludes that while Stoolap is clearly superior for bulk analysis, the performance gap varies significantly depending on the query type. These concrete numbers provide a grounded perspective on what developers can actually expect in production.

Final Verdict: Is it a SQLite Killer?

In the final section, the speaker clarifies that Stoolap is not a 'SQLite killer' because they are built for different purposes. SQLite remains the reliable choice for standard transactional work and single-row operations where it often still outperforms Stoolap. Stoolap is better characterized as a 'high-performance race car' for data analysis within Node.js environments. The host expresses excitement about the potential of Rust-based engines but urges the maintainers to fix the installation packages. He ends by asking the audience if the performance gains are worth the current complexity of the setup.
