Why would anyone build a database in Rust?
I've been asked this at least thirty times. Usually by people who think SQLite is fine, or who assume the answer involves blockchain. Neither is correct.
moteDB started as a robot's memory: a place to store sensor readings, camera frames, IMU data, and navigation state in a way that could survive a crash, answer cross-modal queries, and run on a Raspberry Pi without a server. The robot's Python stack was already fighting SQLite over file locks, and the 23ms write stalls were causing visible servo jitter at 200Hz.
This is the story of the technical decisions behind moteDB 0.1.6. Not the marketing version. The actual choices, the tradeoffs we made, and the things that bit us.
Why Rust
The short answer: because we needed a database with no runtime, no garbage collector, and a binary small enough to ship on an embedded device alongside the rest of the robot's software.
The long answer is more interesting.
Most databases are built in C or C++. That's a reasonable choice: you get manual memory control, mature tooling, and decades of optimization work. But C gives you exactly one thing in return for that control: the ability to get it catastrophically wrong. Use-after-free, double-free, buffer overflow: these aren't edge cases in C database code. They're recurring occupational hazards.
Rust eliminates that class of bugs at compile time. More importantly for an embedded context, Rust does this without a garbage collector. There's no runtime pausing your real-time control loop to clean up heap memory. The cost of safety is paid at compile time, not at runtime.
There's a subtler benefit that's harder to quantify: Rust makes it safe to be aggressive with optimization. You can use unsafe blocks where performance demands it, wrap them in safe abstractions, and know the compiler still enforces the borrow rules on every caller of those abstractions. In C, "I'll be careful here" is a hope. In Rust, the careful part is confined to a few auditable blocks, and everything around them is enforced.
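As a minimal sketch of that pattern (the helper name `strided_sum` is hypothetical and not taken from moteDB), here is an unchecked slice read wrapped in a function whose public signature is entirely safe. The only claim that needs auditing is the one in the SAFETY comment.

```rust
/// Sums every `stride`-th sample, starting at index 0.
/// Hypothetical helper for illustration; not part of moteDB.
pub fn strided_sum(samples: &[f32], stride: usize) -> f32 {
    assert!(stride > 0, "stride must be non-zero");
    let mut total = 0.0f32;
    let mut i = 0usize;
    while i < samples.len() {
        // SAFETY: the loop condition just checked `i < samples.len()`.
        total += unsafe { *samples.get_unchecked(i) };
        // saturating_add avoids wrap-around for very large strides.
        i = i.saturating_add(stride);
    }
    total
}
```

Callers cannot trigger an out-of-bounds read through this function, whatever slice or stride they pass. Whether skipping the bounds check is actually worth it is something to measure rather than assume.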
The other practical win is deployment. cargo build --release gives you a single self-contained executable, and a fully static one if you build for a musl target. No shared libraries to manage, no Python version conflicts, no JVM startup time. On the robot, the database is a 2MB binary that starts in 3ms. That matters more than it sounds.
The LSM-tree vs B-tree Decision
Once we decided on Rust, the next question was the storage engine. This is where most embedded database discussions get muddled.
SQLite uses a B-tree. B-trees are well-understood, read-optimized, and handle point queries efficiently. They're the right choice for most applications. They're also not ideal for write-heavy workloads with append patterns, which is exactly what sensor data looks like.
Sensor data isn't random updates to existing records. It's a stream of new entries, timestamped, appended to the end of a log. B-trees need to find the right leaf page, split it if necessary, update parent pointers, and propagate changes up the tree. That's multiple disk seeks per write, even with page caching.
LSM-trees (Log-Structured Merge-trees) take a different approach. Writes go to an in-memory buffer first. When the buffer fills, it gets flushed to disk as an immutable sorted run. Reads check the in-memory buffer first, then the most recent on-disk run, then older runs if needed. The key insight is that sequential writes are much faster than random writes: by orders of magnitude on spinning disks, and still significantly on flash storage such as SSDs and SD cards.
The tradeoff: LSM-trees make reads more expensive than B-trees do. A point query might need to check the in-memory buffer, the newest run, and potentially several older runs before finding the answer. B-trees give you O(log n) reads and writes against a single structure; LSM-trees give you faster writes at the cost of reads that may have to check several levels.
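To make the read cost concrete, here is a toy LSM sketch. It is not moteDB's engine: `TinyLsm` and `SortedRun` are hypothetical names, the "on-disk" runs are kept in memory, and there is no compaction. `get` returns the value together with the number of structures it had to probe.

```rust
use std::collections::BTreeMap;

/// An immutable sorted run. Kept in memory here; a real engine stores it on disk.
pub struct SortedRun {
    entries: Vec<(Vec<u8>, Vec<u8>)>, // sorted by key
}

impl SortedRun {
    pub fn from_memtable(memtable: BTreeMap<Vec<u8>, Vec<u8>>) -> Self {
        // BTreeMap iterates in key order, so the collected Vec is sorted.
        SortedRun { entries: memtable.into_iter().collect() }
    }

    pub fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.entries
            .binary_search_by(|(k, _)| k.as_slice().cmp(key))
            .ok()
            .map(|i| self.entries[i].1.as_slice())
    }
}

pub struct TinyLsm {
    memtable: BTreeMap<Vec<u8>, Vec<u8>>,
    runs: Vec<SortedRun>, // oldest first, newest last
    memtable_limit: usize,
}

impl TinyLsm {
    pub fn new(memtable_limit: usize) -> Self {
        TinyLsm { memtable: BTreeMap::new(), runs: Vec::new(), memtable_limit }
    }

    pub fn put(&mut self, key: &[u8], value: &[u8]) {
        self.memtable.insert(key.to_vec(), value.to_vec());
        if self.memtable.len() >= self.memtable_limit {
            // Flush: the full memtable becomes the newest immutable run.
            let full = std::mem::take(&mut self.memtable);
            self.runs.push(SortedRun::from_memtable(full));
        }
    }

    /// Returns the newest value for `key` and how many structures were probed.
    pub fn get(&self, key: &[u8]) -> (Option<&[u8]>, usize) {
        let mut probes = 1;
        if let Some(v) = self.memtable.get(key) {
            return (Some(v.as_slice()), probes);
        }
        // Newest run first, so a newer write shadows an older one.
        for run in self.runs.iter().rev() {
            probes += 1;
            if let Some(v) = run.get(key) {
                return (Some(v), probes);
            }
        }
        (None, probes)
    }
}
```

A key that was written recently is found in one probe. A key that lives only in the oldest run, or does not exist at all, costs one probe per level. That gap is what compaction and per-run filters exist to close in production engines.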
For a robot, that tradeoff makes sense. Writes happen at 200Hz and need to be non-blocking. Reads are typically range queries over recent data (the last 10 seconds are hot) or point queries into specific indices. The two-tier architecture (hot in-memory buffer + cold on-disk storage) handles both patterns.
Here's what the write path looks like in practice:
pub fn insert(&self, key: Vec<u8>, value: DataEntry) -> Result<()> {
    // 1. Write to the WAL first: this is for durability, not performance
    self.wal.append(&key, &value)?;
    // 2. Acquire the write lock on the memtable (RwLock, held only briefly)
    let mut memtable = self.memtable.write();
    // 3. Insert into the in-memory BTreeMap (sorted, fast); `key` is moved, not cloned
    memtable.insert(key, Arc::new(value));
    // 4. If the memtable exceeds its threshold, release the lock and trigger
    //    the flush path, which marks it immutable and spawns the flush
    if memtable.len() >= self.config.memtable_size {
        drop(memtable);
        self.trigger_flush();
    }
    Ok(())
}
The WAL is there because power loss mid-write is a real scenario on a robot. A sensor reading at t=5.2s is worthless if the database thinks it covers t=0 to t=5.1s. The WAL gets fsynced before we acknowledge the write; the memtable doesn't.
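The "sync before acknowledge" rule can be sketched with the standard library alone. This is an illustration under assumptions, not moteDB's WAL: the record layout (two little-endian u32 lengths followed by key and value bytes), the `Wal` type, and the `replay` function are all hypothetical. A real log would also carry a checksum per record.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Write};
use std::path::Path;

pub struct Wal {
    file: File,
}

impl Wal {
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Wal { file })
    }

    /// Appends one record and returns only after the data has been synced.
    pub fn append(&mut self, key: &[u8], value: &[u8]) -> io::Result<()> {
        let mut record = Vec::with_capacity(8 + key.len() + value.len());
        record.extend_from_slice(&(key.len() as u32).to_le_bytes());
        record.extend_from_slice(&(value.len() as u32).to_le_bytes());
        record.extend_from_slice(key);
        record.extend_from_slice(value);
        self.file.write_all(&record)?;
        // The write is acknowledged only once this returns Ok.
        self.file.sync_data()
    }
}

fn read_u32(bytes: &[u8], at: usize) -> usize {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(&bytes[at..at + 4]);
    u32::from_le_bytes(raw) as usize
}

/// Replays every complete record; a torn record at the tail is ignored.
pub fn replay(path: &Path) -> io::Result<Vec<(Vec<u8>, Vec<u8>)>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    let mut records = Vec::new();
    let mut pos = 0;
    while pos + 8 <= bytes.len() {
        let key_len = read_u32(&bytes, pos);
        let value_len = read_u32(&bytes, pos + 4);
        let key_end = pos + 8 + key_len;
        let end = key_end + value_len;
        if end > bytes.len() {
            break; // power was lost mid-record
        }
        records.push((bytes[pos + 8..key_end].to_vec(), bytes[key_end..end].to_vec()));
        pos = end;
    }
    Ok(records)
}
```

On restart, replaying the log rebuilds whatever the in-memory buffer held at the time of the crash, which is why the buffer itself never needs to be synced.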
The Three-Hour Hang
v0.1.7 had a design that looked clean on paper: one B-tree, one RwLock protecting all reads and writes. Simple. Wrong.
Twelve concurrent DB instances, each running column index queries while a background thread built indexes. The background thread grabbed the write lock. Every query blocked. With twelve threads competing for one lock, the queue grew faster than it drained. The system was alive (threads running, memory allocated) but nothing was making progress. It looked like a hang. It wasn't even a deadlock.
The fix took about six seconds to type and about six hours of profiling to understand.
The new architecture has four components:
pub struct StorageEngine {
    // In-memory buffer: writes hit here first
    index_mem_buffer: RwLock<BTreeMap<Vec<u8>, Arc<DataEntry>>>,
    // On-disk B-tree: reads only
    generic_btree: RwLock<GenericBTree>,
    // Serializes buffer-to-BTree migration
    drain_lock: Mutex<()>,
    // Tracks deleted keys to prevent resurrected data
    tombstones: RwLock<HashSet<Vec<u8>>>,
}
When the memory buffer exceeds its threshold, it atomically flips to an immutable snapshot. The drain thread picks it up and builds the B-tree without blocking readers. New writes hit the new active buffer. The drain lock uses try_lock so writers never block â if the drain is in progress, writes just keep accumulating in the active buffer until it fills and triggers its own drain.
The result: column index benchmark runtime dropped from 3+ hours to 6.6 seconds.
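The flip-and-drain idea can be sketched with standard-library locks. This is a simplified illustration, not the moteDB implementation: the `TwoTier` type and its field names are hypothetical, a second BTreeMap stands in for the on-disk B-tree, tombstones are omitted, and the drain runs inline on the inserting thread instead of on a dedicated drain thread.

```rust
use std::collections::BTreeMap;
use std::mem;
use std::sync::{Arc, Mutex, RwLock};

type Buffer = BTreeMap<Vec<u8>, Vec<u8>>;

pub struct TwoTier {
    pub active: RwLock<Buffer>,              // writes land here first
    pub frozen: RwLock<Option<Arc<Buffer>>>, // immutable snapshot being drained
    pub drained: RwLock<Buffer>,             // stand-in for the on-disk B-tree
    pub drain_lock: Mutex<()>,               // serializes buffer-to-tree migration
    pub threshold: usize,
}

impl TwoTier {
    pub fn new(threshold: usize) -> Self {
        TwoTier {
            active: RwLock::new(Buffer::new()),
            frozen: RwLock::new(None),
            drained: RwLock::new(Buffer::new()),
            drain_lock: Mutex::new(()),
            threshold,
        }
    }

    pub fn insert(&self, key: Vec<u8>, value: Vec<u8>) {
        let full = {
            let mut active = self.active.write().unwrap();
            active.insert(key, value);
            active.len() >= self.threshold
        }; // write guard dropped here, before any drain work
        if full {
            self.try_drain();
        }
    }

    /// Returns true if this call drained, false if a drain was already running.
    pub fn try_drain(&self) -> bool {
        let _guard = match self.drain_lock.try_lock() {
            Ok(guard) => guard,
            Err(_) => return false, // busy: keep accumulating in `active`
        };
        // Flip: the write lock is held only long enough to swap buffers.
        let snapshot = {
            let mut active = self.active.write().unwrap();
            let snapshot = Arc::new(mem::take(&mut *active));
            *self.frozen.write().unwrap() = Some(Arc::clone(&snapshot));
            snapshot
        };
        // Slow part (tree build and disk I/O in a real engine).
        // Writers are not blocked: they use the fresh `active` buffer.
        let mut drained = self.drained.write().unwrap();
        for (k, v) in snapshot.iter() {
            drained.insert(k.clone(), v.clone());
        }
        // Retire the snapshot only after its contents are in `drained`.
        *self.frozen.write().unwrap() = None;
        true
    }

    pub fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        if let Some(v) = self.active.read().unwrap().get(key) {
            return Some(v.clone());
        }
        let frozen = self.frozen.read().unwrap().clone();
        if let Some(v) = frozen.as_ref().and_then(|buf| buf.get(key)) {
            return Some(v.clone());
        }
        self.drained.read().unwrap().get(key).cloned()
    }
}
```

The point to notice is that `insert` never waits on `drain_lock`. If a drain is already running, `try_lock` fails immediately and the write simply stays in the active buffer. One simplification to be aware of: in this sketch the merge holds the write lock on `drained`, so readers that fall through to the last tier do wait for it.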
Vector Index: The Memory Problem No One Talks About
moteDB stores vectors alongside structured data. The naive approach (load all vectors into memory, build HNSW, serve queries) doesn't work on embedded hardware. 1M vectors x 384 dimensions x 4 bytes is about 1.5GB for the raw vectors alone, before any index overhead. Most edge devices don't have that headroom.
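The footprint estimate is simple enough to keep as a helper when sizing a deployment. The function names below are hypothetical, and the 25% budget in the usage note is an assumption for illustration, not a moteDB setting.

```rust
/// Bytes needed to hold the raw vectors, excluding any index structure.
pub fn raw_vector_bytes(count: u64, dims: u64, bytes_per_component: u64) -> u64 {
    count * dims * bytes_per_component
}

/// True if `bytes` fits within `fraction` of the device's RAM.
pub fn fits_in_budget(bytes: u64, ram_bytes: u64, fraction: f64) -> bool {
    (bytes as f64) <= (ram_bytes as f64) * fraction
}
```

For 1,000,000 vectors of 384 f32 components this gives 1,536,000,000 bytes. Against a 4GB device where the database may use a quarter of RAM, `fits_in_budget` returns false.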
Most vector databases use HNSW (Hierarchical Navigable Small World). HNSW is excellent: high recall, fast queries. It also builds an in-memory graph structure that scales with the dataset. For a robot running on a Raspberry Pi with 4GB RAM total, that's not viable.
moteDB uses DiskANN. The core idea: build an index that can be memory-mapped and served from disk with minimal I/O per query. The index structure is designed so that a query only touches the pages it needs, in the order it needs them, with a bounded number of disk seeks.
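The sketch below is not DiskANN and not moteDB's index format. It only illustrates the underlying property that disk-resident indexes rely on: with fixed-width rows, one vector can be fetched with a single seek and a single read, without loading the whole file. The file layout (row-major little-endian f32) and both function names are assumptions.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Writes vectors as fixed-width rows of little-endian f32 values.
pub fn write_vectors(path: &Path, vectors: &[Vec<f32>]) -> io::Result<()> {
    let mut bytes = Vec::new();
    for vector in vectors {
        for component in vector {
            bytes.extend_from_slice(&component.to_le_bytes());
        }
    }
    std::fs::write(path, bytes)
}

/// Reads vector `id` by seeking to its row; nothing else is loaded.
pub fn read_vector(file: &mut File, id: u64, dims: usize) -> io::Result<Vec<f32>> {
    let row_bytes = dims * 4;
    file.seek(SeekFrom::Start(id * row_bytes as u64))?;
    let mut buf = vec![0u8; row_bytes];
    file.read_exact(&mut buf)?;
    Ok(buf
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

Memory use per lookup is one row, regardless of how many vectors the file holds. A graph-based disk index adds neighbor lists to each row so that a search can decide which row to fetch next.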
The tradeoff is indexing time. HNSW builds in minutes; DiskANN builds in hours on large datasets. But the build happens offline, on a dev machine with proper resources. The query-time memory footprint is bounded regardless of dataset size.
There's a more practical tradeoff that doesn't get discussed enough: approximate vs. exact nearest neighbor. Vector indices return approximate nearest neighbors, not exact ones. For semantic search over a knowledge base, 95% recall is fine. For robot navigation or precision manufacturing, it might not be. We made the recall threshold configurable because use cases differ, and we're honest about the tradeoff in the documentation.
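Recall is easy to measure if you can afford a brute-force pass over a sample of queries. A minimal sketch, with hypothetical function names: `exact_top_k` computes the ground truth by scanning everything, and `recall_at_k` reports what fraction of it an approximate result recovered.

```rust
use std::cmp::Ordering;

pub fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Ground truth: ids of the `k` nearest vectors, found by scanning all of them.
pub fn exact_top_k(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..data.len()).collect();
    ids.sort_by(|&a, &b| {
        squared_l2(&data[a], query)
            .partial_cmp(&squared_l2(&data[b], query))
            .unwrap_or(Ordering::Equal)
    });
    ids.truncate(k);
    ids
}

/// Fraction of the exact neighbors that the approximate result also returned.
pub fn recall_at_k(approx: &[usize], exact: &[usize]) -> f64 {
    if exact.is_empty() {
        return 1.0;
    }
    let hits = exact.iter().filter(|&&id| approx.contains(&id)).count();
    hits as f64 / exact.len() as f64
}
```

If the exact top 3 is [1, 0, 2] and the index returns [1, 2, 4], recall is 2/3. Whether that is acceptable depends entirely on what the caller does with the miss.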
The Multimodal Coexistence Problem
The hardest design problem wasn't any single index type. It was making them coexist without dragging each other down.
A naive approach: separate storage engines for each modality, one for vectors, one for structured data, one for time-series. This works until you need to query across them. "Find all camera frames where the force sensor exceeded 2N within the last 3 seconds, then rank by semantic similarity to this query image" is a cross-modal query that needs to touch three different indexes atomically.
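To show the shape of that query, here is a brute-force sketch over plain slices. It is not moteDB's query API: the `Frame` and `ForceSample` types, the function names, and the rule for matching a frame to a force reading (a reading above the threshold within `slack_s` seconds of the frame) are all assumptions made for illustration.

```rust
use std::cmp::Ordering;

#[derive(Clone, Debug)]
pub struct Frame {
    pub id: u32,
    pub t: f64,
    pub embedding: Vec<f32>,
}

#[derive(Clone, Copy, Debug)]
pub struct ForceSample {
    pub t: f64,
    pub newtons: f64,
}

pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

/// Frame ids from the last `window_s` seconds that coincide with a force
/// reading above `threshold_n`, ordered by similarity to `query` (best first).
pub fn frames_near_force_spike(
    frames: &[Frame],
    forces: &[ForceSample],
    query: &[f32],
    now: f64,
    window_s: f64,
    threshold_n: f64,
    slack_s: f64,
) -> Vec<u32> {
    let mut hits: Vec<(f32, u32)> = frames
        .iter()
        // Time-series step: keep only recent frames.
        .filter(|f| f.t >= now - window_s && f.t <= now)
        // Structured step: require a force spike close to the frame in time.
        .filter(|f| {
            forces
                .iter()
                .any(|s| s.newtons > threshold_n && (s.t - f.t).abs() <= slack_s)
        })
        // Vector step: score the survivors against the query embedding.
        .map(|f| (cosine(&f.embedding, query), f.id))
        .collect();
    hits.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap_or(Ordering::Equal));
    hits.into_iter().map(|(_, id)| id).collect()
}
```

The three filter stages correspond to the three indexes the query touches. The consistency requirement is that all three see the same set of committed writes, which is what a linear scan over one snapshot gets for free and separate storage engines do not.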
moteDB stores all indexes independently but shares a common WAL. Each index type has its own flush and compaction logic, but they're coordinated through the same write-ahead log. If a write completes, all indexes reflect it. If power cuts mid-write, the WAL ensures all indexes roll back to a consistent state.
The practical consequence: adding a new index type doesn't require rearchitecting the write path. You implement the Index trait, plug it into the engine, and it participates in the WAL and flush cycle. This is how we added vector indexes, full-text search, and spatial indexes without breaking the existing code.
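A plug-in design along those lines might look roughly like the sketch below. The trait shown here is hypothetical and moteDB's real Index trait may differ; the WAL is an in-memory Vec, and `KeyIndex`, `SizeIndex`, and `Engine` are toy types invented for the example.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
pub struct WalRecord {
    pub key: Vec<u8>,
    pub value: Vec<u8>,
}

/// Hypothetical trait; moteDB's real `Index` trait may look different.
pub trait Index {
    fn name(&self) -> &str;
    fn apply(&mut self, record: &WalRecord);
    fn len(&self) -> usize;
}

/// Toy index: key to latest value.
#[derive(Default)]
pub struct KeyIndex {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl Index for KeyIndex {
    fn name(&self) -> &str { "by_key" }
    fn apply(&mut self, r: &WalRecord) { self.map.insert(r.key.clone(), r.value.clone()); }
    fn len(&self) -> usize { self.map.len() }
}

/// Toy index: value size to the keys written with that size.
#[derive(Default)]
pub struct SizeIndex {
    by_size: BTreeMap<usize, Vec<Vec<u8>>>,
}

impl Index for SizeIndex {
    fn name(&self) -> &str { "by_value_size" }
    fn apply(&mut self, r: &WalRecord) {
        self.by_size.entry(r.value.len()).or_default().push(r.key.clone());
    }
    fn len(&self) -> usize { self.by_size.len() }
}

#[derive(Default)]
pub struct Engine {
    wal: Vec<WalRecord>, // in-memory stand-in for the shared write-ahead log
    indexes: Vec<Box<dyn Index>>,
}

impl Engine {
    /// Adds an index and replays the log so it catches up with earlier writes.
    pub fn register(&mut self, mut index: Box<dyn Index>) {
        for record in &self.wal {
            index.apply(record);
        }
        self.indexes.push(index);
    }

    /// Logs the write once, then fans it out to every registered index.
    pub fn write(&mut self, key: &[u8], value: &[u8]) {
        let record = WalRecord { key: key.to_vec(), value: value.to_vec() };
        self.wal.push(record.clone());
        for index in self.indexes.iter_mut() {
            index.apply(&record);
        }
    }

    pub fn index_len(&self, name: &str) -> Option<usize> {
        self.indexes.iter().find(|i| i.name() == name).map(|i| i.len())
    }
}
```

The write path in `Engine::write` never mentions a concrete index type. An index registered late rebuilds itself from the same log that crash recovery would use.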
What v0.1.6 Taught Us About Allocators
The most surprising bug in v0.1.6 had nothing to do with our code. It was in glibc.
Under heavy concurrent close() calls, the process would crash inside the glibc arena allocator, apparently because of how our teardown code was exercising malloc across threads. The trigger was our concurrent test suite, which spawned 12 threads, each closing its own DB instance at the same moment. glibc's malloc partitions heap memory into per-thread arenas to reduce lock contention. As best we could tell, when that many threads hit the allocator at once, the competing arenas could deadlock under specific allocation patterns.
The fix was simple: don't call close() from multiple threads simultaneously. We added a test that specifically exercises concurrent close and verified the behavior. The fix took five minutes. Finding it took a day.
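One way to enforce "one close at a time" is a process-wide lock around teardown, plus a test that closes many instances from many threads and checks that no two teardowns overlapped. This is a sketch under assumptions, not moteDB's code: the `Db` type, the counters, and both helper functions are hypothetical, and a const `Mutex::new` needs a reasonably recent Rust toolchain.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

// Process-wide lock: at most one instance is torn down at a time.
static CLOSE_LOCK: Mutex<()> = Mutex::new(());
// Instrumentation for the test: current and peak number of overlapping closes.
static IN_CLOSE: AtomicUsize = AtomicUsize::new(0);
static MAX_IN_CLOSE: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a database handle that owns heap buffers.
pub struct Db {
    buffers: Vec<Vec<u8>>,
}

impl Db {
    pub fn open(buffer_count: usize) -> Self {
        Db { buffers: vec![vec![0u8; 4096]; buffer_count] }
    }

    pub fn close(self) {
        // A poisoned lock still serializes, so recover the guard and continue.
        let _guard = CLOSE_LOCK.lock().unwrap_or_else(|e| e.into_inner());
        let now = IN_CLOSE.fetch_add(1, Ordering::SeqCst) + 1;
        MAX_IN_CLOSE.fetch_max(now, Ordering::SeqCst);
        drop(self.buffers); // the deallocation burst happens under the lock
        IN_CLOSE.fetch_sub(1, Ordering::SeqCst);
    }
}

pub fn max_concurrent_closes() -> usize {
    MAX_IN_CLOSE.load(Ordering::SeqCst)
}

/// Opens one instance per thread and closes them all at roughly the same time.
pub fn close_all_concurrently(threads: usize) {
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(|| Db::open(64).close()))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

Running `close_all_concurrently(12)` and asserting that `max_concurrent_closes()` is 1 gives a regression test for the access pattern described above. It does not prove anything about the allocator itself; it only pins down that teardown is serialized.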
This is the kind of thing that doesn't show up in single-threaded testing, doesn't appear in benchmarks, and only surfaces under the specific concurrent access pattern of a real workload. The only solution is adversarial testing: try to break your own system before your users do.
What's Real and What Isn't
moteDB is not a silver bullet. It's not going to outperform a tuned Postgres deployment for analytical queries. The vector index isn't going to beat Pinecone on recall. The LSM-tree isn't the right choice for a write-light, read-heavy workload.
What it does is run in 2MB on an embedded device, start in 3ms, handle sensor-rate writes without blocking, and answer cross-modal queries without a server. Those constraints shaped every decision, and the tradeoffs we made reflect them.
If your constraints are the same (embedded hardware, real-time write requirements, multimodal data, no server infrastructure), the architectural choices here are probably right for you too. If they're not, SQLite is probably fine.
The code is on GitHub and crates.io. The v0.2.0 release (which includes the two-layer index architecture and the columnar predicate pushdown) builds cleanly with zero clippy warnings and passes 749 concurrent test cases in about 3 minutes.
We'd rather you use the right tool for the job. If that's moteDB, great. If it isn't, at least now you know what the tradeoffs actually are.