BonsaiDb Performance Update: A deep-dive on file synchronization

Unfortunately, v0.5.3 of Nebari introduced a slowdown of write operations. This post gives an overview of what happened and walks through my investigation into file syncing.

On Linux, it’s fsync(), fdatasync(), and sync_file_range().
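For readers following along in Rust, here is a minimal sketch of how the first two calls are reached from the standard library. On Linux, `File::sync_all` corresponds to `fsync()` and `File::sync_data` to `fdatasync()`; the standard library has no wrapper for `sync_file_range()`. The helper name `write_durably` and its `sync_metadata` parameter are made up for this example and are not part of Nebari or BonsaiDb.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: writes `data` to `path` and asks the OS to flush it
/// to the storage device before returning.
fn write_durably(path: &Path, data: &[u8], sync_metadata: bool) -> io::Result<()> {
    // Create (or truncate) the file and write the whole buffer.
    let mut file = File::create(path)?;
    file.write_all(data)?;

    if sync_metadata {
        // fsync() on Linux: flushes file contents and all file metadata.
        file.sync_all()?;
    } else {
        // fdatasync() on Linux: flushes file contents, and may skip metadata
        // that is not needed to read the data back.
        file.sync_data()?;
    }
    Ok(())
}
```

Calling `write_durably(path, bytes, false)` is the cheaper variant; whether the difference is measurable depends on the filesystem and the device.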

Indeed, with most datacenter-grade storage (such as SSDs with power-loss protection), it is safe to use O_DIRECT without any sync. I suppose it is safe in a virtual machine as well.
For example, MySQL's innodb_flush_method setting has an O_DIRECT_NO_FSYNC option.

Direct file IO is something I didn’t really cover, mostly because in my testing for Nebari it doesn’t help: most of the data Nebari writes isn’t aligned. Some of my earlier testing of the impact of alignment was likely flawed because I ran it against tmpfs, so the alignment that CouchStore originally had might have yielded somewhat better performance with direct IO in some cases.
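To make the alignment point concrete: on Linux, direct IO generally expects the buffer address, file offset, and transfer length to be multiples of the device's logical block size. A rough sketch of the padding arithmetic is below. The function names are hypothetical, the block size must be supplied by the caller (4096 is only an assumed example value), and this is not how Nebari lays out its data.

```rust
/// Hypothetical helper: rounds `len` up to the next multiple of `block_size`.
/// `block_size` must be a power of two (for example 512 or 4096).
fn align_up(len: u64, block_size: u64) -> u64 {
    assert!(block_size.is_power_of_two());
    // Adding (block_size - 1) and masking off the low bits rounds up.
    (len + block_size - 1) & !(block_size - 1)
}

/// Hypothetical helper: number of padding bytes needed after a `len`-byte
/// record so that the next record starts on a block boundary.
fn padding_for(len: u64, block_size: u64) -> u64 {
    align_up(len, block_size) - len
}
```

With an assumed block size of 4096, a 100-byte record would carry 3996 bytes of padding, which is the kind of overhead that makes alignment a trade-off for formats with many small writes.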

A follow-up blog post by the founder of RavenDB describes their experience, in which direct IO also offered a big benefit.

With the new format I’m exploring, direct file IO should be more impactful, and I’m hoping io_uring will also have a bigger impact than it did with my earlier (flawed) experiments.

Ultimately, I’m going to pick safe defaults for BonsaiDb. I do want to explore what options I can offer users that remain safe in some situations and can make a big difference on hardware built for reliability.

Thank you for sharing. I hadn’t looked at MySQL closely – I’ll have to add that to my list of databases to read up on.