Picking a project

I’m the sort of person who has a hard time writing code for something that isn’t going to solve a problem. I actually spent a big chunk of my career writing very little code, and always in tiny little chunks of Perl.

These days I do a lot more programming and feel semi-comfortable in multiple languages. What I do not feel like is a proper programmer, in the sense that I still really struggle to go from initializing a new project to something that does anything at all, let alone something useful. Despite that, I have been kicking around the idea that I’d like to take a spin at building a distributed key/value store.

Mind you, I have no desire to try and write something other people would want to use. I want to write something good enough to demonstrate to myself and to the internet how it can be done and what some of the trade-offs involved are. This is also going to be a great chance for me to learn and demonstrate how to incrementally evolve a project, using testing and well-defined, strongly typed interfaces so that major refactors can be done without breaking builds.

I will fail to some extent, at some point, at all of the above, and I will do my best to package those failures for your enjoyment at my expense. Hopefully I will be able to find a genuinely useful nugget in each of them, so that I at least make new mistakes in the next post.

My current intention is to use Thrift to define the public interface, with its associated datatypes and exceptions. I am familiar with Thrift already and it has code generation support for all of the languages I want to play with, meaning not only can I replace subsystems in a given implementation but I can try to draw wildly unscientific comparisons between them and get yelled at by helpful strangers in comments.

I will primarily be working in Rust because that’s what I want to learn in my spare time. I will start with a simple single-node k/v blob store with pluggable storage and an initial hashmap storage engine. Once that works, it will be time to get some form of load generation going so I can produce the previously mentioned wildly unscientific benchmarks.
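To make "pluggable storage" a bit more concrete, here is a minimal sketch of the shape I have in mind. Nothing here is settled: the names StorageEngine, HashMapEngine and KvStore are placeholders I made up for illustration, and the real interface will be driven by whatever the Thrift definition ends up looking like.

```rust
use std::collections::HashMap;

/// Hypothetical storage-engine interface (the name and methods are assumptions).
/// Anything that can get, put and delete blobs by key can back the store.
pub trait StorageEngine {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn put(&mut self, key: String, value: Vec<u8>);
    /// Returns true if the key existed and was removed.
    fn delete(&mut self, key: &str) -> bool;
}

/// The initial engine: a thin wrapper around the standard library HashMap.
#[derive(Default)]
pub struct HashMapEngine {
    entries: HashMap<String, Vec<u8>>,
}

impl StorageEngine for HashMapEngine {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        // Clone so the caller owns the blob; simple, not fast.
        self.entries.get(key).cloned()
    }

    fn put(&mut self, key: String, value: Vec<u8>) {
        // Overwrites any existing value for the key.
        self.entries.insert(key, value);
    }

    fn delete(&mut self, key: &str) -> bool {
        self.entries.remove(key).is_some()
    }
}

/// The single-node store only talks to the trait, so the engine can be
/// swapped out without touching this code.
pub struct KvStore {
    engine: Box<dyn StorageEngine>,
}

impl KvStore {
    pub fn new(engine: Box<dyn StorageEngine>) -> Self {
        Self { engine }
    }

    pub fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.engine.get(key)
    }

    pub fn put(&mut self, key: &str, value: &[u8]) {
        self.engine.put(key.to_string(), value.to_vec());
    }

    pub fn delete(&mut self, key: &str) -> bool {
        self.engine.delete(key)
    }
}
```

A store would be built with something like KvStore::new(Box::new(HashMapEngine::default())). Using a boxed trait object keeps the swap to a single line, at the cost of dynamic dispatch on every call, which is exactly the kind of trade-off I expect the benchmarks to poke at.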

Hopefully at this point we’ll have a functionally correct but fairly slow implementation of Map<string, string> as a service. Ideally I’ll be able to use profiling and science to turn it into a reasonably fast and still functionally correct service. From there we can look at sharding, client topology discovery, and better storage engines than the default hashmap (despite Rust having a pretty solid hashmap), with things like size-bucketed arenas. After that we can look at a durable engine in the same vein as RocksDB and LevelDB.
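For a sense of what the sharding step could start from, here is the most naive thing I can think of: hash the key and take it modulo the number of shards. This is only a sketch under assumptions. The function name shard_for is made up, and plain modulo sharding is almost certainly not where I will end up, since changing the shard count remaps most keys.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: map a key to a shard index in 0..shard_count.
pub fn shard_for(key: &str, shard_count: usize) -> usize {
    assert!(shard_count > 0, "need at least one shard");

    // DefaultHasher::new() uses fixed keys, so the same key hashes the same
    // way within one build. The algorithm itself is not guaranteed to stay
    // the same across Rust releases, so a real system would pin an explicit
    // hash function instead.
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);

    (hasher.finish() % shard_count as u64) as usize
}
```

Every client that agrees on the hash function and the shard count will route a given key to the same node, which is the whole trick. The client topology discovery piece is about how clients learn that shard count, and which node owns which index, in the first place.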