28 Nov 2020
0. How to read me?
I have to admit, my research blog posts are getting longer and longer. From one
side I find it genuinely encouraging, because if one gets so much information
just by scratching the topic, imagine what’s hidden beneath the surface! One
university professor once said “what could be interesting in databases?”, and
it turns out freaking a lot! On the other side it certainly poses problems for
potential readers. To overcome them I would suggest an interesting approach:
print this blog post out, or open it on your tablet/e-reader, where you can
make notes with a pencil or markers. Now while reading it try to spot ideas
particularly exciting for you and mark them. Along the way there would be
definitely some obscure parts or questions, write them on the sides as well.
You can experiment with the diagrams, changing or extending them, or just
drawing funny faces. But do not read everything at once, have no fear of
putting it aside for a while, and read in chunks that are convenient for you.
Some parts could be skipped as the text is build out of relatively independent
topics. The table of contents can help and guide you. Having said that we’re
ready to embark on the journey.
1. Introduction
Whenever we speak about indexes, especially in PostgreSQL context, there is a
lot to talk about: B-tree, Hash, GiST, SP-GiST, GIN, BRIN, RUM. But what if I
tell you that even the first item in this list alone hiding astonishing number
of interesting details and years of research? In this blog post I’ll try to
prove this statement, and we will be concerned mostly with B-tree as a data
structure.
07 Aug 2020
Do not be afraid, this short blog post is about databases and does not contain
any unreasonable metaphysical references. In fact, it’s a result of a journey
through couple of whitepapers and books with an unexpected intersection of two
rather different fields. I will try to describe everything step by step, so
that we can see if chess has anything in common with database query optimizer.
06 Dec 2019
0. How to read me?
Yes, I know, it’s a long text, and it was my conscious decision to write it in
this way. But fear not! Imagine that you read a book, take a look at the
introduction and first few interesting sections, think about it and then find
time to read further. I hope I’ve left enough references, so if you don’t get
some ideas you’ll be able to read more information about interesting parts. Or
you can even skip some sections, since they are relatively independent. This
table of contents will guide you:
1. Introduction
In mathematics it is too easy to concentrate very hard on one specific
problem. If your methods are not good enough to handle the problem, you will
eventually grind to a halt, baffled and defeated. Often the key to further
progress is to stand back, forget about the special problem, and see if you
can spot any general features of the surrounding area which might be of use.
Ian Stewart, Concepts of Modern Mathematics.
It’s not a secret that databases are damn complicated systems. And they tend to
run on top of even more complicated stacks of software. Nowadays you will not
surprise anyone (or at least not that much) by running your database on a
Kubernetes cluster or inside a virtual machine. It’s probably still
questionable whether it’s good and appropriate, but this approach is
something we have to face — sometimes it’s at least convenient, sometimes it
allows to be more resource efficient and sometimes it’s the only available
infrastructure in a company.
21 Dec 2017
As such, there’s really no “standard” benchmark that will inform you about
the best technology to use for your application. Only your requirements, your
data, and your infrastructure can tell you what you need to know.
For already some time I can’t stop doing interesting/useful/weird (one at the
time) benchmarks to reveal some details on how to apply document-oriented
approach in the world of relational databases. Finally, I decided that I have a
critical mass of those details to share in the form of blog post. So welcome to
The Benchmark Club, where we’re going to discuss what it takes to create a fair
performance comparison of different databases. As you may guess, the first rule
of The Benchmark Club is to never share a reproducible benchmarks. But we
identify ourselves as a badass engineers, so we’re going to break this rule
today.