Blog
2023-12-29T17:24:38+00:00
http://erthalion.info
Dmitry Dolgov
9erthalion6@gmail.com

Demand the impossible: rigorous database benchmarking
2023-12-29T00:00:00+00:00
http://erthalion.info/2023/12/29/statistics-and-benchmarking
Everyone knows benchmarking is hard (and writing about benchmarking is twice as hard), but have you ever asked 'why'? There could be at least a few reasons, and they usually have something to do with an inherent duality: you need to combine domain-specific expertise with general analytical expertise, and you have to take into account both known and unknown factors. In this article we use this duality as a basis to answer the following question: how not to blow up your PostgreSQL benchmark?

How many engineers does it take to make subscripting work?
2021-03-03T09:12:45+00:00
http://erthalion.info/2021/03/03/subscripting
Recently landed in PostgreSQL, jsonb subscripting support doesn't look as exciting as some other jsonb improvements. But its user-visible changes are only the tip of the iceberg. How many people were involved in making it, and what decisions were made? How long did it take, and what are the good and bad ideas when working on a patch?

Evolution of tree data structures for indexing: more exciting than it sounds
2020-11-28T21:12:45+00:00
http://erthalion.info/2020/11/28/evolution-of-btree-index-am
What is your first association with the concept of a B-tree? Mine is 'old and well researched, or in other words boring'. And indeed, it was apparently first introduced in 1970! Not only that, by 1979 B-trees were already ubiquitous. Does that mean there is nothing exciting left anymore?
It turns out that there is a multitude of interesting ideas and techniques around B-trees. They all come from the desire to cover different (often incompatible) needs, as well as to adapt to emerging hardware. In this blog post I'll try to show this, and we will be concerned mostly with the B-tree as a data structure.

PostgreSQL at low level: stay curious!
2019-12-06T13:21:54+00:00
http://erthalion.info/2019/12/06/postgresql-stay-curious
It's no secret that databases are damn complicated systems. And they tend to run on top of even more complicated software stacks. Nowadays you won't surprise anyone (or at least not that much) by running your database on a Kubernetes cluster or inside a virtual machine. It's probably still debatable whether this is good and appropriate, but it is an approach we have to face: sometimes it's at least convenient, sometimes it allows for more efficient resource usage, and sometimes it's the only infrastructure available in a company. One of the problems in this situation is that reasoning about performance is not that easy anymore. Well, it's not like it was much easier before, but still. Let's see what we can do about it, and how strace, perf and BPF can change the game.

Jsonb: few more stories about the performance
2017-12-21T17:34:04+00:00
http://erthalion.info/2017/12/21/advanced-json-benchmarks
<blockquote> <p>As such, there’s really no “standard” benchmark that will inform you about the best technology to use for your application. Only your requirements, your data, and your infrastructure can tell you what you need to know.</p> </blockquote> <p>For some time now I haven’t been able to stop doing interesting/useful/weird (one at a time) benchmarks that reveal details about how to apply a document-oriented approach in the world of relational databases. Finally, I decided that I have a critical mass of those details to share in the form of a blog post.
So welcome to The Benchmark Club, where we’re going to discuss what it takes to create a fair performance comparison of different databases. As you may guess, the first rule of The Benchmark Club is to never share reproducible benchmarks. But we consider ourselves badass engineers, so we’re going to break this rule today.</p> <p><img src="/public/img/fight_club.jpg" border="0" width="100%" style="margin: auto" /></p> <!--break--> <h2 id="targets">Targets</h2> <p>It’s not possible to compare all the existing solutions for storing and processing data in the form of documents (although it looks like people usually expect exactly that), so I’ve limited my scope to PostgreSQL, MySQL and MongoDB:</p> <ul> <li> <p>PostgreSQL - just because it’s an...</p> </li> </ul>

How to convert your data to jsonb?
2016-06-05T00:16:21+00:00
http://erthalion.info/2016/06/05/convert-into-jsonb
<p>“How to start” is always a difficult question, and <code class="language-plaintext highlighter-rouge">jsonb</code> isn’t an exception. Here are a few notes about converting different types of data into <code class="language-plaintext highlighter-rouge">jsonb</code> that someone may find useful.</p> <p>Basically, there are three possible cases of data conversion:</p> <ul> <li>Convert data from inside PostgreSQL</li> <li>Convert data from another database</li> <li>Convert plain data outside the database</li> </ul> <!--break--> <h2 id="from-inside-postgresql">From inside PostgreSQL</h2> <p>First of all, we shouldn’t forget that we can build data in <code class="language-plaintext highlighter-rouge">jsonb</code> format manually:</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">select</span> <span class="s1">'{"id": 1, "data": "aaa"}'</span><span class="p">::</span><span class="n">jsonb</span><span class="p">;</span> </code></pre></div></div> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code> jsonb <span
class="nt">--------------------------</span> <span class="o">{</span><span class="s2">"id"</span>: 1, <span class="s2">"data"</span>: <span class="s2">"aaa"</span><span class="o">}</span> </code></pre></div></div> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">select</span> <span class="n">jsonb_build_object</span><span class="p">(</span><span class="s1">'id'</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">'data'</span><span class="p">,</span> <span class="s1">'aaa'</span><span class="p">);</span> </code></pre></div></div> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code> jsonb_build_object <span class="nt">--------------------------</span> <span class="o">{</span><span class="s2">"id"</span>: 1, <span class="s2">"data"</span>: <span class="s2">"aaa"</span><span class="o">}</span> </code></pre></div></div> <p>If we already have some relational data, we can easily perform a one-to-one conversion for both complex and simple data types:</p> <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">select</span> <span class="n">to_jsonb</span><span class="p">(</span><span class="nb">timestamp</span> <span class="s1">'2016-06-05'</span><span class="p">);</span> </code></pre></div></div> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code> to_jsonb <span class="nt">-----------------------</span> <span class="s2">"2016-06-05T00:00:00"</span> </code></pre></div></div> ...

Compare incomparable: PostgreSQL vs Mysql vs Mongodb
2015-12-29T01:14:21+00:00
http://erthalion.info/2015/12/29/json-benchmarks
<blockquote> <p>As such, there’s really no “standard” benchmark that will inform you about the best technology to use for your application.
Only your requirements, your data, and your infrastructure can tell you what you need to know.</p> </blockquote> <p>NoSQL is everywhere and we can’t escape from it (although I can’t say we want to escape). Let’s leave the question of why outside this text, and just note one thing: this trend isn’t limited to new or existing NoSQL solutions. It has another side, namely schema-less data support in traditional relational databases. It’s amazing how many possibilities are hiding at the edge between the relational model and everything else. But of course there is a balance that you have to find for your specific data. That can’t be easy, first of all because it requires comparing incomparable things, e.g. the performance of a NoSQL solution and a traditional database. In this post I’ll make such an attempt and show a comparison of jsonb in PostgreSQL, json in MySQL and bson in MongoDB.</p> <!--break--> <h2 id="what-the-hell-is-going-on-here">What the hell is going on here?</h2> <p>Breaking news:</p> <ul> <li><a href="http://www.postgresql.org/docs/9.4/static/datatype-json.html">PostgreSQL 9.4</a> - a new data type <code class="language-plaintext highlighter-rouge">jsonb</code> with slightly extended support in the...</li> </ul>