Blog

My thoughts and experiments.


Demand the impossible: rigorous database benchmarking


1. Introduction

Everyone knows benchmarking is hard (and writing about benchmarking is twice as hard), but have you ever asked why? What makes benchmarking, and performance evaluation in general, so error-prone, so complicated to get right, and so easy to screw up? These are not new questions, but there seems to be no definitive answer, and for databases things are even more grim. Yet we can speculate, hoping that the speculation helps us learn something along the way. You probably won’t read anything new below; in fact, many of the things I’m going to talk about are rather obvious. But the process of bringing everything together and thinking the topic through is valuable in itself.

There are at least a few reasons why it’s so easy to fail at understanding the performance of a database system, and they usually have something to do with an inherent duality:

  • It’s necessary to combine domain-specific expertise with general analytical expertise.
  • One has to take into account both known and unknown factors.
  • Establishing a comprehensive mental model of a database is surprisingly hard and can be counter-intuitive at times.

Running fast and slow: experiments with BPF programs performance

1. Introduction

My own personal blind spot regarding the BPF subsystem in the Linux kernel has always been program performance and introspection in general. To formulate it more specifically: is there any difference in how we reason about the performance of an ordinary program versus a BPF program? Can we use the same techniques and approaches?

You may wonder why we should even bother when BPF programs are so small and fast. Generally speaking you would be right, but there are cases when BPF programs are not small any more and sit on the hot execution path, e.g. a security system monitoring syscalls. In such situations even a small overhead is multiplied and accumulated, and it makes sense to fully understand the system’s performance to avoid nasty surprises.
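To put some purely illustrative numbers on it: a BPF program that adds a mere 1 µs to every syscall on a host handling 200,000 syscalls per second costs 200,000 × 1 µs = 0.2 CPU-seconds every second, i.e. a fifth of a core spent on instrumentation alone.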

It seems many other people would also like to know more about this topic, so I want to share the results of my investigation.

PSquare: practical quantiles

0. Motivation

Recently I’ve started to notice an interesting pattern: when you take something you thought was simple and look deep inside with a magnifying glass, it usually opens up a whole new world of fascinating discoveries. It could be one of the principles of the universe, or just me overreacting to simple things. In any case, I want to share one such example in this blog post, and I hope it will bring the readers the same joy it brought me. Let’s talk about quantiles!
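To make the subject concrete, here is what an exact quantile computation looks like in PostgreSQL (the table and column names here are made up for illustration). The catch is that it has to order the whole data set, which is precisely the cost that streaming estimators like the P² algorithm avoid:

SELECT percentile_cont(0.95) WITHIN GROUP (ORDER BY response_time)
FROM measurements; -- exact 95th percentile, sorts all observations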

How many engineers does it take to make subscripting work?

Are you tired of this syntax in PostgreSQL?

SELECT jsonb_column->'key' FROM tbl;
UPDATE tbl SET jsonb_column =
            jsonb_set(jsonb_column, '{"key"}', '"value"');

The SELECT part is actually fine. But updates, especially complex ones, can get pretty verbose and far from ergonomic. What would you say to this syntax instead?

SELECT jsonb_column['key'] FROM tbl;
UPDATE tbl SET jsonb_column['key'] = '"value"';
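This subscripting syntax is available for jsonb starting with PostgreSQL 14, and the subscripts compose, so nested documents can be read and updated the same way (the key names below are illustrative):

SELECT jsonb_column['a']['b'] FROM tbl;
UPDATE tbl SET jsonb_column['a']['b'] = '"value"';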