Blog: Classier Twitter threads - Tag PostgreSQL

Analyzing AWS Costs with SQL

One of the things nobody loves about AWS is billing. Mainly because the more your application is cloud-native, the more unpredictable your bill will become. All that autoscaling and serverless saves you a lot of money, but if there is a massive spike in your app usage, there will also be a spike in your AWS bill. That’s usually a good thing because if you’ve done it right, the opposite is also true, and low traffic means lower costs.

Either way, the more you use AWS, the less useful the default monthly bill becomes. Maybe you have several environments (production, staging, dev, …), or perhaps you have multiple applications. The aggregated view simply becomes a problem, and you start to wonder how much each of those applications costs separately. Then you discover AWS cost allocation tags and cost explorer that can group by service or tags, which also lasts a while before you become unhappy again.

The problem is always the same - the data is not granular enough. Can we somehow get more granular data?

Continue reading ...

Database branching (just like with git) in PostgreSQL

Imagine you’re developing a feature, which requires you to do some database schema changes, and those changes might not be backward compatible. But right now, you just want to develop your feature and worry about migrating the production later. You start by writing the code and gradually ALTERing your local database.

But after some time, priorities change. Either the product owner changes his mind and forces you to start working on something else, or there is a critical bug on production that must be fixed immediately. Anyway, you have to switch to master and start working on something else. But you’ve made a mess of your local database because you’ve thought you’d be able to finish this before you move on to something else, and now your master won’t work with whatever is in your local database.

What now? If only you could just magically revert the database to where it was before you’ve started changing the schema.

Continue reading ...

Optimizing PostgreSQL queries with Multicolumn and Partial Indexes

I have an application that does asynchronous data processing, and at the core of the application are simulated queues in a PostgreSQL table. Each row in that queue represents a task and also contains the result of that task. You can imagine this table as a sort of multi-tenant where the rows belong to a data_source and queue. There are multiple DataSources, and each can have multiple queues. Some of the combinations contain very few rows, and some of them contain several million.

This uneven distribution of rows caused that while some of the queues can be queried rather quickly, the largest queue has slowly grown in size to the point where the job iterating over it took around 9 hours.

Continue reading ...