Blog: Classier Twitter threads - Tag PostgreSQL

New PostgreSQL instance configuration

You’ve spent weeks, maybe a few months, developing a prototype locally with a local PostgreSQL in docker, not worrying about a thing. But now the moment has come, and you must push to prod. You start an RDS instance, execute the migrations to set up the schema, and launch the app. It’s all good so far!

However, the team has grown from a single lonely wolf, and now you have new colleagues and maybe a few junior devs. It turns out that if everybody uses the same username and password, it’s very hard to find out who left that 5-hour query running that caused the instance to crawl so badly that a full outage would probably be better. So you give everyone their own user, and for some time, everything works great! Except on one late Friday night during a routine deployment, the migrations have failed! What do you mean by “cannot drop X because other objects depend on it”???

Disclaimer: This article does not aim to extensively describe all the best practices, just to set you on the right path and save you from a few ruined Fridays.

Continue reading ...

Analyzing AWS Costs with SQL

One of the things nobody loves about AWS is billing. Mainly because the more your application is cloud-native, the more unpredictable your bill will become. All that autoscaling and serverless saves you a lot of money, but if there is a massive spike in your app usage, there will also be a spike in your AWS bill. That’s usually a good thing because if you’ve done it right, the opposite is also true, and low traffic means lower costs.

Either way, the more you use AWS, the less useful the default monthly bill becomes. Maybe you have several environments (production, staging, dev, …), or perhaps you have multiple applications. The aggregated view simply becomes a problem, and you start to wonder how much each of those applications costs separately. Then you discover AWS cost allocation tags and cost explorer that can group by service or tags, which also lasts a while before you become unhappy again.

The problem is always the same - the data is not granular enough. Can we somehow get more granular data?

Continue reading ...

Database branching (just like with git) in PostgreSQL

Imagine you’re developing a feature, which requires you to do some database schema changes, and those changes might not be backward compatible. But right now, you just want to develop your feature and worry about migrating the production later. You start by writing the code and gradually ALTERing your local database.

But after some time, priorities change. Either the product owner changes his mind and forces you to start working on something else, or there is a critical bug on production that must be fixed immediately. Anyway, you have to switch to master and start working on something else. But you’ve made a mess of your local database because you’ve thought you’d be able to finish this before you move on to something else, and now your master won’t work with whatever is in your local database.

What now? If only you could just magically revert the database to where it was before you’ve started changing the schema.

Continue reading ...

Optimizing PostgreSQL queries with Multicolumn and Partial Indexes

I have an application that does asynchronous data processing, and at the core of the application are simulated queues in a PostgreSQL table. Each row in that queue represents a task and also contains the result of that task. You can imagine this table as a sort of multi-tenant where the rows belong to a data_source and queue. There are multiple DataSources, and each can have multiple queues. Some of the combinations contain very few rows, and some of them contain several million.

This uneven distribution of rows caused that while some of the queues can be queried rather quickly, the largest queue has slowly grown in size to the point where the job iterating over it took around 9 hours.

Continue reading ...