I have an application that does asynchronous data processing, and at its core are queues simulated in a PostgreSQL table. Each row in that table represents a task and also contains the result of that task. You can think of the table as multi-tenant: every row belongs to a queue. There are multiple DataSources, and each can have multiple queues. Some of the DataSource/queue combinations contain very few rows, and some contain several million.
Because of this uneven distribution of rows, some queues can be queried rather quickly, while the largest queue has slowly grown to the point where the job iterating over it took around 9 hours.
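
To make the layout concrete, here is a minimal sketch of such a table with a few queues of very different sizes. The table name `tasks` and the columns `data_source_id`, `queue_name`, `payload`, and `result` are hypothetical, not the actual schema, and SQLite stands in for PostgreSQL only so that the snippet runs on its own.

```python
import sqlite3


def build_demo_queue_table(sizes):
    """Create an in-memory table holding several queues of different sizes.

    `sizes` maps (data_source_id, queue_name) -> number of task rows.
    All table and column names here are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE tasks (
            id INTEGER PRIMARY KEY,
            data_source_id INTEGER NOT NULL,
            queue_name TEXT NOT NULL,
            payload TEXT NOT NULL,
            result TEXT  -- stays NULL until the task has been processed
        )
        """
    )
    for (data_source_id, queue_name), count in sizes.items():
        conn.executemany(
            "INSERT INTO tasks (data_source_id, queue_name, payload) "
            "VALUES (?, ?, ?)",
            ((data_source_id, queue_name, f"task-{i}") for i in range(count)),
        )
    conn.commit()
    return conn


def rows_per_queue(conn):
    """Return {(data_source_id, queue_name): row_count} for every queue."""
    cursor = conn.execute(
        "SELECT data_source_id, queue_name, COUNT(*) FROM tasks "
        "GROUP BY data_source_id, queue_name"
    )
    return {(ds, queue): n for ds, queue, n in cursor}


# Deliberately skewed sizes, scaled down from "a few rows" vs. "millions".
demo_sizes = {(1, "small"): 5, (1, "large"): 5000, (2, "medium"): 200}
conn = build_demo_queue_table(demo_sizes)
counts = rows_per_queue(conn)
```

The same `GROUP BY` query, run against the real table, is a quick way to see how skewed the row counts per DataSource and queue actually are.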