Index-Only Scan in PostgreSQL is not always Index “Only”!


An index-only scan is supposed to return query results by accessing only the index. In PostgreSQL, however, an index-only scan can end up reading table rows (the heap) as well, which can make the query consume more time (or other resources) than anticipated. In this blog, I will discuss how we discovered this behavior of PostgreSQL and how we solved it for our use case.

The Problem:

We optimized a high I/O-consuming read query some time back (detailed blog). The optimization was to create appropriate indexes so that the query could be resolved by an index-only scan, with no need to read table rows, thereby reducing the IOPS (input/output operations per second) consumed by the query.

But a few weeks down the line we again observed a gradual increase in the IOPS consumed by the query. The query plan still showed an index-only scan, yet the query was doing a lot of disk access and reading the heap as well. It was not intuitive to us why an index-only scan would access the heap. After further debugging and a pass through the PostgreSQL docs, we discovered that there are cases where an index-only scan can end up accessing the heap. The image below shows the gradual increase in read IOPS that we faced:

The Reason:

This happens because of how concurrency control works in PostgreSQL. To quote the official docs:

There is an additional requirement for any table scan in PostgreSQL: it must verify that each retrieved row be “visible” to the query’s MVCC snapshot. Visibility information is not stored in index entries, only in heap entries; so at first glance it would seem that every row retrieval would require a heap access anyway. And this is indeed the case, if the table row has been modified recently.
In short, the query engine needs to access the table row to determine whether the row is visible to the current transaction. But does that mean an index-only scan always has to access the heap? No, definitely not. Further, from the official docs:
For seldom-changing data there is a way around this problem. PostgreSQL tracks, for each page in a table’s heap, whether all rows stored in that page are old enough to be visible to all current and future transactions. This information is stored in a bit in the table’s visibility map. An index-only scan, after finding a candidate index entry, checks the visibility map bit for the corresponding heap page. If it’s set, the row is known visible and so the data can be returned with no further work. If it’s not set, the heap entry must be visited to find out whether it’s visible, so no performance advantage is gained over a standard index scan. Even in the successful case, this approach trades visibility map accesses for heap accesses; but since the visibility map is four orders of magnitude smaller than the heap it describes, far less physical I/O is needed to access it. In most situations the visibility map remains cached in memory all the time.
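You can see this behavior directly in a query plan: `EXPLAIN ANALYZE` reports a "Heap Fetches" counter for index-only scans, which counts the rows whose visibility-map bit was not set and therefore required a heap visit. A sketch, using a hypothetical `events` table and index (names are illustrative, not from our actual schema):

```sql
-- Hypothetical table with a covering index on (account_id, created_at)
EXPLAIN (ANALYZE, BUFFERS)
SELECT created_at FROM events WHERE account_id = 42;

-- Index Only Scan using events_account_created_idx on events ...
--   Heap Fetches: 187342    <- non-zero means heap pages were read anyway
--   Buffers: shared hit=... read=...
```

A large "Heap Fetches" number relative to the rows returned is the telltale sign that an "index-only" scan is quietly falling back to heap access.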

To describe the visibility map briefly, from the docs:

Each heap relation has a Visibility Map (VM) to keep track of which pages contain only tuples that are known to be visible to all active transactions.
So if the visibility map is up to date, the query engine need not access the table rows, and the query can be resolved using just the index and the visibility map.
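You can check how much of a table is covered by the visibility map. A sketch, again using the hypothetical `events` table; `pg_visibility` is a contrib extension shipped with PostgreSQL, and `relallvisible` in `pg_class` is a planner estimate of all-visible pages:

```sql
-- Quick estimate from the catalog: all-visible pages vs. total pages
SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 'events';

-- Exact counts via the pg_visibility contrib extension
CREATE EXTENSION IF NOT EXISTS pg_visibility;
SELECT * FROM pg_visibility_map_summary('events');
```

If `relallvisible` is small compared to `relpages`, most index-only scans on that table will be doing heap fetches.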

I will digress slightly here to talk a little bit about vacuuming in PostgreSQL (it's needed to understand the remainder of this blog). In PostgreSQL, an UPDATE or DELETE of a row does not immediately remove the old version of the row; instead, a new version is created and stored alongside it. The old version continues to occupy disk space; these old row versions are called dead tuples. The vacuum process removes dead tuples and marks their space as available for reuse. Vacuum does other things as well, but removing dead tuples is one of its most important tasks. Vacuum can be triggered manually, or automatically by configuring thresholds for dead tuples: whenever the number of dead tuples exceeds the threshold, a vacuum is triggered automatically.
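To see how vacuum is (or isn't) keeping up on a table, the `pg_stat_user_tables` statistics view tracks dead tuples and the last vacuum times. A sketch with the hypothetical `events` table:

```sql
-- How many dead tuples have accumulated, and when vacuum last ran
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'events';

-- Vacuum can also be run manually at any time
VACUUM (VERBOSE) events;
```

A `last_autovacuum` timestamp that is weeks or months old on a frequently written table is a strong hint that the visibility map is stale.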

This much understanding of vacuum is enough for the remainder of this blog, but if you are curious you can find more details about vacuuming in the official docs.

Coming back to our problem: we now know that an index-only scan uses the visibility map to avoid accessing the heap, but how do we keep the visibility map up to date? The visibility map is updated automatically when vacuum runs. If the visibility map is updated automatically, why did we see an increase in IOPS in our case? The reason is that the table in question was huge (~200 million records) and auto-vacuum was not running frequently enough on it, leaving the visibility map outdated. In fact, close to 2 months passed between subsequent auto-vacuum runs, because we were using the default auto-vacuum thresholds, which trigger auto-vacuum only when at least 10% of the rows are dead tuples OR 20% of the rows were newly inserted since the last vacuum.
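The arithmetic makes the problem obvious. Auto-vacuum fires when the dead-tuple count crosses a threshold computed from two settings (shown here with illustrative values; your installation's defaults may differ):

```sql
SHOW autovacuum_vacuum_threshold;      -- flat base count, e.g. 50
SHOW autovacuum_vacuum_scale_factor;   -- fraction of the table's rows

-- Auto-vacuum triggers when:
--   n_dead_tup > autovacuum_vacuum_threshold
--              + autovacuum_vacuum_scale_factor * reltuples
--
-- With a 10% scale factor on a ~200-million-row table, roughly
-- 20 million rows must become dead before auto-vacuum runs at all.
```

Because the threshold scales with table size, the bigger the table, the longer the visibility map stays stale between vacuums.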

The Solution:

We changed our thresholds so that auto-vacuum runs frequently and the visibility map stays updated. To achieve this, we set the thresholds to a constant value of 100k dead rows OR 100k newly inserted rows. Auto-vacuum now triggers daily, and the IOPS consumed by the query have come down by a factor of 6!
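One way to express this as a per-table setting (a sketch, not our exact change; the `events` name is hypothetical, and zeroing the scale factors is what makes the threshold a flat count rather than a percentage):

```sql
ALTER TABLE events SET (
  autovacuum_vacuum_threshold        = 100000,  -- ~100k dead rows
  autovacuum_vacuum_scale_factor     = 0,       -- disable the percentage term
  autovacuum_vacuum_insert_threshold = 100000,  -- ~100k newly inserted rows
  autovacuum_vacuum_insert_scale_factor = 0     -- (insert settings need PG 13+)
);
```

Setting these per table, rather than globally, keeps the defaults in place for smaller tables where the percentage-based thresholds work fine.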


When optimizing a query for an index-only scan in PostgreSQL, we also need to check that auto-vacuum runs frequently enough on the table to keep the visibility map updated. The visibility map determines whether table rows need to be read during an index-only scan; if it is up to date, then in the majority of cases the query can be resolved just by reading the index.


References:

PostgreSQL Concurrency Control
Visibility Map
How index-only scan uses the visibility map

