PostgreSQL internals: Why is COUNT slow? -


why time complexity of count in postgresql (v9.6) appear bad get? following demonstrates without machine under resource pressure ~3x more records causes take ~9x long

i.e. o(n2)

   (397.0ms)   select count(*) "cases" => 850219     (3289.1ms)  select count(*) "cases" => 2577661 

i have expected take advantage of indexes on table guess estimate rather determine count.

i imagine it's hard improve i'd interested find out why , if it's tuneable. particularly tables written i'm surprised count can't cached. in db's internal structure makes hard include last_edited timestamp in table's metadata?

the reason it's nonlinear because of hint-bit writing on first touch of data, read queries can writes. or down caching effects.

usually counting of data without suitable index linear in cost. try repeated benchmark warm-up.

postgresql doesn't try keep accurate counts on tables because doesn't care much. keeps approximate statistics , figures that's enough, if want exact count you'll perform 1 @ time. count that's fast , slow more confusing.

it can't keep running total, because table has a different number of rows in different sessions @ same time. session 1 sees table 100 rows, session 2 has inserted has 205, , session 3 deleted has 50. session 6 started after session 5 committed delete, sees fewer rows session 1 @ same instant.

it can't keep "committed counter" + per-session counters, because rollbacks can occur in unpredictable orders, , what's "committed" varies depending on session's snapshot. it'd have mvcc on sort of side-table of table row counts, added row suitable xmin each commit, , pruned them transaction snapshots aged out. cost fair bit in performance, little benefit workloads.

above else, though, nobody's wanted enough add support it. either funding work, or doing coding directly. it'd big job little benefit.

what has been done adding support counting using index-only scans, greatly speed counts on primary key or other indexed columns. can use vacuum , visibility map skip doing heap re-checks visibility on pages aren't freshly written to.


Comments