Problem Overview#
Last night, the business team updated a SQL query. Previously, the query ran very fast without the DATE_CREATED field (the partition key). After the release, the partition field was added to reduce the number of partitions accessed. However, after adding it, the UPDATE execution actually became slower.
Before:
update TABLE_RECORD
set IS_DELETED = '1', DATE_UPDATED = LOCALTIMESTAMP(0)
WHERE APPL_NO = $1
AND IS_DELETED = '0'After:
update TABLE_RECORD
set IS_DELETED = '1', DATE_UPDATED = LOCALTIMESTAMP(0)
WHERE APPL_NO = $1
AND IS_DELETED = '0'
AND DATE_CREATED > now() - interval '31' day
AND DATE_CREATED < now()Before the release, access time was in milliseconds. After the release, access time was 10 seconds. The SQL runs frequently, and the business found this unacceptable.
Problem Analysis#
The Execution Plan Appeared Correct#
Table structure:
## \d+ TABLE_RECORD
Partitioned table "public.TABLE_RECORD"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------------------------+-----------------------------+-----------+----------+---------------------------------------------------+----------+--------------+--------------------------
id_TABLE_RECORD | character varying(32) | | not null | nextval('seq_TABLE_RECORD'::regclass) | extended | |
appl_no | character varying(100) | | | | extended | |
r_appl_no | character varying(100) | | | | extended | |
...
created_by | character varying(100) | | not null | 'sys'::character varying | extended | |
date_created | timestamp without time zone | | not null | now() | plain | |
updated_by | character varying(100) | | not null | 'sys'::character varying | extended | |
date_updated | timestamp without time zone | | not null | now() | plain | |
Partition key: RANGE (date_created)
Indexes:
"date_TABLE_RECORD" btree (date_created)
"idx_dateupdated" btree (date_updated)
"idx_applnodeleted" btree (appl_no, is_deleted)
"nk_TABLE_RECORD" btree (appl_no)
Partitions: TABLE_RECORD_202211 FOR VALUES FROM ('2022-11-01 00:00:00') TO ('2022-12-01 00:00:00'),
...
TABLE_RECORD_202303 FOR VALUES FROM ('2023-03-01 00:00:00') TO ('2023-04-01 00:00:00'),
TABLE_RECORD_202304 FOR VALUES FROM ('2023-04-01 00:00:00') TO ('2023-05-01 00:00:00'),
TABLE_RECORD_202305 FOR VALUES FROM ('2023-05-01 00:00:00') TO ('2023-06-01 00:00:00'),
TABLE_RECORD_202306 FOR VALUES FROM ('2023-06-01 00:00:00') TO ('2023-07-01 00:00:00'),
...
TABLE_RECORD_202512 FOR VALUES FROM ('2025-12-01 00:00:00') TO ('2026-01-01 00:00:00'),
TABLE_RECORD_other DEFAULTThis SQL would access partitions from the last 2 months, both of which contained data. The above UPDATE would only update one row.
At first, analyzing the problem was very confusing because when we ran EXPLAIN, the execution plan looked fine.
EXPLAIN partition scan info:
-> Index Scan using TABLE_RECORD_202302_date_created_idx on TABLE_RECORD_202302 TABLE_RECORD_4 (cost=0.44..5.47 rows=1 width=485)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using TABLE_RECORD_202303_date_created_idx on TABLE_RECORD_202303 TABLE_RECORD_5 (cost=0.44..5.47 rows=1 width=482)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using TABLE_RECORD_202304_date_created_idx on TABLE_RECORD_202304 TABLE_RECORD_6 (cost=0.44..5.47 rows=1 width=481)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using idx_applnodeleted_25 on TABLE_RECORD_202305 TABLE_RECORD_7 (cost=0.43..30.49 rows=1 width=483)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
-> Index Scan using idx_applnodeleted_14 on TABLE_RECORD_202306 TABLE_RECORD_8 (cost=0.56..45.11 rows=18 width=485)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
-> Index Scan using idx_applnodeleted_38 on TABLE_RECORD_202307 TABLE_RECORD_9 (cost=0.14..5.17 rows=1 width=3502)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
-> Index Scan using idx_applnodeleted_1 on TABLE_RECORD_202308 TABLE_RECORD_10 (cost=0.14..5.17 rows=1 width=3502)Partition data distribution:
select count(*),tableoid::regclass from TABLE_RECORD group by 2;
count | tableoid
-------+---------------------------------
56558 | TABLE_RECORD_202303
4436 | TABLE_RECORD_202211
6929 | TABLE_RECORD_202306
945 | TABLE_RECORD_202305
1413 | TABLE_RECORD_202304
5499 | TABLE_RECORD_202212
1486 | TABLE_RECORD_202301
4722 | TABLE_RECORD_202302The execution plan appeared to access different indexes for different partitions:
date_TABLE_RECORD: index on the partition keyidx_applnodeleted: composite index onappl_no, is_deleted
In reality, the SQL could prune partitions using the DATE_CREATED (last 31 days) field. But if it used the index on that field, there would be no selectivity at all. The composite index idx_applnodeleted on appl_no, is_deleted had much better selectivity within partitions, so the correct execution plan should choose the idx_applnodeleted composite index.
The EXPLAIN plan above is not the actual execution plan, but we can see that the May and June partitions did use the correct index — the appl_no, is_deleted composite index.
To view the actual execution plan, we need to execute the SQL. So we changed the UPDATE to a SELECT:
# explain (analyze,buffers,timing,verbose) select count(*) from TABLE_RECORD WHERE APPL_NO = 'LZLMATH20230132302302' AND IS_DELETED = '0' AND DATE_CREATED > now() - interval '31' day AND DATE_CREATED < now() ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=266.09..266.10 rows=1 width=8) (actual time=0.565..0.566 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=48
-> Append (cost=0.14..265.95 rows=56 width=0) (actual time=0.388..0.558 rows=1 loops=1)
Buffers: shared hit=48
Subplans Removed: 37
-> Index Scan using idx_applnodeleted_25 on public.TABLE_RECORD_202305 TABLE_RECORD_1 (cost=0.43..30.39 rows=2 width=0) (actual time=0.059..0.059 rows=0 loops=1)
Index Cond: (((TABLE_RECORD_1.appl_no)::text = 'LZLMATH20230132302302'::text) AND ((TABLE_RECORD_1.is_deleted)::text = '0'::text))
Filter: ((TABLE_RECORD_1.date_created < now()) AND (TABLE_RECORD_1.date_created > (now() - '31 days'::interval day)))
Buffers: shared hit=3
-> Index Scan using idx_applnodeleted_14 on public.TABLE_RECORD_202306 TABLE_RECORD_2 (cost=0.56..42.52 rows=17 width=0) (actual time=0.328..0.498 rows=1 loops=1)
Index Cond: (((TABLE_RECORD_2.appl_no)::text = 'LZLMATH20230132302302'::text) AND ((TABLE_RECORD_2.is_deleted)::text = '0'::text))
Filter: ((TABLE_RECORD_2.date_created < now()) AND (TABLE_RECORD_2.date_created > (now() - '31 days'::interval day)))
Buffers: shared hit=45
Planning:
Buffers: shared hit=5867
Planning Time: 17.195 ms
Execution Time: 0.654 ms
(18 rows)The SELECT only accessed the May and June partitions, indicating partition pruning worked correctly. Both partitions used the idx_applnodeleted index, so index selection was also correct.
Direct execution of the SELECT statement returned results in milliseconds:
## select count(*) from TABLE_RECORD WHERE APPL_NO = 'LZLMATH20230132302302' AND IS_DELETED = '0' AND DATE_CREATED > now() - interval '31' day AND DATE_CREATED < now() ;
count
-------
1
(1 row)
Time: 4.946 msAt this point in the analysis, the execution plan appeared normal and execution time appeared normal.
The Business SQL Was Still Slow#
However, slow SQL still appeared in the PostgreSQL logs — the UPDATE took 10 seconds:
2023-06-29 11:06:45.077 CST,"lzldbopr","lzldb",116286,"30.88.78.90:51871",649cdebf.1c63e,7,"UPDATE",2023-06-29 09:30:39 CST,759/12440291,4002354803,LOG,00000,"duration: 10287.105 ms " plan:
Query Text: update TABLE_RECORD
set IS_DELETED = '1', DATE_UPDATED = LOCALTIMESTAMP(0)
WHERE APPL_NO = $1
AND IS_DELETED = '0'
AND DATE_CREATED > now() - interval '31' day
AND DATE_CREATED < now()
Update on TABLE_RECORD (cost=0.14..203.79 rows=39 width=2960)
Update on TABLE_RECORD_202211 TABLE_RECORD_1
...
-> Index Scan using TABLE_RECORD_202304_date_created_idx on TABLE_RECORD_202304 TABLE_RECORD_6 (cost=0.44..5.47 rows=1 width=481)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using TABLE_RECORD_202305_date_created_idx on TABLE_RECORD_202305 TABLE_RECORD_7 (cost=0.44..5.47 rows=1 width=483)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using TABLE_RECORD_202306_date_created_idx on TABLE_RECORD_202306 TABLE_RECORD_8 (cost=0.44..5.47 rows=1 width=485)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using idx_applnodeleted_38 on TABLE_RECORD_202307 TABLE_RECORD_9 (cost=0.14..5.17 rows=1 width=3502)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
...The May and June partitions were still using the date_created index on the partition key. The execution plan estimated only 1 row, but in reality these two partitions each had millions of rows.
This was very confusing — the optimizer itself could choose a better index, and EXPLAIN showed it going to that index, but the business SQL simply wasn’t using the correct index.
Updating Statistics#
Since this was a PostgreSQL execution plan issue, the first thought was to collect statistics.
After the problem occurred, we collected statistics for both the parent partitioned table and child partitions. Concerned that sessions might have cached the execution plan (plan_cache_mode=auto), we killed all sessions that connected before the statistics collection.
The logs still showed the SQL taking 10 seconds, indicating it wasn’t a statistics issue.
At this point the problem remained unsolved. We seemed to have exhausted all options.
Root Cause#
Earlier, when analyzing execution plans, the DBA’s EXPLAIN output differed from the application’s execution plan. However, we had been executing everything as the PostgreSQL superuser. We switched to the application user and ran EXPLAIN again — the execution plan matched what was in the logs!
Since we had previously encountered issues with native partitioned table permissions causing abnormal execution plans, we immediately checked partition permissions.
Parent table permissions:
## \dp+ TABLE_RECORD
Access privileges
Schema | Name | Type | Access privileges | Column privileges | Policies
--------+--------------------------+-------------------+-------------------------------------+-------------------+----------
public | TABLE_RECORD | partitioned table | lzldbdata=arwdDxt/lzldbdata +| |
| | | r_lzldbdata_qry=r/lzldbdata +| |
| | | r_lzldbdata_dml=arwd/lzldbdata +| |
(1 row)Child partition permissions:
## \dp+ TABLE_RECORD_202505
Access privileges
Schema | Name | Type | Access privileges | Column privileges | Policies
--------+---------------------------------+-------+------------------------------------+-------------------+----------
public | TABLE_RECORD_202505 | table | lzldbdata=arwdDxt/lzldbdata +| | The partition permissions were missing the r_lzldbdata_dml role, which is granted to the business user.
We immediately granted the permissions, and the problem was resolved:
grant select,update,delete,insert on TABLE_RECORD_202305 to r_lzldbdata_dml;
grant select,update,delete,insert on TABLE_RECORD_202306 to r_lzldbdata_dml;After switching to the opr user again and running EXPLAIN, the execution plan was correct — the May and June partitions used the proper index:
\c - lzldbopr
-> Index Scan using TABLE_RECORD_202303_date_created_idx on TABLE_RECORD_202303 TABLE_RECORD_5 (cost=0.44..5.47 rows=1 width=482)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using TABLE_RECORD_202304_date_created_idx on TABLE_RECORD_202304 TABLE_RECORD_6 (cost=0.44..5.47 rows=1 width=481)
Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
-> Index Scan using idx_applnodeleted_25 on TABLE_RECORD_202305 TABLE_RECORD_7 (cost=0.43..30.39 rows=1 width=483)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
-> Index Scan using idx_applnodeleted_14 on TABLE_RECORD_202306 TABLE_RECORD_8 (cost=0.56..42.57 rows=17 width=485)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
-> Index Scan using idx_applnodeleted_38 on TABLE_RECORD_202307 TABLE_RECORD_9 (cost=0.14..5.17 rows=1 width=3502)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
-> Index Scan using idx_applnodeleted_1 on TABLE_RECORD_202308 TABLE_RECORD_10 (cost=0.14..5.17 rows=1 width=3502)
Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))No more slow UPDATE statements were observed in the PostgreSQL logs.
Testing (Not Reproduced)#
Initial table creation script:
-- Switch to non-superuser
\c - lzldbdata
-- create table
CREATE TABLE PUBLIC.LZLPARTITION
(
APPL_NO varchar(100) NULL,
IS_DELETED varchar(8) NULL,
DATE_CREATED timestamp NOT NULL DEFAULT now(),
DATE_UPDATED timestamp NOT NULL DEFAULT now()
) PARTITION BY RANGE(DATE_CREATED);
-- indexes
create index DATE_LZLPARTITION on PUBLIC.LZLPARTITION (DATE_CREATED);
create index NK_LZLPARTITION on PUBLIC.LZLPARTITION (APPL_NO);
-- privs
GRANT SELECT ON TABLE public.LZLPARTITION TO r_lzldbdata_qry;
GRANT SELECT,INSERT,UPDATE,DELETE ON TABLE public.LZLPARTITION TO r_lzldbdata_dml;
-- partition
create table LZLPARTITION_202301 partition of LZLPARTITION for values from ('2023-01-01 00:00:00') to ('2023-02-01 00:00:00');
create table LZLPARTITION_202302 partition of LZLPARTITION for values from ('2023-02-01 00:00:00') to ('2023-03-01 00:00:00');
create table LZLPARTITION_202303 partition of LZLPARTITION for values from ('2023-03-01 00:00:00') to ('2023-04-01 00:00:00');
create table LZLPARTITION_202304 partition of LZLPARTITION for values from ('2023-04-01 00:00:00') to ('2023-05-01 00:00:00');
create table LZLPARTITION_202305 partition of LZLPARTITION for values from ('2023-05-01 00:00:00') to ('2023-06-01 00:00:00');
create table LZLPARTITION_202306 partition of LZLPARTITION for values from ('2023-06-01 00:00:00') to ('2023-07-01 00:00:00');Generate data:
insert
into
public.LZLPARTITION
select
n + 10,
'N',
to_char(to_date('2023-01-01', 'YYYY-MM-DD') + ('' || n || ' minute') ::interval, 'YYYY-MM-DD')::"date",
now()
from
generate_series(0, 300000) nData distribution:
select count(*),tableoid::regclass from lzlpartition group by 2;
count | tableoid
-------+---------------------
44640 | lzlpartition_202301
40320 | lzlpartition_202302
44640 | lzlpartition_202303
43200 | lzlpartition_202304
44640 | lzlpartition_202305
43200 | lzlpartition_202306
39361 | lzlpartition_202307Permissions not inherited:
## \dp+ lzlpartition
Access privileges
Schema | Name | Type | Access privileges | Column privileges | Policies
--------+--------------+-------------------+-------------------------------------+-------------------+----------
public | lzlpartition | partitioned table | lzldbdata=arwdDxt/lzldbdata +| |
| | | r_lzldbdata_qry=r/lzldbdata +| |
| | | r_lzldbdata_dml=arwd/lzldbdata | |
## \dp+ lzlpartition_202306
Access privileges
Schema | Name | Type | Access privileges | Column privileges | Policies
--------+---------------------+-------+-------------------+-------------------+----------
public | lzlpartition_202306 | table | | | Execution plan (correct):
explain select count(*) from lzlpartition WHERE APPL_NO = '217450' AND IS_DELETED = 'N' AND DATE_CREATED > now() - interval '31' day AND DATE_CREATED < now();
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=36.76..36.77 rows=1 width=8)
-> Append (cost=0.29..36.74 rows=7 width=0)
Subplans Removed: 5
-> Index Scan using lzlpartition_202305_appl_no_idx on lzlpartition_202305 lzlpartition_1 (cost=0.15..5.19 rows=1 width=0)
Index Cond: ((appl_no)::text = '217450'::text)
Filter: (((is_deleted)::text = '0'::text) AND (date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
-> Index Scan using lzlpartition_202306_appl_no_idx on lzlpartition_202306 lzlpartition_2 (cost=0.15..5.19 rows=1 width=0)
Index Cond: ((appl_no)::text = '217450'::text)
Filter: (((is_deleted)::text = '0'::text) AND (date_created < now()) AND (date_created > (now() - '31 days'::interval day)))The permissions were still not inherited. In fact, we tested on other PostgreSQL versions and observed the same behavior — it seems to be a general behavior.
However, even so, we couldn’t reproduce the issue. The test results used the correct index, unlike the production environment which used the wrong index.
Summary#
Since we had collected statistics and killed sessions, it shouldn’t have been a cached execution plan issue. After executing GRANT, the partition execution plan immediately became correct (even granting just one partition fixed that specific partition), so we are fairly confident that the partition permission issue caused the abnormal partition execution plan.
The analysis and resolution process can be summarized as follows:
- Switch to the application user to view the execution plan. Using the superuser to view execution plans is a common practice, but the plan seen from the superuser may not be correct.
- Permissions on child partitions of partitioned tables. The root cause is that permissions on child partitions of PostgreSQL partitioned tables were inconsistent with the parent table, causing the execution plan to be abnormal. In other words, permission issues affected PostgreSQL’s execution plan.
- This issue is difficult to reproduce and occurs very, very rarely.
- Permission-caused execution plan anomalies are extremely subtle and hard to diagnose.
Two questions worth deeper discussion:
- Permission issues shouldn’t affect execution plans. Why do permissions affect execution plans?
- Child partition permissions are inconsistent with parent table permissions. Why don’t child partitions fully inherit parent table permissions?
A bug report has been submitted to see what the official team says.