Skip to main content
  1. Posts/

Incorrect Execution Plan Caused by Partition Permission Issues

·2404 words·12 mins
liuzhilong62
Author
liuzhilong62
PostgreSQL DBA. Writing about database internals, production cases, and source code analysis.

Problem Overview
#

Last night, the business team updated a SQL query. Previously, the query ran very fast without the DATE_CREATED field (the partition key). After the release, the partition field was added to reduce the number of partitions accessed. However, after adding it, the UPDATE execution actually became slower.

Before:

 update  TABLE_RECORD
                        set IS_DELETED = '1', DATE_UPDATED = LOCALTIMESTAMP(0)
                        WHERE APPL_NO = $1
                        AND IS_DELETED = '0'

After:

 update  TABLE_RECORD
                        set IS_DELETED = '1', DATE_UPDATED = LOCALTIMESTAMP(0)
                        WHERE APPL_NO = $1
                        AND IS_DELETED = '0'
                        AND DATE_CREATED > now() - interval '31' day
                        AND DATE_CREATED < now()

Before the release, access time was in milliseconds. After the release, access time was 10 seconds. The SQL runs frequently, and the business found this unacceptable.

Problem Analysis
#

The Execution Plan Appeared Correct
#

Table structure:

## \d+ TABLE_RECORD
                                                                     Partitioned table "public.TABLE_RECORD"
             Column             |            Type             | Collation | Nullable |                      Default                      | Storage  | Stats target |       Description        
--------------------------------+-----------------------------+-----------+----------+---------------------------------------------------+----------+--------------+--------------------------
 id_TABLE_RECORD    | character varying(32)       |           | not null | nextval('seq_TABLE_RECORD'::regclass) | extended |              | 
 appl_no                        | character varying(100)      |           |          |                                                   | extended |              | 
 r_appl_no                 | character varying(100)      |           |          |                                                   | extended |              | 
...
 created_by                     | character varying(100)      |           | not null | 'sys'::character varying                          | extended |              | 
 date_created                   | timestamp without time zone |           | not null | now()                                             | plain    |              | 
 updated_by                     | character varying(100)      |           | not null | 'sys'::character varying                          | extended |              | 
 date_updated                   | timestamp without time zone |           | not null | now()                                             | plain    |              | 
Partition key: RANGE (date_created)
Indexes:
    "date_TABLE_RECORD" btree (date_created)
    "idx_dateupdated" btree (date_updated)
    "idx_applnodeleted" btree (appl_no, is_deleted)
    "nk_TABLE_RECORD" btree (appl_no)
Partitions: TABLE_RECORD_202211 FOR VALUES FROM ('2022-11-01 00:00:00') TO ('2022-12-01 00:00:00'),
...
            TABLE_RECORD_202303 FOR VALUES FROM ('2023-03-01 00:00:00') TO ('2023-04-01 00:00:00'),
            TABLE_RECORD_202304 FOR VALUES FROM ('2023-04-01 00:00:00') TO ('2023-05-01 00:00:00'),
            TABLE_RECORD_202305 FOR VALUES FROM ('2023-05-01 00:00:00') TO ('2023-06-01 00:00:00'),
            TABLE_RECORD_202306 FOR VALUES FROM ('2023-06-01 00:00:00') TO ('2023-07-01 00:00:00'),
 ...
            TABLE_RECORD_202512 FOR VALUES FROM ('2025-12-01 00:00:00') TO ('2026-01-01 00:00:00'),
            TABLE_RECORD_other DEFAULT

This SQL would access partitions from the last 2 months, both of which contained data. The above UPDATE would only update one row.

At first, analyzing the problem was very confusing because when we ran EXPLAIN, the execution plan looked fine.

EXPLAIN partition scan info:

   ->  Index Scan using TABLE_RECORD_202302_date_created_idx on TABLE_RECORD_202302 TABLE_RECORD_4  (cost=0.44..5.47 rows=1 width=485)
         Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
         Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
   ->  Index Scan using TABLE_RECORD_202303_date_created_idx on TABLE_RECORD_202303 TABLE_RECORD_5  (cost=0.44..5.47 rows=1 width=482)
         Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
         Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
   ->  Index Scan using TABLE_RECORD_202304_date_created_idx on TABLE_RECORD_202304 TABLE_RECORD_6  (cost=0.44..5.47 rows=1 width=481)
         Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
         Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
   ->  Index Scan using idx_applnodeleted_25 on TABLE_RECORD_202305 TABLE_RECORD_7  (cost=0.43..30.49 rows=1 width=483)
         Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
         Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
   ->  Index Scan using idx_applnodeleted_14 on TABLE_RECORD_202306 TABLE_RECORD_8  (cost=0.56..45.11 rows=18 width=485)
         Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
         Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
   ->  Index Scan using idx_applnodeleted_38 on TABLE_RECORD_202307 TABLE_RECORD_9  (cost=0.14..5.17 rows=1 width=3502)
         Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
         Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
   ->  Index Scan using idx_applnodeleted_1 on TABLE_RECORD_202308 TABLE_RECORD_10  (cost=0.14..5.17 rows=1 width=3502)

Partition data distribution:

 select count(*),tableoid::regclass from TABLE_RECORD group by 2;
 count |            tableoid             
-------+---------------------------------
 56558 | TABLE_RECORD_202303
  4436 | TABLE_RECORD_202211
  6929 | TABLE_RECORD_202306
   945 | TABLE_RECORD_202305
  1413 | TABLE_RECORD_202304
  5499 | TABLE_RECORD_202212
  1486 | TABLE_RECORD_202301
  4722 | TABLE_RECORD_202302

The execution plan appeared to access different indexes for different partitions:

  • date_TABLE_RECORD: index on the partition key
  • idx_applnodeleted: composite index on appl_no, is_deleted

In reality, the SQL could prune partitions using the DATE_CREATED (last 31 days) field. But if it used the index on that field, there would be no selectivity at all. The composite index idx_applnodeleted on appl_no, is_deleted had much better selectivity within partitions, so the correct execution plan should choose the idx_applnodeleted composite index.

The EXPLAIN plan above is not the actual execution plan, but we can see that the May and June partitions did use the correct index — the appl_no, is_deleted composite index.

To view the actual execution plan, we need to execute the SQL. So we changed the UPDATE to a SELECT:

 # explain (analyze,buffers,timing,verbose) select count(*) from   TABLE_RECORD                        WHERE APPL_NO = 'LZLMATH20230132302302'                  AND IS_DELETED = '0'          AND DATE_CREATED > now() - interval '31' day                  AND DATE_CREATED < now()        ;
                                                                                                QUERY PLAN                                                                                                
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=266.09..266.10 rows=1 width=8) (actual time=0.565..0.566 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=48
   ->  Append  (cost=0.14..265.95 rows=56 width=0) (actual time=0.388..0.558 rows=1 loops=1)
         Buffers: shared hit=48
         Subplans Removed: 37
         ->  Index Scan using idx_applnodeleted_25 on public.TABLE_RECORD_202305 TABLE_RECORD_1  (cost=0.43..30.39 rows=2 width=0) (actual time=0.059..0.059 rows=0 loops=1)
               Index Cond: (((TABLE_RECORD_1.appl_no)::text = 'LZLMATH20230132302302'::text) AND ((TABLE_RECORD_1.is_deleted)::text = '0'::text))
               Filter: ((TABLE_RECORD_1.date_created < now()) AND (TABLE_RECORD_1.date_created > (now() - '31 days'::interval day)))
               Buffers: shared hit=3
         ->  Index Scan using idx_applnodeleted_14 on public.TABLE_RECORD_202306 TABLE_RECORD_2  (cost=0.56..42.52 rows=17 width=0) (actual time=0.328..0.498 rows=1 loops=1)
               Index Cond: (((TABLE_RECORD_2.appl_no)::text = 'LZLMATH20230132302302'::text) AND ((TABLE_RECORD_2.is_deleted)::text = '0'::text))
               Filter: ((TABLE_RECORD_2.date_created < now()) AND (TABLE_RECORD_2.date_created > (now() - '31 days'::interval day)))
               Buffers: shared hit=45
 Planning:
   Buffers: shared hit=5867
 Planning Time: 17.195 ms
 Execution Time: 0.654 ms
(18 rows)

The SELECT only accessed the May and June partitions, indicating partition pruning worked correctly. Both partitions used the idx_applnodeleted index, so index selection was also correct.

Direct execution of the SELECT statement returned results in milliseconds:

## select count(*) from   TABLE_RECORD                        WHERE APPL_NO = 'LZLMATH20230132302302'                  AND IS_DELETED = '0'          AND DATE_CREATED > now() - interval '31' day                  AND DATE_CREATED < now()        ;
 count 
-------
     1
(1 row)

Time: 4.946 ms

At this point in the analysis, the execution plan appeared normal and execution time appeared normal.

The Business SQL Was Still Slow
#

However, slow SQL still appeared in the PostgreSQL logs — the UPDATE took 10 seconds:

2023-06-29 11:06:45.077 CST,"lzldbopr","lzldb",116286,"30.88.78.90:51871",649cdebf.1c63e,7,"UPDATE",2023-06-29 09:30:39 CST,759/12440291,4002354803,LOG,00000,"duration: 10287.105 ms " plan:
Query Text: update  TABLE_RECORD
                        set IS_DELETED = '1', DATE_UPDATED = LOCALTIMESTAMP(0)
                        WHERE APPL_NO = $1
                        AND IS_DELETED = '0'
                        AND DATE_CREATED > now() - interval '31' day
                        AND DATE_CREATED < now()
Update on TABLE_RECORD  (cost=0.14..203.79 rows=39 width=2960)
  Update on TABLE_RECORD_202211 TABLE_RECORD_1
...
  ->  Index Scan using TABLE_RECORD_202304_date_created_idx on TABLE_RECORD_202304 TABLE_RECORD_6  (cost=0.44..5.47 rows=1 width=481)
        Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
        Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
  ->  Index Scan using TABLE_RECORD_202305_date_created_idx on TABLE_RECORD_202305 TABLE_RECORD_7  (cost=0.44..5.47 rows=1 width=483)
        Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
        Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
  ->  Index Scan using TABLE_RECORD_202306_date_created_idx on TABLE_RECORD_202306 TABLE_RECORD_8  (cost=0.44..5.47 rows=1 width=485)
        Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
        Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
  ->  Index Scan using idx_applnodeleted_38 on TABLE_RECORD_202307 TABLE_RECORD_9  (cost=0.14..5.17 rows=1 width=3502)
        Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
        Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
...

The May and June partitions were still using the date_created index on the partition key. The execution plan estimated only 1 row, but in reality these two partitions each had millions of rows.

This was very confusing — the optimizer itself could choose a better index, and EXPLAIN showed it going to that index, but the business SQL simply wasn’t using the correct index.

Updating Statistics
#

Since this was a PostgreSQL execution plan issue, the first thought was to collect statistics.

After the problem occurred, we collected statistics for both the parent partitioned table and child partitions. Concerned that sessions might have cached the execution plan (plan_cache_mode=auto), we killed all sessions that connected before the statistics collection.

The logs still showed the SQL taking 10 seconds, indicating it wasn’t a statistics issue.

At this point the problem remained unsolved. We seemed to have exhausted all options.

Root Cause
#

Earlier, when analyzing execution plans, the DBA’s EXPLAIN output differed from the application’s execution plan. However, we had been executing everything as the PostgreSQL superuser. We switched to the application user and ran EXPLAIN again — the execution plan matched what was in the logs!

Since we had previously encountered issues with native partitioned table permissions causing abnormal execution plans, we immediately checked partition permissions.

Parent table permissions:

## \dp+ TABLE_RECORD
                                                     Access privileges
 Schema |           Name           |       Type        |          Access privileges          | Column privileges | Policies 
--------+--------------------------+-------------------+-------------------------------------+-------------------+----------
 public | TABLE_RECORD | partitioned table | lzldbdata=arwdDxt/lzldbdata    +|                   | 
        |                          |                   | r_lzldbdata_qry=r/lzldbdata    +|                   | 
        |                          |                   | r_lzldbdata_dml=arwd/lzldbdata +|                   | 
(1 row)

Child partition permissions:

## \dp+ TABLE_RECORD_202505
                                                  Access privileges
 Schema |              Name               | Type  |         Access privileges          | Column privileges | Policies 
--------+---------------------------------+-------+------------------------------------+-------------------+----------
 public | TABLE_RECORD_202505 | table | lzldbdata=arwdDxt/lzldbdata   +|                   | 

The partition permissions were missing the r_lzldbdata_dml role, which is granted to the business user.

We immediately granted the permissions, and the problem was resolved:

grant select,update,delete,insert  on TABLE_RECORD_202305 to r_lzldbdata_dml;
grant select,update,delete,insert  on TABLE_RECORD_202306 to r_lzldbdata_dml;

After switching to the opr user again and running EXPLAIN, the execution plan was correct — the May and June partitions used the proper index:

\c - lzldbopr

   ->  Index Scan using TABLE_RECORD_202303_date_created_idx on TABLE_RECORD_202303 TABLE_RECORD_5  (cost=0.44..5.47 rows=1 width=482)
         Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
         Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
   ->  Index Scan using TABLE_RECORD_202304_date_created_idx on TABLE_RECORD_202304 TABLE_RECORD_6  (cost=0.44..5.47 rows=1 width=481)
         Index Cond: ((date_created > (now() - '31 days'::interval day)) AND (date_created < now()))
         Filter: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
   ->  Index Scan using idx_applnodeleted_25 on TABLE_RECORD_202305 TABLE_RECORD_7  (cost=0.43..30.39 rows=1 width=483)
         Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
         Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
   ->  Index Scan using idx_applnodeleted_14 on TABLE_RECORD_202306 TABLE_RECORD_8  (cost=0.56..42.57 rows=17 width=485)
         Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
         Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
   ->  Index Scan using idx_applnodeleted_38 on TABLE_RECORD_202307 TABLE_RECORD_9  (cost=0.14..5.17 rows=1 width=3502)
         Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
         Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
   ->  Index Scan using idx_applnodeleted_1 on TABLE_RECORD_202308 TABLE_RECORD_10  (cost=0.14..5.17 rows=1 width=3502)
         Index Cond: (((appl_no)::text = 'LZLMATH20230132302302'::text) AND ((is_deleted)::text = '0'::text))
         Filter: ((date_created < now()) AND (date_created > (now() - '31 days'::interval day)))

No more slow UPDATE statements were observed in the PostgreSQL logs.

Testing (Not Reproduced)
#

Initial table creation script:

-- Switch to non-superuser
\c - lzldbdata
-- create table
CREATE TABLE PUBLIC.LZLPARTITION
(
    APPL_NO varchar(100)  NULL,
	IS_DELETED varchar(8) NULL, 
    DATE_CREATED timestamp NOT NULL DEFAULT now(),
    DATE_UPDATED timestamp NOT NULL DEFAULT now()
) PARTITION BY RANGE(DATE_CREATED);

-- indexes
create index DATE_LZLPARTITION on PUBLIC.LZLPARTITION (DATE_CREATED);
create index NK_LZLPARTITION on PUBLIC.LZLPARTITION (APPL_NO);


-- privs
GRANT SELECT ON TABLE public.LZLPARTITION TO r_lzldbdata_qry;
GRANT SELECT,INSERT,UPDATE,DELETE ON TABLE public.LZLPARTITION TO r_lzldbdata_dml;

-- partition
create table LZLPARTITION_202301 partition of LZLPARTITION for values from ('2023-01-01 00:00:00') to ('2023-02-01 00:00:00');
create table LZLPARTITION_202302 partition of LZLPARTITION for values from ('2023-02-01 00:00:00') to ('2023-03-01 00:00:00');
create table LZLPARTITION_202303 partition of LZLPARTITION for values from ('2023-03-01 00:00:00') to ('2023-04-01 00:00:00');
create table LZLPARTITION_202304 partition of LZLPARTITION for values from ('2023-04-01 00:00:00') to ('2023-05-01 00:00:00');
create table LZLPARTITION_202305 partition of LZLPARTITION for values from ('2023-05-01 00:00:00') to ('2023-06-01 00:00:00');
create table LZLPARTITION_202306 partition of LZLPARTITION for values from ('2023-06-01 00:00:00') to ('2023-07-01 00:00:00');

Generate data:

insert
      into
      public.LZLPARTITION
select
      n + 10,
      'N',
      to_char(to_date('2023-01-01', 'YYYY-MM-DD') + ('' || n || ' minute')  ::interval, 'YYYY-MM-DD')::"date",
      now()
from
      generate_series(0, 300000) n

Data distribution:

select count(*),tableoid::regclass from lzlpartition group by 2;
 count |      tableoid       
-------+---------------------
 44640 | lzlpartition_202301
 40320 | lzlpartition_202302
 44640 | lzlpartition_202303
 43200 | lzlpartition_202304
 44640 | lzlpartition_202305
 43200 | lzlpartition_202306
 39361 | lzlpartition_202307

Permissions not inherited:

## \dp+ lzlpartition
                                               Access privileges
 Schema |     Name     |       Type        |          Access privileges          | Column privileges | Policies 
--------+--------------+-------------------+-------------------------------------+-------------------+----------
 public | lzlpartition | partitioned table | lzldbdata=arwdDxt/lzldbdata    +|                   | 
        |              |                   | r_lzldbdata_qry=r/lzldbdata    +|                   | 
        |              |                   | r_lzldbdata_dml=arwd/lzldbdata  |                   | 


## \dp+ lzlpartition_202306
                                    Access privileges
 Schema |        Name         | Type  | Access privileges | Column privileges | Policies 
--------+---------------------+-------+-------------------+-------------------+----------
 public | lzlpartition_202306 | table |                   |                   | 

Execution plan (correct):

explain  select count(*) from   lzlpartition WHERE APPL_NO = '217450' AND IS_DELETED = 'N' AND DATE_CREATED > now() - interval '31' day AND DATE_CREATED < now();
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=36.76..36.77 rows=1 width=8)
   ->  Append  (cost=0.29..36.74 rows=7 width=0)
         Subplans Removed: 5
         ->  Index Scan using lzlpartition_202305_appl_no_idx on lzlpartition_202305 lzlpartition_1  (cost=0.15..5.19 rows=1 width=0)
               Index Cond: ((appl_no)::text = '217450'::text)
               Filter: (((is_deleted)::text = '0'::text) AND (date_created < now()) AND (date_created > (now() - '31 days'::interval day)))
         ->  Index Scan using lzlpartition_202306_appl_no_idx on lzlpartition_202306 lzlpartition_2  (cost=0.15..5.19 rows=1 width=0)
               Index Cond: ((appl_no)::text = '217450'::text)
               Filter: (((is_deleted)::text = '0'::text) AND (date_created < now()) AND (date_created > (now() - '31 days'::interval day)))

The permissions were still not inherited. In fact, we tested on other PostgreSQL versions and observed the same behavior — it seems to be a general behavior.

However, even so, we couldn’t reproduce the issue. The test results used the correct index, unlike the production environment which used the wrong index.

Summary
#

Since we had collected statistics and killed sessions, it shouldn’t have been a cached execution plan issue. After executing GRANT, the partition execution plan immediately became correct (even granting just one partition fixed that specific partition), so we are fairly confident that the partition permission issue caused the abnormal partition execution plan.

The analysis and resolution process can be summarized as follows:

  1. Switch to the application user to view the execution plan. Using the superuser to view execution plans is a common practice, but the plan seen from the superuser may not be correct.
  2. Permissions on child partitions of partitioned tables. The root cause is that permissions on child partitions of PostgreSQL partitioned tables were inconsistent with the parent table, causing the execution plan to be abnormal. In other words, permission issues affected PostgreSQL’s execution plan.
  3. This issue is difficult to reproduce and occurs very, very rarely.
  4. Permission-caused execution plan anomalies are extremely subtle and hard to diagnose.

Two questions worth deeper discussion:

  1. Permission issues shouldn’t affect execution plans. Why do permissions affect execution plans?
  2. Child partition permissions are inconsistent with parent table permissions. Why don’t child partitions fully inherit parent table permissions?

A bug report has been submitted to see what the official team says.

Related

A Classic Case of Long Transaction, Table Bloat, and LIMIT Issues

·1309 words·7 mins
Slow Primary Key Update — Problem Analysis # A simple primary key update took over 1 second to execute. Due to high concurrency, the CPU was completely maxed out: 2024-04-01 10:19:36.084 CST,"lzlopr","lzl",158751,"10.33.78.149:51502",66055a6b.26c1f,172,"UPDATE",2024-03-28 19:54:19 CST,528/19816630,970251337,LOG,00000,"duration: 1218.688 ms plan: Query Text: update table_a set (omitted...)=$6 where column_id =$7 Update on table_a (cost=0.40..5.49 rows=1 width=2774) -> Index Scan using pk_id on table_a (cost=0.40..5.49 rows=1 width=2774) Index Cond: ((column_id)::text = $7)",,,,,,,,,"PostgreSQL JDBC Driver","client backend" The SQL itself is very simple — an update with a condition on the primary key. Looking at the execution plan, it used the pk_id primary key index, so there was no problem with the plan itself; the issue wasn’t a plan change.

Analyzing a 5MB SQL That Consumed 70GB of Memory

·2217 words·11 mins
Process Memory Analysis # "WAL writer process (PID 66902) was terminated by signal 6: Aborted",,,,,,,,,"","postmaster" The log shows postmaster process 66902 was killed. Checking OS-level process memory: since top doesn’t show PPID and ps doesn’t show USS, we need both: USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND postgres 211276 66478 19 8.7 10.6 57488380 56389972 - R 17:13:03 00:02:47 postgres: BIND postgres 211277 66478 19 7.8 9.6 52294700 51127480 - R 17:13:03 00:02:31 postgres: BIND postgres 222749 66478 19 22.7 9.3 51320000 49073368 - R 17:35:33 00:02:09 postgres: BIND postgres 39513 66478 19 2.9 6.8 38651084 36354736 ep_poll S 16:13:03 00:02:43 postgres: idle Using PPID to identify high-memory backend processes. Let’s examine process 211276:

Case Study: Analyzing Occasional Slow INSERT VALUES

·2017 words·10 mins
The business team reported that INSERT VALUES occasionally became slow. By the time I checked the active sessions, the slow write problem had already subsided. Later, I discovered that the slow write problem lasted less than half a minute, with INSERT VALUES taking 1-2 seconds. I wrote a script to capture active session information and managed to get the session data: wait_event | count ---------------------+------- [null] | 11 WALRead | 1 DataFileRead | 2 BgWriterMain | 1 WALWrite | 40 AutoVacuumMain | 1 ClientRead | 385 LogicalLauncherMain | 1 The most abnormal wait event was WALWrite with 40 sessions.