跳过正文
  1. 文章/

长事务、表膨胀、limit问题的一个经典案例

·1948 字·4 分钟
liuzhilong62
作者
liuzhilong62
PostgreSQL DBA,关注数据库内核、案例分析、源码解读
C M
目录

update主键慢,问题分析
#

一个简单的主键更新,update执行了超过1s,由于并发比较高,cpu直接打满

2024-04-01 10:19:36.084 CST,"lzlopr","lzl",158751,"10.33.78.149:51502",66055a6b.26c1f,172,"UPDATE",2024-03-28 19:54:19 CST,528/19816630,970251337,LOG,00000,"duration: 1218.688 ms  plan:
Query Text:             update table_a                 set (省略...)=$6           where column_id =$7
Update on table_a  (cost=0.40..5.49 rows=1 width=2774)
->  Index Scan using pk_id on table_a  (cost=0.40..5.49 rows=1 width=2774)
      Index Cond: ((column_id)::text = $7)",,,,,,,,,"PostgreSQL JDBC Driver","client backend"

sql本身也非常简单,就是条件为=‘主键’的更新。看update的执行计划,是走了pk主键索引的,所以从执行计划上看没有问题,问题不在执行计划突变。

改写下sql(因为是update),explain (analyze,buffers)对比看下SQL的执行消耗

=> explain (analyze,buffers) select * from table_a where column_id='d4f713370e584820a9b15e2218ea436a';
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on table_a  (cost=2.91..5.42 rows=1 width=1156) (actual time=55.052..123.354 rows=1 loops=1)
   Recheck Cond: ((column_id)::text = 'd4f713370e584820a9b15e2218ea436a'::text)
   Heap Blocks: exact=1
   Buffers: shared hit=13870
   ->  Bitmap Index Scan on pk_id  (cost=0.00..2.91 rows=1 width=0) (actual time=3.464..3.465 rows=13866 loops=1)
         Index Cond: ((column_id)::text = 'd4f713370e584820a9b15e2218ea436a'::text)
         Buffers: shared hit=24
 Planning:
   Buffers: shared hit=4261
 Planning Time: 11.028 ms
 Execution Time: 123.567 ms
(11 rows)

这里的真实执行计划没问题,但是shared hit=13870明显太高了。正常来说走主键不应该扫描太多的page,这里比较容易联想到表膨胀

查看表膨胀:

--表大小 \dt
Size        | 525 MB
--表的实际行数
count | 827
--死元组 pg_stat_all_tables
n_live_tup          | 786
n_dead_tup          | 657604

有只有800条活元组,但是有65w条死元组!主键扫描了多个page的原因应该就是这个,但是为啥没有回收死元组呢? 表在默认修改20%数据的时候就会触发autovacuum执行vacuum回收,这个在日志中可以看到,autovacuum是有被触发的:

2024-04-01 14:13:46.649 CST,,,14081,,660a5099.3701,1,,2024-04-01 14:13:45 CST,259/17828993,0,LOG,00000,"automatic vacuum of table ""lzl.public.table_a"": index scans: 0
2024-04-01 14:13:47.801 CST,,,14081,,660a5099.3701,2,,2024-04-01 14:13:45 CST,259/17828994,971045014,LOG,00000,"automatic analyze of table ""lzl.public.table_a"" system usage: CPU: user: 0.08 s, system: 0.01 s, elapsed: 1.15 s",,,,,,,,,"","autovacuum worker"
2024-04-01 14:14:46.673 CST,,,26136,,660a50d5.6618,1,,2024-04-01 14:14:45 CST,259/17829090,0,LOG,00000,"automatic vacuum of table ""lzl.public.table_a"": index scans: 0
2024-04-01 14:14:47.833 CST,,,26136,,660a50d5.6618,2,,2024-04-01 14:14:45 CST,259/17829091,971049759,LOG,00000,"automatic analyze of table ""lzl.public.table_a"" system usage: CPU: user: 0.08 s, system: 0.03 s, elapsed: 1.15 s",,,,,,,,,"","autovacuum worker"
2024-04-01 14:15:46.680 CST,,,40743,,660a5111.9f27,1,,2024-04-01 14:15:45 CST,259/17829164,0,LOG,00000,"automatic vacuum of table ""lzl.public.table_a"": index scans: 0
2024-04-01 14:15:47.849 CST,,,40743,,660a5111.9f27,2,,2024-04-01 14:15:45 CST,259/17829165,971055464,LOG,00000,"automatic analyze of table ""lzl.public.table_a"" system usage: CPU: user: 0.08 s, system: 0.03 s, elapsed: 1.16 s",,,,,,,,,"","autovacuum worker"
2024-04-01 14:16:46.677 CST,,,52599,,660a514d.cd77,1,,2024-04-01 14:16:45 CST,259/17829263,0,LOG,00000,"automatic vacuum of table ""lzl.public.table_a"": index scans: 0
2024-04-01 14:16:47.844 CST,,,52599,,660a514d.cd77,2,,2024-04-01 14:16:45 CST,259/17829264,971061382,LOG,00000,"automatic analyze of table ""lzl.public.table_a"" system usage: CPU: user: 0.08 s, system: 0.03 s, elapsed: 1.16 s",,,,,,,,,"","autovacuum worker"
2024-04-01 14:17:46.699 CST,,,64858,,660a5189.fd5a,1,,2024-04-01 14:17:45 CST,234/16589539,0,LOG,00000,"automatic vacuum of table ""lzl.public.table_a"": index scans: 0
2024-04-01 14:17:47.851 CST,,,64858,,660a5189.fd5a,2,,2024-04-01 14:17:45 CST,234/16589540,971066091,LOG,00000,"automatic analyze of table ""lzl.public.table_a"" system usage: CPU: user: 0.09 s, system: 0.02 s, elapsed: 1.15 s",,,,,,,,,"","autovacuum worker"
2024-04-01 14:18:46.703 CST,,,78112,,660a51c5.13120,1,,2024-04-01 14:18:45 CST,259/17829409,0,LOG,00000,"automatic vacuum of table ""lzl.public.table_a"": index scans: 0
2024-04-01 14:18:47.854 CST,,,78112,,660a51c5.13120,2,,2024-04-01 14:18:45 CST,259/17829410,971070390,LOG,00000,"automatic analyze of table ""lzl.public.table_a"" system usage: CPU: user: 0.09 s, system: 0.02 s, elapsed: 1.15 s",,,,,,,,,"","autovacuum worker"		

不仅有触发,而且间隔刚好是一分钟。autovacuum_naptime间隔触发时间默认就是1分钟

>= show autovacuum_naptime ;
 autovacuum_naptime 
--------------------
 1min
(1 row)

可以判断:

  • autovacuum成功触发
  • 死元组可能回收不及,1分钟内产生的死元组大于20%(可能1分钟太长了);或者根本没有回收,下次一定触发autovacuum

看下autovacuum的具体内容:

2024-04-01 10:22:44.648 CST,,,16827,,660a1a73.41bb,1,,2024-04-01 10:22:43 CST,170/16910186,0,LOG,00000,"automatic vacuum of table ""lzl.public.table_a"": index scans: 0
pages: 0 removed, 48745 remain, 6 skipped due to pins, 0 skipped frozen
tuples: 0 removed, 744488 remain, 743666 are dead but not yet removable, oldest xmin: 969118077
buffer usage: 97603 hits, 0 misses, 5 dirtied
avg read rate: 0.000 MB/s, avg write rate: 0.028 MB/s
system usage: CPU: user: 0.21 s, system: 0.22 s, elapsed: 1.41 s
WAL usage: 4 records, 3 full page images, 5129 bytes",,,,,,,,,"","autovacuum worker"

autovacuum触发了,但是完全没有回收死元组tuples: 0 removed, 744488 remain, 743666 are dead but not yet removable, oldest xmin: 969118077oldest xmin代表库内的最旧事务,也就是有长事务,这个很好找到:

>=  select pid,usename,xact_start,state_change,wait_event,state,query from  pg_stat_activity where state<>'idle' order by xact_start ;
  pid   |  usename   |          xact_start           |         state_change          |     wait_event      |        state        |                                                                              
--------+------------+-------------------------------+-------------------------------+---------------------+---------------------+------------------------------------------------------------------------------
 164658 | phbdspsqp  | 2024-04-01 08:36:57.275408+08 | 2024-04-01 08:36:57.299609+08 | DataFileRead        | active              | SELECT "minval","maxval" FROM (select min(ID) as minval,max(TRACK

长事务就是这个早上8点多跑了几个小时sql。虽然不是一个表,但是因为是oldest xmin所以也会有影响。 至此原因已经定位:

  • 表A的update频繁,表膨胀风险较高
  • 表B长事务阻止表A死元组回收
  • 表A的update语句扫描了过多的page

解决办法:

  • 结束长事务。select pg_terminate_backend(164658)
  • 手动vacuum或者等待1分钟(以内)自动vacuum。 vacuum table_a

上面2个操作完成后,查看死元组

n_live_tup          | 707
n_dead_tup          | 298

65w的死元组已清理。 再次查看执行计划:

=>  explain (analyze,buffers) select * from table_a where column_id='d4f713370e584820a9b15e2218ea436a';
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using pk_id on table_a  (cost=0.40..5.44 rows=1 width=621) (actual time=0.026..0.029 rows=1 loops=1)
   Index Cond: ((column_id)::text = 'd4f713370e584820a9b15e2218ea436a'::text)
   Buffers: shared hit=6
 Planning Time: 0.057 ms
 Execution Time: 0.043 ms

shared hit只有6,故障恢复。

另外,vacuum只会回收死元组,但是不会清理空间,表还是那么大。这个只有等新数据复用空间或者repack等重建表的操作来回收

Size        | 525 MB

order by limit的番外sql优化:
#

刚才那个长事务sql,也有问题··· 业务反馈前几天都快,今天跑了几个小时了

explain select min(ID) as minval,max(ID) as maxval from table_b where time_at >= to_timestamp('2024-03-30 00:00:00','yyyy-MM-dd HH24:mi:ss');
                                                                            QUERY PLAN                                                                             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result  (cost=4298.54..4298.55 rows=1 width=64)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.70..2149.27 rows=1 width=32)
          ->  Index Scan using pk_b on table_b  (cost=0.70..1181490202.27 rows=549896 width=32)
                Index Cond: ((ID)::text IS NOT NULL)
                Filter: (time_at >= to_timestamp('2024-03-30 00:00:00'::text, 'yyyy-MM-dd HH24:mi:ss'::text))
  InitPlan 2 (returns $1)
    ->  Limit  (cost=0.70..2149.27 rows=1 width=32)
          ->  Index Scan Backward using pk_b on table_b table_b_1  (cost=0.70..1181490202.27 rows=549896 width=32)
                Index Cond: ((ID)::text IS NOT NULL)
                Filter: (time_at >= to_timestamp('2024-03-30 00:00:00'::text, 'yyyy-MM-dd HH24:mi:ss'::text))

sql也很简单,条件只有一个时间字段 ,过滤性还可以 不过这个sql没有走 time_at 索引,而是走的 ID主键索引 。这个跟limit问题是一样的。analyze是没有用的,最好是改写sql。 改写后SQL结果秒出:

explain select min(ID||'') as minval,max(ID||'') as maxval from table_b where time_at >= to_timestamp('2024-03-30 00:00:00','yyyy-MM-dd HH24:mi:ss')

                                                        QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=1201418.86..1201418.87 rows=1 width=64)
  ->  Index Scan using idx_time_at on table_b  (cost=0.57..1195919.90 rows=549896 width=33)
        Index Cond: (time_at >= to_timestamp('2024-03-30 00:00:00'::text, 'yyyy-MM-dd HH24:mi:ss'::text))

这个还不是执行计划突变,因为没有变。前几天同样是这样的执行计划,但是跑的比较快,原因是还是因为数据分布和limit的机制,数据很快就找到自然就返回了,这也是优化器为什么会选择主键索引;“运气不好”数据半天找不到就要跑很久。

小结
#

一个经典的案例:

  • update频繁更新的小表
  • 长事务阻止死元组回收
  • 长事务是排序和limt操作(order by,max/min,group都有可能)导致的索引选择问题

一个故障,三个postgres的经典知识点,相当有代表性。

相关文章

ORDER BY limit 10比ORDER BY limit 100更慢

·3419 字·7 分钟
问题分析 # pg数据库中执行sql时,ORDER BY limit 10比ORDER BY limit 100更慢 执行计划分析 # SELECT *, (select cl.ITEM_DESC from tablelzl2 cl where item_name='name' and cl.ITEM_NO='abcdefg') AS "item" FROM tablelzl1 RI WHERE RI.column1='AAAA' AND RI.column2 = 'applyno20231112' ORDER BY RI.column3 DESC limit 10 Limit (cost=0.43..1522.66 rows=10 width=990) -> Index Scan Backward using idx_tablelzl1_column3 on tablelzl1 ri (cost=0.43..158007.45 rows=1038 width=990) Filter: (((column1)::text = 'AAAA'::text) AND ((column2)::text = 'applyno20231112'::text)) SubPlan 1 -> Index Scan using uk_tablelzl2_ii on tablelzl2 cl (cost=0.27..5.29 rows=1 width=18) Index Cond: (((item_no)::text = 'manualSign'::text) AND ((item_name)::text = (ri.manual_sign)::text)) 主表没有走到column2索引,而是走column3排序字段索引的Index Scan Backward,scan index的cost非常高,而最终的cost比较低,实际执行需要9s 如果把limit 10改成limit 100,执行计划正常:

分区权限问题导致执行计划不正确

·3277 字·7 分钟
问题现象 # 业务昨晚对sql进行了更新,之前没有DATE_CREATED字段(是分区键),跑的很快。发版后加了分区字段,本身是为了减少分区数量的访问,但是加了以后UPDATE执行却变慢了。 before: