跳过正文
  1. 文章/

控制文件上的参数和主从参数不一致问题

·3149 字·7 分钟
liuzhilong62
作者
liuzhilong62
PostgreSQL DBA,关注数据库内核、案例分析、源码解读
C M

PARAMETER_CHANGE和控制文件上的数据库参数
#

pg有些参数会影响从库的运行,而这些参数不仅在参数文件中,也写到controlfile,只要参数改变就会写入wal并更新控制文件。 从库会通过PARAMETER_CHANGE wal record进行redo,写入到从库的controlfile中。 PARAMETER_CHANGE wal record:

rmgr: XLOG        len (rec/tot):     54/    54, tx:          0, lsn: 27F/800001C0, prev 27F/80000148, desc: PARAMETER_CHANGE max_connections=3000 max_worker_processes=20 max_wal_senders=10 max_prepared_xacts=0 max_locks_per_xact=1024 wal_level=logical wal_log_hints=off track_commit_timestamp=on

XLOG_PARAMETER_CHANGE记录了这8种参数,控制文件也可以直接看,:

$ pg_controldata |grep setting
wal_level setting:                    logical
wal_log_hints setting:                on
max_connections setting:              1000
max_worker_processes setting:         20
max_wal_senders setting:              10
max_prepared_xacts setting:           0
max_locks_per_xact setting:           1024
track_commit_timestamp setting:       on

这些参数都是主库的,哪怕这个控制文件是从库的。 startup进程通过CheckRequiredParameterValues函数检查6个参数是否满足条件。其中1个参数wal_level>=replica,另外5个参数max_connections,max_worker_processes,max_wal_senders,max_prepared_transactions,max_locks_per_transaction会检查主从大小,从库小的话会pause recovery。直接将主库参数调大,从库直接挂掉,pg log如下:

FATAL,22023,"hot standby is not possible because max_connections = 2000 is a lower setting than on the master server (its value was 3000)",,,,,"WAL redo at 27F/800001C0 for XLOG/PARAMETER_CHANGE: max_connections=3000 max_worker_processes=20 max_wal_senders=10 max_prepared_xacts=0 max_locks_per_xact=1024 wal_level=logical wal_log_hints=off track_commit_timestamp=on",,,,"","startup"

8个参数中的6个都可能严重影响从库的运行。另外2个参数wal_log_hints,track_commit_timestamp不会被startup进程立即检查。8个参数同步到控制文件都有其意义。

wal_log_hint主从不一致
#

wal_log_hints的变化会记录到wal日志中,虽然不会被startup进程检查,pg_rewind会检查:

perform_rewind(...)
{
...
	/*
	 * Target cluster need to use checksums or hint bit wal-logging, this to
	 * prevent from data corruption that could occur because of hint bits.
	 */
	if (ControlFile_target.data_checksum_version != PG_DATA_CHECKSUM_VERSION &&
		!ControlFile_target.wal_log_hints)
	{
		pg_fatal("target server needs to use either data checksums or \"wal_log_hints = on\"");
	}

因为wal_log_hints是wal相关的,所以pg_rewind去检查从库的wal_log_hints参数有没有打开没有意义,应检查主库的wal_log_hints有没有打开,所以pg将wal_log_hints参数同步到从库的控制文件中了,非常合理。

wal_log_hint主从不一致测试:

select * from t1;
checkpoint;
update t1 set b='eee';  --观察点1
checkpoint;  --忽略该online checkpoint wal record
select * from t1;  --观察点2

--观察动作
pg_waldump 000000020000027F0000000A|tail -10

-- observing option
select t_ctid,lp,
case lp_flags 
when 0 then '0:LP_UNUSED' 
when 1 then 'LP_NORMAL' 
when 2 then 'LP_REDIRECT'  
when 3 then 'LP_DEAD' 
end as lp_flags,
t_xmin,t_xmax,t_field3 as t_cid, raw_flags, info.combined_flags,substring(t_data,0,40) 
from heap_page_items(get_raw_page('t1',0)) item,
LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2) info 
order by lp;

on,on:

--观察点1:
rmgr: Heap        len (rec/tot):     85/   208, tx:   11140182, lsn: 27F/5000CC38, prev 27F/5000CBC0, desc: HOT_UPDATE off 3 xmax 11140182 flags 0x10 ; new off 4 xmax 0, blkref #0: rel 1663/7472552/7472597 blk 0 FPW
rmgr: Transaction len (rec/tot):     46/    46, tx:   11140182, lsn: 27F/5000CD08, prev 27F/5000CC38, desc: COMMIT 2025-07-21 18:28:13.292397 CST
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 27F/5000CD38, prev 27F/5000CD08, desc: RUNNING_XACTS nextXid 11140183 latestCompletedXid 11140182 oldestRunningXid 11140183
--观察点2:
rmgr: XLOG        len (rec/tot):     51/   171, tx:          0, lsn: 27F/58000110, prev 27F/580000D8, desc: FPI_FOR_HINT , blkref #0: rel 1663/7472552/7472597 blk 0 FPW
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 27F/580001C0, prev 27F/58000110, desc: RUNNING_XACTS nextXid 11140183 latestCompletedXid 11140182 oldestRunningXid 11140183

off,off:

--观察点1:
rmgr: Heap        len (rec/tot):     85/   225, tx:   11140183, lsn: 27F/580003C8, prev 27F/58000390, desc: HOT_UPDATE off 4 xmax 11140183 flags 0x10 ; new off 5 xmax 0, blkref #0: rel 1663/7472552/7472597 blk 0 FPW
rmgr: Transaction len (rec/tot):     46/    46, tx:   11140183, lsn: 27F/580004B0, prev 27F/580003C8, desc: COMMIT 2025-07-21 18:33:18.192146 CST
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 27F/580004E0, prev 27F/580004B0, desc: RUNNING_XACTS nextXid 11140184 latestCompletedXid 11140183 oldestRunningXid 11140184
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 27F/58000518, prev 27F/580004E0, desc: RUNNING_XACTS nextXid 11140184 latestCompletedXid 11140183 oldestRunningXid 11140184
--观察点2:

on,off:

--观察点1:
rmgr: Heap        len (rec/tot):     85/   274, tx:   11140186, lsn: 27F/58000C18, prev 27F/58000BA0, desc: HOT_UPDATE off 7 xmax 11140186 flags 0x10 ; new off 8 xmax 0, blkref #0: rel 1663/7472552/7472597 blk 0 FPW
rmgr: Transaction len (rec/tot):     46/    46, tx:   11140186, lsn: 27F/58000D30, prev 27F/58000C18, desc: COMMIT 2025-07-21 18:40:17.638691 CST
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 27F/58000D60, prev 27F/58000D30, desc: RUNNING_XACTS nextXid 11140187 latestCompletedXid 11140186 oldestRunningXid 11140187
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 27F/58000D98, prev 27F/58000D60, desc: RUNNING_XACTS nextXid 11140187 latestCompletedXid 11140186 oldestRunningXid 11140187
--观察点2:
rmgr: XLOG        len (rec/tot):     51/   236, tx:          0, lsn: 27F/58000E48, prev 27F/58000DD0, desc: FPI_FOR_HINT , blkref #0: rel 1663/7472552/7472597 blk 0 FPW
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 27F/58000F38, prev 27F/58000E48, desc: RUNNING_XACTS nextXid 11140187 latestCompletedXid 11140186 oldestRunningXid 11140187

off,on:

--观察点1:
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 27F/58001108, prev 27F/58001090, desc: RUNNING_XACTS nextXid 11140187 latestCompletedXid 11140186 oldestRunningXid 11140187
rmgr: Heap        len (rec/tot):     85/   289, tx:   11140187, lsn: 27F/58001140, prev 27F/58001108, desc: HOT_UPDATE off 8 xmax 11140187 flags 0x10 ; new off 9 xmax 0, blkref #0: rel 1663/7472552/7472597 blk 0 FPW
rmgr: Transaction len (rec/tot):     46/    46, tx:   11140187, lsn: 27F/58001268, prev 27F/58001140, desc: COMMIT 2025-07-21 18:44:08.550109 CST
rmgr: Standby     len (rec/tot):     54/    54, tx:          0, lsn: 27F/58001298, prev 27F/58001268, desc: RUNNING_XACTS nextXid 11140188 latestCompletedXid 11140186 oldestRunningXid 11140187; 1 xacts: 11140187
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 27F/580012D0, prev 27F/58001298, desc: RUNNING_XACTS nextXid 11140188 latestCompletedXid 11140187 oldestRunningXid 11140188
--观察点2:

测试小结:

  • FPI_FOR_HINT在hintbit回写时产生,查询语句可产生FPI_FOR_HINT
  • 无论从on or off,主为on的时候,FPI_FOR_HINT就会生产

附加知识:what is XLOG_RUNNING_XACTS
#

XLOG_RUNNING_XACTS是RM_STANDBY_ID的一种:

/*
 * XLOG message types
 */
#define XLOG_STANDBY_LOCK			0x00
#define XLOG_RUNNING_XACTS			0x10
#define XLOG_INVALIDATIONS			0x20

XLOG_STANDBY_LOCK:记录 AccessExclusiveLock 的获取和释放,用于 Standby 节点识别锁状态。

XLOG_RUNNING_XACTS: running-xacts snapshots用于构建快照,确保事务一致性

XLOG_INVALIDATIONS:INVALIDATIONS消息,用于同时效元数据信息给local backend

 * standbydefs.h
 *	   Frontend exposed definitions for hot standby mode.

RM_STANDBY_ID是专门为hot standby只读从库定义的rmgr。对于本地实例恢复、逻辑解析这些需要用到wal的场景,``RM_STANDBY_ID`本质上对他们没有意义。

测试事务提交观察wal record:

commandwal record
begin;
select * from txid_current(); –11140191
commit;rmgr: Transaction len (rec/tot): 46/ 46, tx: 11140191, lsn: 27F/80000538, prev 27F/80000500, desc: COMMIT 2025-07-23 11:16:10.872724 CST
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 27F/80000568, prev 27F/80000538, desc: RUNNING_XACTS nextXid 11140192 latestCompletedXid 11140191 oldestRunningXid 11140192

事务id 本身commit or abort还是rmgr: Transaction在同步,快照通过rmgr: Standby RUNNING_XACTS 同步。

track_commit_timestamp主从不一致
#

track_commit_timestamp,startup进程会在接受到相应wal后将从库的commit ts功能激活,主要是为从库查看xid提交时间用:

/*
 * Activate or deactivate CommitTs' upon reception of a XLOG_PARAMETER_CHANGE
 * XLog record during recovery.
 */
void
CommitTsParameterChange(bool newvalue, bool oldvalue)
{
	/*
	 * If the commit_ts module is disabled in this server and we get word from
	 * the primary server that it is enabled there, activate it so that we can
	 * replay future WAL records involving it; also mark it as active on
	 * pg_control.  If the old value was already set, we already did this, so
	 * don't do anything.
	 *
	 * If the module is disabled in the primary, disable it here too, unless
	 * the module is enabled locally.
	 *
	 * Note this only runs in the recovery process, so an unlocked read is
	 * fine.
	 */
	if (newvalue)
	{
		if (!commitTsShared->commitTsActive)
			ActivateCommitTs();
	}
	else if (commitTsShared->commitTsActive)
		DeactivateCommitTs();
}

track_commit_timestamp主从不一致测试,测试步骤:

  1. 初始状态,主=on,从=on 。均能使用committed_xact等函数
  2. 主=off(重启主库),从库=on(不动)。均不能使用committed_xact等函数

主库修改并重启后,从库复制依然正常,committed_xact等函数不可使用:

$ select * from pg_last_committed_xact();
ERROR:  55000: could not get commit timestamp data
HINT:  Make sure the configuration parameter "track_commit_timestamp" is set on the primary server.
LOCATION:  error_commit_ts_disabled, commit_ts.c:385

$ show track_commit_timestamp
-> ;
 track_commit_timestamp 
------------------------
 on
(1 row)

Time: 0.198 ms
$ \q
## pg_controldata |grep track_commit_timestamp
track_commit_timestamp setting:       off

pg14+ pause recovery
#

pg14对主库参数修改导致从库宕机做了提升。当参数不满足条件时,只读从库直接宕掉改成只读从库不宕只停复制,代码见RecoveryRequiresIntParameter

Pause recovery on a hot standby server if the primary changes its parameters in a way that prevents replay on the standby (Peter Eisentraut)

Previously the standby would shut down immediately

测试pg14修改参数导致从库复制中断:

2025-07-23 19:46:31.337 CST,,,141823,,6880ca5f.229ff,14,,2025-07-23 19:41:19 CST,1/0,0,LOG,00000,"recovery has paused","If recovery is unpaused, the server will shut down.","You can then restart the server after making the necessary configuration changes.",,,"WAL redo at 281/78324BE8 for XLOG/PARAMETER_CHANGE: max_connections=2000 max_worker_processes=20 max_wal_senders=10 max_prepared_xacts=0 max_locks_per_xact=1024 wal_level=logical wal_log_hints=on track_commit_timestamp=on",,,,"","startup",,0

因为复制已经中断,去把主库的参数改回来也没有用,从库无法应用后面的变更并更新控制文件,所以只能修改从库参数并重启(log的提示也很明显了)。

小结8个参数
#

8个参数在主库被修改并重启后,均会更新本地控制文件,如果参数有变会将改变后的参数写入wal中同步给下游,下游redo这个parameter change wal record,即更新本地的控制文件,从库根据一定条件来判断主从或者其他功能是否可用。

写到控制文件中的8个参数checkif not,standby(pg13-)if not,standby(pg14+)
wal_level!=minimal无法同步,原理性问题无法同步,原理性问题
max_connections主<=从hot standby宕机hot standby停止同步
max_worker_processes主<=从hot standby宕机hot standby停止同步
max_wal_senders主<=从hot standby宕机hot standby停止同步
max_prepared_transactions主<=从hot standby宕机hot standby停止同步
max_locks_per_transaction主<=从hot standby宕机hot standby停止同步
wal_log_hintspg_rewind前置条件(either data checksums or wal_log_hints = on)不影响standby同步不影响standby同步
track_commit_timestamp打开/关闭从库的commit_ts功能不影响standby同步不影响standby同步

特别鸣谢:高长军高大师

相关文章

PG在还没有pg_class的时候怎么访问基础系统表?

·3748 字·8 分钟
在没有pg_class的时候,数据库怎么访问系统表?这个问题可以分成两个阶段来看: 数据库簇初始化,此时一个database都没有,所以怎么构造和访问pg_class等系统表是一个问题 私有内存初始化系统表。PG的系统表信息是放在backend本地进程上的,backend在初始化的时候又怎么load pg_class? 初始化数据字典 # 在数据库还没有初始化的时候,明显是不能通过访问数据字典来初始化database、pg_class等等对象的,因为一个库都没有就不能create database,也没有pg_class去查元数据信息。 PG通过bki文件的特殊语言初始化一些数据结构,然后在bootstrap模式初始化一个原始database1。

pg truncate浅析

·3936 字·8 分钟
命令选项 # TRUNCATE [ TABLE ] [ ONLY ] name [ * ] [, ... ] [ RESTART IDENTITY | CONTINUE IDENTITY ] [ CASCADE | RESTRICT ] 1.ONLY:只truncate指定的表。当表有继承子表或有子分区时,默认会一起truncate;only可只truncate继承父表。分区父表不能指定only

vacuum啥时候截断末尾页?

·2659 字·6 分钟
vacuum truncate # TRUNCATE—Specifies that VACUUM should attempt to truncate off any empty pages at the end of the table and allow the disk space for the truncated pages to be returned to the operating system. This is normally the desired behavior and is the default unless the vacuum_truncate option has been set to false for the table to be vacuumed. Setting this option to false may be useful to avoid ACCESS EXCLUSIVE lock on the table that the truncation requires. This option is ignored if the FULL option is used.