Skip to main content
  1. Posts/

Some Features of PostgreSQL Logical Replication

·1446 words·7 mins
liuzhilong62
Author
liuzhilong62
PostgreSQL DBA. Writing about database internals, production cases, and source code analysis.

I’ve already written a fairly detailed article about logical replication before, so I won’t repeat the basics here. However, some knowledge points inevitably get missed. Recently I’ve discovered some interesting logical replication features.

replica identity and old/new values
#

replica identity is used to identify a row during logical replication. The above statement is certainly correct, but it doesn’t explain the changes in old and new data.

DEFAULT Records the old values of the columns of the primary key, if any. This is the default for non-system tables. USING INDEX index_name Records the old values of the columns covered by the named index, that must be unique, not partial, not deferrable, and include only columns marked NOT NULL. If this index is dropped, the behavior is the same as NOTHING. FULL Records the old values of all columns in the row. NOTHING Records no information about the old row. This is the default for system tables.

The PG official documentation only explains the old value situation for replica identity — for example, it doesn’t even mention that NOTHING won’t replicate update/delete. This shows the importance of old values.

Creating a replication link:

select pg_create_logical_replication_slot('pubtestlzl2','test_decoding');
pg_recvlogical -d lzldb --slot=pubtestlzl2 --start -f recv.sql &

Normal test_decoding replication link simulation:

--replica identity defaults to d: uses primary key when available; without primary key, defaults to nothing, unable to replicate update and delete
M=>  create table lzltest(a bigint primary key,b varchar(100),c varchar(100));
CREATE TABLE
M=> insert into lzltest values(1,'bbbbbb','ccccccccc');
INSERT 0 1
M=> update lzltest set b='b';
UPDATE 1

recvlogical output:

table public.lzltest: INSERT: a[bigint]:1 b[character varying]:'bbbbbb' c[character varying]:'ccccccccc'
table public.lzltest: UPDATE: a[bigint]:1 b[character varying]:'b' c[character varying]:'ccccccccc'

With replica identity as default, updating a non-primary-key field — all fields have only new values.

M=> update lzltest set a='111';
UPDATE 1
table public.lzltest: UPDATE: old-key: a[bigint]:1 new-tuple: a[bigint]:111 b[character varying]:'bb' c[character varying]:'ccccccccc'

With replica identity as default, updating the primary key — the identity column’s old and new values are decoded; other fields only have new values.

M=> alter table lzltest replica identity full;
ALTER TABLE

M=>  update lzltest set b='b';
UPDATE 1
table public.lzltest: UPDATE: old-key: a[bigint]:2 b[character varying]:'b' c[character varying]:'ccccccccc' new-tuple: a[bigint]:2 b[character varying]:'b' c[character varying]:'ccccccccc'

With replica identity set to full, both old and new values for the entire row are preserved.

Whether in default (primary key) or full mode, all column information is recorded. The difference lies in whether old data is present. In default mode:

  • insert: inherently new data, so naturally no old values — all column new values are recorded.
  • update: records new values for all columns; only the identity column has old values (if the identity column was updated).
  • delete: inherently old data, but not all columns are necessarily recorded. The same rule applies: only the identity column has old values — only the identity column is recorded.

Summary: When replica identity is default, regardless of the operation (INSERT, UPDATE, DELETE), as long as it’s old data, only the identity column is recorded; as long as it’s new data, all columns are recorded.

When changing from default to full, the decoded log volume difference isn’t particularly large, because new data always includes all columns. (Excluding scenarios that are entirely deletes) the log volume decoded under full is less than twice that of default.

pgoutput cannot be peeked
#

Create a replication slot using pgoutput:

select pg_create_logical_replication_slot('pubtestlzl','pgoutput');

Then try to peek or receive — both fail:

select * from pg_logical_slot_peek_changes('pubtestlzl',null,null);
pg_recvlogical -d lzldb --slot=pubtestlzl --start -f recv.sql &
pg_recvlogical: error: could not send replication command "START_REPLICATION SLOT "pubtestlzl" LOGICAL 0/0": ERROR:  client sent proto_version=0 but we only support protocol 1 or higher
CONTEXT:  slot "pubtestlzl", output plugin "pgoutput", in the startup callback
pg_recvlogical: disconnected; waiting 5 seconds to try again

You cannot peek or use pg_recvlogical to receive from a pgoutput replication slot. Since pgoutput is the output plugin for publish-subscribe, this plugin cannot be manually peeked or received…

Publish-Subscribe Doesn’t Have to Be PG-to-PG
#

create publication and create subscription are PG internal commands that can also be used to create links between PG databases. Third-party software can similarly use create publication and simulate subscriptions to create replication slots. This is better than directly creating replication slots because publications can manage replicated tables.

TOAST and Logical Decoding
#

TOAST columns being sent are NOT decoded! This means an entire row of data may only have part of it transmitted (when TOAST columns themselves haven’t been updated).

Normal decoding decodes all columns:

--Create a test_decoding replication slot
=>  select pg_create_logical_replication_slot('logical_dest','test_decoding');
 pg_create_logical_replication_slot 
------------------------------------
 (logical_dest,349/A80040E0)
(1 row)

--Create a table with small columns
=> create table test1(a int primary key,b varchar(100),c varchar(100));
CREATE TABLE

=> select* from pg_replication_slots;
  slot_name   |    plugin     | slot_type | datoid | database | temporary | active | active_pid |  xmin  | catalog_xmin | restart_lsn  | confirmed_flush_lsn | wal_status | safe_wal_size 
--------------+---------------+-----------+--------+----------+-----------+--------+------------+--------+--------------+--------------+---------------------+------------+---------------
 logical_dest | test_decoding | logical   | 418679 | lzldb    | f         | f      |     [null] | [null] |    872483335 | 349/A80040A8 | 349/A80040E0        | reserved   |        [null]
(1 row)


=>  insert into test1 values (1,'qwer','qwer');
INSERT 0 1
Time: 0.915 ms
=> select * from pg_logical_slot_peek_changes('logical_dest',null,null);
     lsn      |    xid    |                                               data                                               
--------------+-----------+--------------------------------------------------------------------------------------------------
 349/A8004C78 | 872483335 | BEGIN 872483335
 349/A80103E8 | 872483335 | COMMIT 872483335
 349/A8018B30 | 872483369 | BEGIN 872483369
 349/A8018B30 | 872483369 | table public.test1: INSERT: a[integer]:1 b[character varying]:'qwer' c[character varying]:'qwer'
 349/A8018C50 | 872483369 | COMMIT 872483369
(5 rows)
--insert is decoded, containing all columns

=> update test1 set b='zxcv' where c='qwer';
UPDATE 1
Time: 4.005 ms
=>  select * from pg_logical_slot_peek_changes('logical_dest',null,null);
     lsn      |    xid    |                                               data                                               
--------------+-----------+--------------------------------------------------------------------------------------------------
 349/A8004C78 | 872483335 | BEGIN 872483335
 349/A80103E8 | 872483335 | COMMIT 872483335
 349/A8018B30 | 872483369 | BEGIN 872483369
 349/A8018B30 | 872483369 | table public.test1: INSERT: a[integer]:1 b[character varying]:'qwer' c[character varying]:'qwer'
 349/A8018C50 | 872483369 | COMMIT 872483369
 349/A801D018 | 872483378 | BEGIN 872483378
 349/A801D018 | 872483378 | table public.test1: UPDATE: a[integer]:1 b[character varying]:'zxcv' c[character varying]:'qwer'
 349/A801D098 | 872483378 | COMMIT 872483378
(8 rows)
--update is decoded, containing all columns

Normally, without TOAST, decoded data includes all columns of the row.

TOAST decoding test:

--Enlarge the columns
=>  alter table test1 alter column b type varchar(3000);
ALTER TABLE
Time: 8.091 ms
=>   alter table test1 alter column c type varchar(3000);
ALTER TABLE
Time: 0.937 ms

--A batch random function
=> create or replace function f_random_str(length INTEGER) returns character varying
-> LANGUAGE plpgsql
-> AS $$
-> DECLARE
-> result varchar(3000);
-> BEGIN
-> SELECT array_to_string(ARRAY(SELECT chr((65 + round(random() * 25)) :: integer)
-> FROM generate_series(1,length)), '') INTO result;
-> return result;
-> END
-> $$;
CREATE FUNCTION

--Insert data
=>  insert into test1 values (2,f_random_str(2000),f_random_str(2000));
INSERT 0 1

--Check for TOAST
=> SELECT                                              
->   n.nspname as schema,                              
-> s.oid::regclass as relname,                     
-> s.reltoastrelid::regclass as toast_name,        
-> pg_relation_size(s.reltoastrelid) AS toast_size 
->   FROM                                              
-> pg_class s join pg_namespace n                  
-> on s.relnamespace = n.oid                       
->   WHERE                                             
-> relkind = 'r'                                   
-> AND reltoastrelid <> 0                          
-> AND n.nspname = 'public'                        
->   ORDER BY                                          
-> 3 DESC;
 schema | relname |        toast_name        | toast_size 
--------+---------+--------------------------+------------
 public | test1   | pg_toast.pg_toast_418714 |       8192
(1 row)

--Update via primary key, updating a TOAST column
=> update test1 set b='zxcv' where a=2;
UPDATE 1

=>  select * from pg_logical_slot_peek_changes('logical_dest',null,null);
     lsn      |    xid    |                                                                                                                                                                                      
--------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
...
 349/A851FD90 | 872483420 | BEGIN 872483420
 349/A85216E0 | 872483420 | table public.test1: INSERT: a[integer]:2 b[character varying]:'GIORCXQQWDBGTUNDZXAWMPYOUEGTECWTVQGDQGSPMEPJNPUQIFMESLRASBZWGONETRENDCHLDWVTDWJLTGRYUMFDOWHLEYLUTECPOVCYXFIATLKVEQTHSC'
 349/A85218A0 | 872483420 | COMMIT 872483420
 349/A8525CA8 | 872483429 | BEGIN 872483429
 349/A8525D50 | 872483429 | table public.test1: UPDATE: a[integer]:2 b[character varying]:'zxcv' c[character varying]:unchanged-toast-datum
 349/A8525DE0 | 872483429 | COMMIT 872483429

Column c, which has TOAST and was not involved in the update, has no decoded data — it directly outputs toast datum unchanged: unchanged-toast-datum.

Testing with wal2json:

=>  select pg_create_logical_replication_slot('logical_json','wal2json');
 pg_create_logical_replication_slot 
------------------------------------
 (logical_json,349/A87CAB58)
(1 row)

=>  update test1 set b='zxcv' where a=2;
UPDATE 1
=>  \pset format wrapped
Output format is wrapped.
=>  \pset columns 200
Target width is 200.
=>  select * from pg_logical_slot_peek_changes('logical_json',null,null);
-[ RECORD 1 ]------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
lsn  | 349/A87CACF8
xid  | 872483495
data | {"change":[{"kind":"update","schema":"public","table":"test1","columnnames":["a","b"],"columntypes":["integer","character varying(3000)"],"columnvalues":[2,"zxcv"],"oldkeys":{"keynames":["a"],.
     |."keytypes":["integer"],"keyvalues":[2]}}]}

=> update test1 set b='zxcv' where a=1;
UPDATE 1
Time: 1.391 ms
=>  select * from pg_logical_slot_peek_changes('logical_json',null,null);
-[ RECORD 1 ]------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
lsn  | 349/A87CACF8
xid  | 872483495
data | {"change":[{"kind":"update","schema":"public","table":"test1","columnnames":["a","b"],"columntypes":["integer","character varying(3000)"],"columnvalues":[2,"zxcv"],"oldkeys":{"keynames":["a"],.
     |."keytypes":["integer"],"keyvalues":[2]}}]}
-[ RECORD 2 ]------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
lsn  | 349/A8CCA0D8
xid  | 872483509
data | {"change":[{"kind":"update","schema":"public","table":"test1","columnnames":["a","b","c"],"columntypes":["integer","character varying(3000)","character varying(3000)"],"columnvalues":[1,"zxcv".
     |.,"qwer"],"oldkeys":{"keynames":["a"],"keytypes":["integer"],"keyvalues":[1]}}]}
 --When updating, column c data is not decoded

wal2json shows the same behavior.

MySQL’s binlog_row_image parameter can adjust whether binlog records large fields:

  • full (Log all columns)
  • minimal (Log only changed columns, and columns needed to identify rows)
  • noblob (Log all columns, except for unneeded BLOB and TEXT columns)

PG has absolutely no such control — by default, TOAST columns are not decoded, and there are no other options to configure~

Related

Getting Started with HikariCP Connection Pool

·1889 words·9 mins
A Brief Introduction to HikariCP # “Hikari” means “light” in Japanese — HikariCP aims to be a Connection Pool as light and fast as light. This nearly Java-only middleware connection pool is extremely lightweight and performance-focused. HikariCP is now the default connection pool for Spring Boot, and with the proliferation of Spring Boot and microservices, HikariCP usage continues to grow. On the HikariCP GitHub homepage, there’s a performance comparison: (https://github.com/brettwooldridge/HikariCP-benchmark)

History of Transactions and SSI — PostgreSQL Database Technology Summit Chengdu Stop Sharing

·3384 words·16 mins
Preface # PostgreSQL Database Technology Summit Chengdu Stop # Recently (June 17, 2023), the “PostgreSQL Database Technology Summit Chengdu Stop” organized by the PostgreSQL branch of the China Open Source Software Promotion Alliance was successfully held. I had the honor of participating as a speaker and gained a lot from it. (Summit review and all PPT downloads: PPT downloads are here | PostgreSQL Technology Summit Chengdu Stop Review)

How to Solve Index Split Contention?

·1047 words·5 mins
Index Splitting # When an index block is nearly full, index splitting occurs. Index splitting comes in two forms: 55 and 91: The difference between index splitting and the enq: TX - index contention wait event Whether 55 or 91 splitting, both are normal index behavior as data volume increases. Index splitting is a normal phenomenon caused by growing data volume leading to larger indexes — when an index can’t hold more data, it naturally needs more index blocks. There are hardly any scenarios with tables but no indexes (only during initial data loading would one consider inserting data first and building indexes afterward). Although index splitting consumes some resources, in today’s Oracle environments it can complete quickly. Only when there are too many indexes does it affect insert efficiency.