<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>PostgreSQL内功修炼 on Last DBA</title><link>https://lastdba.com/en/categories/postgresql%E5%86%85%E5%8A%9F%E4%BF%AE%E7%82%BC/</link><description>Recent content in PostgreSQL内功修炼 on Last DBA</description><generator>Hugo -- gohugo.io</generator><language>en-US</language><copyright>© 2026 liuzhilong62</copyright><lastBuildDate>Fri, 29 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://lastdba.com/en/categories/postgresql%E5%86%85%E5%8A%9F%E4%BF%AE%E7%82%BC/index.xml" rel="self" type="application/rss+xml"/><item><title>UUID v4 and v7: Collision Incidents and Performance Benchmarks</title><link>https://lastdba.com/en/2026/05/29/uuid-v4-and-v7-collision-incidents-and-performance-benchmarks/</link><pubDate>Fri, 29 May 2026 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2026/05/29/uuid-v4-and-v7-collision-incidents-and-performance-benchmarks/</guid><description>&lt;blockquote&gt;&lt;p&gt;Source material: &lt;a href="https://news.ycombinator.com/item?id=48060054" target="_blank" rel="noreferrer"&gt;HN UUID v4 Collision Thread&lt;/a&gt;, &lt;a href="https://dev.to/umangsinha12/postgresql-uuid-performance-benchmarking-random-v4-and-time-based-v7-uuids-n9b" target="_blank" rel="noreferrer"&gt;dev.to UUID Benchmark&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;&lt;blockquote&gt;&lt;p&gt;AI-generated ratio: 99%&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 class="relative group"&gt;TL;DR
 &lt;div id="tldr" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#tldr" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;UUID v4 collided — someone on HackerNews actually hit a real collision. The root cause was a software stack bug, not math. v4 and v7 have no fundamental difference in collision safety. The real difference is index performance: v7 is time-ordered, B-tree is more compact, writes are 35% faster, indexes are 22% smaller. Your UUID v4 is probably fine, but if you care about index performance, switching to v7 is a cheap win.&lt;/p&gt;</description><content:encoded>&lt;blockquote&gt;&lt;p&gt;Source material: &lt;a href="https://news.ycombinator.com/item?id=48060054" target="_blank" rel="noreferrer"&gt;HN UUID v4 Collision Thread&lt;/a&gt;, &lt;a href="https://dev.to/umangsinha12/postgresql-uuid-performance-benchmarking-random-v4-and-time-based-v7-uuids-n9b" target="_blank" rel="noreferrer"&gt;dev.to UUID Benchmark&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;&lt;blockquote&gt;&lt;p&gt;AI-generated ratio: 99%&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 class="relative group"&gt;TL;DR
 &lt;div id="tldr" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#tldr" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;UUID v4 collided — someone on HackerNews actually hit a real collision. The root cause was a software stack bug, not math. v4 and v7 have no fundamental difference in collision safety. The real difference is index performance: v7 is time-ordered, B-tree is more compact, writes are 35% faster, indexes are 22% smaller. Your UUID v4 is probably fine, but if you care about index performance, switching to v7 is a cheap win.&lt;/p&gt;

&lt;h3 class="relative group"&gt;The UUID v4 Collision Incident
 &lt;div id="the-uuid-v4-collision-incident" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-uuid-v4-collision-incident" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A HackerNews thread blew up — &lt;a href="https://news.ycombinator.com/item?id=48060054" target="_blank" rel="noreferrer"&gt;Ask HN: We just had an actual UUID v4 collision&amp;hellip;&lt;/a&gt;, 479 upvotes, 347 comments.&lt;/p&gt;
&lt;p&gt;The OP&amp;rsquo;s own words:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;I know what you&amp;rsquo;re thinking&amp;hellip; and I still can&amp;rsquo;t believe it, but&amp;hellip; This morning, our database flagged a duplicate UUID (v4).&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;It wasn&amp;rsquo;t a double-insert bug. The code didn&amp;rsquo;t write it twice. Only ~15,000 rows in the table, using npm&amp;rsquo;s &lt;code&gt;uuid&lt;/code&gt; package &lt;code&gt;uuidv4()&lt;/code&gt;, and two rows created at different times collided on the same UUID:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b6133fd6-70fe-4fe3-bed6-8ca8fc9386cd&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;What&amp;rsquo;s the probability of a UUID v4 collision? 122 random bits, 2^122 ≈ 5.3×10^36 possibilities. With 15,000 records, collision probability is roughly 2×10^-29. Theoretically &amp;ldquo;impossible.&amp;rdquo;&lt;/p&gt;
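&lt;p&gt;The figure above follows from the birthday approximation p ≈ n(n−1)/(2N). A minimal Python sketch to reproduce it, assuming all 122 random bits are uniform and independent (the function name &lt;code&gt;collision_probability&lt;/code&gt; is made up for illustration):&lt;/p&gt;

```python
import math


def collision_probability(n: int, random_bits: int = 122) -> float:
    """Birthday approximation: chance that at least two of n random IDs collide."""
    space = 2 ** random_bits              # number of possible UUID v4 values
    exponent = n * (n - 1) / (2 * space)  # expected number of colliding pairs
    return -math.expm1(-exponent)         # 1 - exp(-x), accurate for tiny x


print(f"{collision_probability(15_000):.1e}")
```

&lt;p&gt;For 15,000 rows this prints a value of about 2.1e-29, in line with the rough 2×10^-29 quoted above. The whole calculation rests on the uniformity assumption, which is exactly what fails in the incidents described next.&lt;/p&gt;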
&lt;p&gt;But it happened.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Cause 1: Unreliable entropy sources
 &lt;div id="cause-1-unreliable-entropy-sources" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cause-1-unreliable-entropy-sources" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;HN&amp;rsquo;s top-voted comment (jandrewrogers):&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;UUIDv4 security depends on high-quality entropy sources. Hardware defects, software bugs, and misunderstandings of &amp;ldquo;high-quality entropy&amp;rdquo; all break this assumption. Detecting entropy source failures is expensive, so nobody checks — until a collision happens.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;According to the same comment, some high-reliability systems &lt;strong&gt;explicitly ban&lt;/strong&gt; UUID v4, because the quality of the entropy source cannot be verified.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Cause 2: Known npm uuid package bugs
 &lt;div id="cause-2-known-npm-uuid-package-bugs" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cause-2-known-npm-uuid-package-bugs" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The npm uuid package README itself warns:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;This module may generate duplicate UUIDs when run in clients with deterministic random number generators, such as Googlebot crawlers.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;More seriously, its internal &lt;code&gt;rng()&lt;/code&gt; function has global mutable state. One commenter pointed out: calling &lt;code&gt;rng()&lt;/code&gt; and sending the result effectively &lt;strong&gt;overwrites someone else&amp;rsquo;s random number, and you can predict it&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Related commit: &lt;a href="https://github.com/uuidjs/uuid/commit/91805f665c38b691ac2cbd" target="_blank" rel="noreferrer"&gt;91805f665c&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Community advice: use Node.js built-in &lt;code&gt;crypto.randomUUID()&lt;/code&gt;, not the npm uuid package.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Cause 3: Linux kernel /dev/random race condition
 &lt;div id="cause-3-linux-kernel-devrandom-race-condition" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cause-3-linux-kernel-devrandom-race-condition" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Another comment:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;I encountered duplicate UUIDs during soak testing of a distributed system. After extensive debugging, I found it was a Linux kernel race condition bug — on multi-processor systems, two processes simultaneously reading /dev/random could, with extremely low probability (~one in a million), get the same bytes.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 class="relative group"&gt;Cause 4: Go UUID library not checking return values
 &lt;div id="cause-4-go-uuid-library-not-checking-return-values" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cause-4-go-uuid-library-not-checking-return-values" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;blockquote&gt;&lt;p&gt;Early Go UUID libraries called random functions without checking the return value length. &amp;ldquo;Request N bytes, got 3 bytes back&amp;rdquo; never happened on most hardware, so nobody checked — until production, where it generated thousands of duplicate UUIDs.&lt;/p&gt;
&lt;/blockquote&gt;
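&lt;p&gt;The defensive check the quote describes can be sketched in Python rather than Go. This illustrates the idea only and is not the code of any particular library; &lt;code&gt;read_exact&lt;/code&gt; and &lt;code&gt;uuid4_from&lt;/code&gt; are hypothetical helper names, and the random source is modelled as any file-like object:&lt;/p&gt;

```python
import io
import uuid


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes from stream, or raise instead of returning a short buffer."""
    buf = bytearray()
    while n > len(buf):
        chunk = stream.read(n - len(buf))
        if not chunk:
            # The source ran dry: refuse to build an ID from partial randomness.
            raise OSError("random source returned too few bytes")
        buf.extend(chunk)
    return bytes(buf)


def uuid4_from(stream) -> uuid.UUID:
    """Build a version-4 UUID from 16 bytes of the given random source."""
    return uuid.UUID(bytes=read_exact(stream, 16), version=4)


# A source that only yields 3 bytes must fail loudly, not produce a UUID.
try:
    uuid4_from(io.BytesIO(b"\x01\x02\x03"))
except OSError as exc:
    print("rejected:", exc)
```

&lt;p&gt;Without the length check, the unread part of the buffer would typically stay at its initial value (often zeros), so every short read would produce nearly the same UUID.&lt;/p&gt;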
&lt;h4 class="relative group"&gt;Cause 5: Historical AMD CPU RNG defects
 &lt;div id="cause-5-historical-amd-cpu-rng-defects" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cause-5-historical-amd-cpu-rng-defects" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Certain AMD CPUs had built-in random number generator issues. VM environments can also &amp;ldquo;virtualize away&amp;rdquo; entropy — both time sources and entropy sources can degrade inside VMs.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;v4 and v7 have no fundamental difference in collision safety. The difference is in the first 48 bits — v4 is random, v7 is a timestamp. You&amp;rsquo;re unlikely to encounter timestamp source issues, and random source issues are equally rare. The HN thread is an interesting edge case. Knowing that a tiny number of people hit it is enough — you don&amp;rsquo;t need to distrust the UUID v4 in your own systems.&lt;/p&gt;
&lt;p&gt;When choosing v4 vs v7, what you should really look at isn&amp;rsquo;t collisions — it&amp;rsquo;s &lt;strong&gt;index performance&lt;/strong&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;UUID v7 Performance Comparison in PG 16
 &lt;div id="uuid-v7-performance-comparison-in-pg-16" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#uuid-v7-performance-comparison-in-pg-16" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;UUID v7 has one concrete advantage over v4 in PostgreSQL: &lt;strong&gt;temporal clustering, more B-tree-friendly&lt;/strong&gt;. v4 can bloat and v7 can bloat too — the difference is simply that v7&amp;rsquo;s first 48 bits are time-ordered, so inserts concentrate on the right side of the B-tree, reducing page splits.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://dev.to/umangsinha12/postgresql-uuid-performance-benchmarking-random-v4-and-time-based-v7-uuids-n9b" target="_blank" rel="noreferrer"&gt;Umang Sinha&amp;rsquo;s benchmark&lt;/a&gt; ran a rigorous comparison on a PG 16 Docker container (8 cores, 16GB, NVMe).&lt;/p&gt;

&lt;h4 class="relative group"&gt;Test Conditions
 &lt;div id="test-conditions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-conditions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; uuid_v4_test (id UUID &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, payload TEXT);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; uuid_v7_test (id UUID &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, payload TEXT);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter&lt;/th&gt;
 &lt;th&gt;Value&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Data volume&lt;/td&gt;
 &lt;td&gt;10 million rows per table&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Batch size&lt;/td&gt;
 &lt;td&gt;10,000 rows per batch&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Client&lt;/td&gt;
 &lt;td&gt;Go + pq driver&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;UUID generation&lt;/td&gt;
 &lt;td&gt;Pre-generated in memory, not timed&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4 class="relative group"&gt;Performance Results
 &lt;div id="performance-results" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#performance-results" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Metric&lt;/th&gt;
 &lt;th&gt;UUID v4&lt;/th&gt;
 &lt;th&gt;UUID v7&lt;/th&gt;
 &lt;th&gt;Improvement&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Write 10M rows&lt;/td&gt;
 &lt;td&gt;5 min 35 sec&lt;/td&gt;
 &lt;td&gt;3 min 38 sec&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;35% faster&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Table + index total size&lt;/td&gt;
 &lt;td&gt;3618 MB&lt;/td&gt;
 &lt;td&gt;3443 MB&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;5% smaller&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;B-tree index size&lt;/td&gt;
 &lt;td&gt;776 MB&lt;/td&gt;
 &lt;td&gt;602 MB&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;22% smaller&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Point lookup&lt;/td&gt;
 &lt;td&gt;0.167 ms&lt;/td&gt;
 &lt;td&gt;0.038 ms&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;4.4x faster&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Range scan&lt;/td&gt;
 &lt;td&gt;8.283 ms&lt;/td&gt;
 &lt;td&gt;3.791 ms&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;2.2x faster&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
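&lt;p&gt;The Improvement column can be re-derived from the raw figures in the table. A small Python sketch (the helper names are illustrative and not part of the benchmark):&lt;/p&gt;

```python
def percent_reduction(before: float, after: float) -> float:
    """How much smaller `after` is than `before`, in percent."""
    return (before - after) / before * 100


def speedup(before: float, after: float) -> float:
    """How many times faster: ratio of the old latency to the new one."""
    return before / after


write_v4_s = 5 * 60 + 35  # 5 min 35 sec
write_v7_s = 3 * 60 + 38  # 3 min 38 sec

print(round(percent_reduction(write_v4_s, write_v7_s)))  # write time
print(round(percent_reduction(3618, 3443)))              # table + index, MB
print(round(percent_reduction(776, 602)))                # B-tree index, MB
print(round(speedup(0.167, 0.038), 1))                   # point lookup, ms
print(round(speedup(8.283, 3.791), 1))                   # range scan, ms
```

&lt;p&gt;Note that &amp;ldquo;35% faster&amp;rdquo; here means 35% less elapsed time. Expressed as throughput, v7 loads roughly 1.5 times as many rows per second (335 s against 218 s).&lt;/p&gt;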

&lt;h4 class="relative group"&gt;Why Such a Big Difference
 &lt;div id="why-such-a-big-difference" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-such-a-big-difference" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/uuid-v4-structure.png" alt="UUID v4 bit structure" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/uuid-v7-structure.png" alt="UUID v7 bit structure" /&gt;&lt;/p&gt;
&lt;p&gt;UUID v4 is fully random. Newly inserted UUIDs scatter randomly across the B-tree index, causing massive page splits and severe index fragmentation. UUID v7 has a millisecond-precision timestamp in the first 48 bits, so newly generated UUIDs are naturally ordered — writes cluster on the right side of the B-tree, page splits drop dramatically, and the index is much more compact.&lt;/p&gt;
&lt;p&gt;The 22% smaller index isn&amp;rsquo;t magic: it&amp;rsquo;s &lt;strong&gt;reduced fragmentation&lt;/strong&gt;. Point lookups being about 4.4x faster isn&amp;rsquo;t surprising either, since a denser index means fewer pages to read and a higher cache hit rate.&lt;/p&gt;
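&lt;p&gt;To make the layout concrete, here is a minimal Python sketch that assembles a v7-style UUID by hand: a 48-bit millisecond timestamp first, then the version and variant bits, with everything else random. The names &lt;code&gt;uuid7_sketch&lt;/code&gt; and &lt;code&gt;timestamp_ms&lt;/code&gt; are illustrative, and real systems should rely on a maintained library or the database&amp;rsquo;s own generator:&lt;/p&gt;

```python
import os
import time
import uuid


def uuid7_sketch(ts_ms=None) -> uuid.UUID:
    """Assemble a UUID v7-style value: 48-bit ms timestamp, then random bits."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    raw = bytearray(ts_ms.to_bytes(6, "big") + os.urandom(10))
    raw[6] = 0x70 | (raw[6] % 16)  # high nibble of byte 6 = version 7
    raw[8] = 0x80 | (raw[8] % 64)  # top two bits of byte 8 = RFC variant (10)
    return uuid.UUID(bytes=bytes(raw))


def timestamp_ms(u: uuid.UUID) -> int:
    """Recover the millisecond timestamp from the first 48 bits."""
    return int.from_bytes(u.bytes[:6], "big")


# Keys created in time order also sort in time order, so a B-tree
# keeps appending them at its right edge.
ids = [uuid7_sketch(1_700_000_000_000 + i) for i in range(5)]
print(ids == sorted(ids))
print(timestamp_ms(ids[0]))
```

&lt;p&gt;This sketch has no counter for values generated within the same millisecond, so two such UUIDs sort in random order relative to each other. That is harmless for index locality, because they still land on neighbouring pages.&lt;/p&gt;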

&lt;h3 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;UUID v4 and v7 are identical in collision safety — both depend on entropy source quality, one fills the first 48 bits with random numbers, the other with a timestamp. Collisions are edge cases that a tiny number of people hit in specific environments. Your environment is probably fine — that basic judgment doesn&amp;rsquo;t change.&lt;/p&gt;
&lt;p&gt;What you really should think about is &lt;strong&gt;index performance&lt;/strong&gt;. v7&amp;rsquo;s temporal property makes B-trees more compact, with measured results of 35% faster writes, 22% smaller indexes, and 2-4x faster queries. If your system writes UUIDs at high volume, switching to v7 saves meaningful storage and CPU.&lt;/p&gt;
&lt;p&gt;PG 18 natively supports &lt;code&gt;uuidv7()&lt;/code&gt;. On earlier versions, including the PG 16 used in the benchmark, generate v7 UUIDs at the application layer. Whichever version you use, always add a UNIQUE constraint.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This article was originally published in Chinese on &lt;a href="https://lastdba.com" target="_blank" rel="noreferrer"&gt;lastdba.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content:encoded></item><item><title>From collation mismatch Exception to Its Principles</title><link>https://lastdba.com/en/2025/12/13/from-collation-mismatch-exception-to-its-principles/</link><pubDate>Sat, 13 Dec 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/12/13/from-collation-mismatch-exception-to-its-principles/</guid><description>&lt;h2 class="relative group"&gt;Problem Phenomenon
 &lt;div id="problem-phenomenon" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-phenomenon" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;After a physical migration to a Xinchuang environment, the following warning occasionally appears in the PostgreSQL log (version pg15):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WARNING: 01000: collation &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; has version mismatch
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: The collation in the database was created using version 2.17, but the operating system provides version 2.28.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Rebuild all objects affected by this collation and run ALTER COLLATION pg_catalog.&lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; REFRESH VERSION, or build RaseSQL with the right library version.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LOCATION: pg_newlocale_from_collation, pg_locale.c:1660&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Context: during the physical switchover, invalid indexes were rebuilt and the database collation version was refreshed.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Phenomenon
 &lt;div id="problem-phenomenon" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-phenomenon" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;After a physical migration to a Xinchuang environment, the following warning occasionally appears in the PostgreSQL log (version pg15):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WARNING: 01000: collation &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; has version mismatch
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: The collation in the database was created using version 2.17, but the operating system provides version 2.28.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Rebuild all objects affected by this collation and run ALTER COLLATION pg_catalog.&lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; REFRESH VERSION, or build RaseSQL with the right library version.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LOCATION: pg_newlocale_from_collation, pg_locale.c:1660&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Context: during the physical switchover, invalid indexes were rebuilt and the database collation version was refreshed.&lt;/p&gt;
&lt;p&gt;The libc version did change with the migration, but the indexes have been rebuilt and are valid, and the database-level collation version already matches the OS libc.&lt;/p&gt;
&lt;p&gt;So,&lt;/p&gt;
&lt;p&gt;Why is the error reported?&lt;/p&gt;
&lt;p&gt;Where is the error triggered?&lt;/p&gt;
&lt;p&gt;What is the impact of the error?&lt;/p&gt;
&lt;p&gt;How to resolve it?&lt;/p&gt;

&lt;h2 class="relative group"&gt;Problem Analysis
 &lt;div id="problem-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Why is the error reported?
 &lt;div id="why-is-the-error-reported" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-is-the-error-reported" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Collation is recorded at three levels inside the database: database, column, and index. The first two only supply defaults; the collation stored with an index is the one its ordering actually depends on.&lt;/p&gt;
&lt;p&gt;First, check the database collation. All databases use en_US.UTF8, and the database collation version has already been refreshed, so the &amp;ldquo;collation &amp;quot;zh_CN.utf8&amp;quot; has version mismatch&amp;rdquo; warning should not come from the database level.&lt;/p&gt;
&lt;p&gt;Then check for columns that explicitly specify a non-default collation:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; attrelid,attname,attcollation &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_attribute &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; attcollation &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;950&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;951&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; attrelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; attname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; attcollation 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;An &lt;code&gt;attcollation&lt;/code&gt; of 0 means the column has no collation; oid 100 is the default collation, 950 is C, and 951 is POSIX. &amp;ldquo;zh_CN.utf8&amp;rdquo; is none of these four, so no column uses it.&lt;/p&gt;
&lt;p&gt;Finally, check for indexes that explicitly specify a collation:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; indexrelid ,&lt;span style="color:#66d9ef"&gt;unnest&lt;/span&gt;(indcollation) coll &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_index) i &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; coll &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;950&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;951&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; indexrelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; coll 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With the database, columns, and indexes ruled out, only one possibility remains: the application specifies the collation explicitly in its SQL. A query of that kind reproduces the warning:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; l(col1) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WARNING: &lt;span style="color:#ae81ff"&gt;01000&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; has &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; mismatch
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: The &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; was created &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;, but the operating &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt; provides &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Rebuild &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt; objects affected &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; this &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; run &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATION&lt;/span&gt; pg_catalog.&lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; REFRESH &lt;span style="color:#66d9ef"&gt;VERSION&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; build RaseSQL &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;right&lt;/span&gt; library &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: pg_newlocale_from_collation, pg_locale.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1660&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;阿&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;啊&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The version recorded for zh_CN.utf8 in &lt;code&gt;pg_collation&lt;/code&gt; differs from the version the OS actually provides:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; collname,collversion,pg_collation_actual_version(oid) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; collname &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.utf8&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collversion &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_collation_actual_version 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------------+-----------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The mismatch is not limited to zh_CN.utf8: every collation is affected, except the few that carry no version at all.&lt;/p&gt;
&lt;p&gt;So it&amp;rsquo;s very likely that the application explicitly specified the collation &amp;ldquo;zh_CN.utf8&amp;rdquo;, and because the collation version recorded in the database differs from the one the OS provides, the warning was triggered.&lt;/p&gt;
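&lt;p&gt;To see how far the mismatch reaches, the single-collation query above can be widened to all of pg_collation. This is a minimal sketch that relies only on &lt;code&gt;pg_collation_actual_version&lt;/code&gt;, already used above; the &lt;code&gt;is not null&lt;/code&gt; filter is an assumption made here to skip collations that record no version.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- List every collation whose recorded version differs from the OS version.
-- Collations with no recorded version (collversion is null) are skipped.
select collname,
       collprovider,
       collversion,
       pg_collation_actual_version(oid) as actual_version
from pg_collation
where collversion is not null
  and collversion is distinct from pg_collation_actual_version(oid)
order by collname;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;An empty result would mean that no versioned collation is out of date on this host.&lt;/p&gt;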

&lt;h3 class="relative group"&gt;Source Code Understanding
 &lt;div id="source-code-understanding" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#source-code-understanding" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The LOCATION line in the message points straight at the relevant source code. Two functions are of interest: &lt;code&gt;pg_newlocale_from_collation&lt;/code&gt; and &lt;code&gt;CheckMyDatabase&lt;/code&gt;.&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;code&gt;pg_newlocale_from_collation&lt;/code&gt; Caching and Checking &lt;code&gt;pg_collation&lt;/code&gt;
 &lt;div id="pg_newlocale_from_collation-caching-and-checking-pg_collation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_newlocale_from_collation-caching-and-checking-pg_collation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The version check in &lt;code&gt;pg_newlocale_from_collation&lt;/code&gt; dates from pg10, which introduced the &lt;code&gt;collversion&lt;/code&gt; column in pg_collation.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Create a locale_t from a collation OID. Results are cached for the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * lifetime of the backend. Thus, do not free the result with freelocale().
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * As a special optimization, the default/database collation returns 0.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Callers should then revert to the non-locale_t-enabled code path.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * In fact, they shouldn&amp;#39;t call this function at all when they are dealing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * with the default locale. That can save quite a bit in hotspots.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Also, callers should avoid calling this before going down a C/POSIX
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * fastpath, because such a fastpath should work even on platforms without
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * locale_t support in the C library.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * For simplicity, we always generate COLLATE + CTYPE even though we
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * might only need one of them. Since this is called only once per session,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * it shouldn&amp;#39;t cost much.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* locale_t means non-ICU. This function caches a locale_t type collation OID for the backend
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;* the default/database collation returns 0. &amp;#34;default&amp;#34; means using the database&amp;#39;s collation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;*/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;pg_locale_t&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;pg_newlocale_from_collation&lt;/span&gt;(Oid collid) &lt;span style="color:#75715e"&gt;// Note: passes in collation oid, not fetching all pg_collation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Return 0 for &amp;#34;default&amp;#34; collation, just in case caller forgets */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (collid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; DEFAULT_COLLATION_OID) &lt;span style="color:#75715e"&gt;// Three special collations:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;pg_locale_t&lt;/span&gt;) &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; &lt;span style="color:#75715e"&gt;// default oid=100, C oid=950, POSIX oid=951
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (cache_entry&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;locale &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		collversion &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;SysCacheGetAttr&lt;/span&gt;(COLLOID, tp, Anum_pg_collation_collversion,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									 &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;isnull); &lt;span style="color:#75715e"&gt;// Get version from pg_collation data dictionary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;isnull)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			actual_versionstr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;get_collation_actual_version&lt;/span&gt;(collform&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;collprovider, collcollate); &lt;span style="color:#75715e"&gt;// Get actual version via get_collation_actual_version
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			collversionstr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TextDatumGetCString&lt;/span&gt;(collversion);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;strcmp&lt;/span&gt;(actual_versionstr, collversionstr) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#75715e"&gt;// Compare the pg_collation version with the actual version; emit a WARNING if they differ
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(WARNING,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;collation &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt; has version mismatch&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								&lt;span style="color:#a6e22e"&gt;NameStr&lt;/span&gt;(collform&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;collname)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errdetail&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;The collation in the database was created using version %s, &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 &lt;span style="color:#e6db74"&gt;&amp;#34;but the operating system provides version %s.&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 collversionstr, actual_versionstr),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Rebuild all objects affected by this collation and run &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 &lt;span style="color:#e6db74"&gt;&amp;#34;ALTER COLLATION %s REFRESH VERSION, &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 &lt;span style="color:#e6db74"&gt;&amp;#34;or build PostgreSQL with the right library version.&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 &lt;span style="color:#a6e22e"&gt;quote_qualified_identifier&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;get_namespace_name&lt;/span&gt;(collform&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;collnamespace),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;															&lt;span style="color:#a6e22e"&gt;NameStr&lt;/span&gt;(collform&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;collname)))));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; cache_entry&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;locale;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In short: given a collation OID, the function checks whether the version recorded in pg_collation matches the actual version, and emits a warning if it does not.&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;code&gt;CheckMyDatabase&lt;/code&gt; Caching and Checking &lt;code&gt;pg_database&lt;/code&gt;
 &lt;div id="checkmydatabase-caching-and-checking-pg_database" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#checkmydatabase-caching-and-checking-pg_database" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;CheckMyDatabase&lt;/code&gt; has existed for a long time and performs many database-level checks. pg15 added logic to check the database collation version.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * CheckMyDatabase -- fetch information from the pg_database entry for our DB
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;CheckMyDatabase&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;name, &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; am_superuser, &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; override_allow_connections)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Fetch our pg_database row normally, via syscache */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	tup &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;SearchSysCache1&lt;/span&gt;(DATABASEOID, &lt;span style="color:#a6e22e"&gt;ObjectIdGetDatum&lt;/span&gt;(MyDatabaseId));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	default_locale.provider &lt;span style="color:#f92672"&gt;=&lt;/span&gt; dbform&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;datlocprovider; &lt;span style="color:#75715e"&gt;// default is the db&amp;#39;s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Default locale is currently always deterministic. Nondeterministic
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * locales currently don&amp;#39;t support pattern matching, which would break a
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * lot of things if applied globally.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	default_locale.deterministic &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true; &lt;span style="color:#75715e"&gt;// deterministic: strings compare equal only if byte-wise identical
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Check collation version. See similar code in
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * pg_newlocale_from_collation(). Note that here we warn instead of error
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * in any case, so that we don&amp;#39;t prevent connecting.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	datum &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;SysCacheGetAttr&lt;/span&gt;(DATABASEOID, tup, Anum_pg_database_datcollversion,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;isnull); &lt;span style="color:#75715e"&gt;// Get datcollversion from pg_database
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;isnull)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;actual_versionstr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;collversionstr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		collversionstr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TextDatumGetCString&lt;/span&gt;(datum);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		actual_versionstr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;get_collation_actual_version&lt;/span&gt;(dbform&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;datlocprovider, dbform&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;datlocprovider &lt;span style="color:#f92672"&gt;==&lt;/span&gt; COLLPROVIDER_ICU &lt;span style="color:#f92672"&gt;?&lt;/span&gt; iculocale : collate); &lt;span style="color:#75715e"&gt;// Get actual version via get_collation_actual_version
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;strcmp&lt;/span&gt;(actual_versionstr, collversionstr) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#75715e"&gt;// Compare db datcollversion and actual version, throw warning if not equal
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(WARNING,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;database &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt; has a collation version mismatch&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							name),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 &lt;span style="color:#a6e22e"&gt;errdetail&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;The database was created using collation version %s, &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 &lt;span style="color:#e6db74"&gt;&amp;#34;but the operating system provides version %s.&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 collversionstr, actual_versionstr),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Rebuild all objects in this database that use the default collation and run &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 &lt;span style="color:#e6db74"&gt;&amp;#34;ALTER DATABASE %s REFRESH COLLATION VERSION, &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 &lt;span style="color:#e6db74"&gt;&amp;#34;or build PostgreSQL with the right library version.&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 &lt;span style="color:#a6e22e"&gt;quote_identifier&lt;/span&gt;(name))));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;CheckMyDatabase&lt;/code&gt; function compares the datcollversion recorded in pg_database with the actual version, and only warns on a mismatch so that connections are not blocked.&lt;/p&gt;
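&lt;p&gt;The same database-level comparison can be approximated from SQL. This is a minimal sketch, assuming pg15 or later, where &lt;code&gt;pg_database_collation_actual_version&lt;/code&gt; is available; it is not the code path the server itself runs, only a query that reads the same two values.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Databases whose recorded collation version differs from the OS version.
-- Mirrors the isnull guard in CheckMyDatabase: rows with no recorded version are skipped.
select datname,
       datlocprovider,
       datcollversion,
       pg_database_collation_actual_version(oid) as actual_version
from pg_database
where datcollversion is not null
  and datcollversion is distinct from pg_database_collation_actual_version(oid);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;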

&lt;h4 class="relative group"&gt;Function Differences
 &lt;div id="function-differences" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#function-differences" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;In pg14 and earlier, there is only one collation version check: the first time a session caches a given collation, &lt;code&gt;pg_newlocale_from_collation&lt;/code&gt; reads &lt;strong&gt;the version recorded for that collation in pg_collation&lt;/strong&gt; and compares it with the actual version.&lt;/li&gt;
&lt;li&gt;In pg15 and later, the datcollversion column was added to pg_database, and with it a second check: when a session connects to a database, &lt;code&gt;CheckMyDatabase&lt;/code&gt; reads &lt;strong&gt;the datcollversion of that database in pg_database&lt;/strong&gt; and compares it with the actual version.&lt;/li&gt;
&lt;/ul&gt;
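&lt;p&gt;Each check has its own refresh statement, taken from the hints in the source code above. The sketch below uses the collation from the earlier warning and assumes a database named lzldb; note that both statements only update the recorded version and do not rebuild anything, which is why the hints ask for affected objects to be rebuilt first.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- pg_collation side: updates collversion, which pg_newlocale_from_collation compares
alter collation pg_catalog.&amp;#34;zh_CN.utf8&amp;#34; refresh version;

-- pg_database side (pg15+): updates datcollversion, which CheckMyDatabase compares
alter database lzldb refresh collation version;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;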

&lt;h3 class="relative group"&gt;Why Are There Fewer Errors After Only Refreshing the Database?
 &lt;div id="why-are-there-fewer-errors-after-only-refreshing-the-database" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-are-there-fewer-errors-after-only-refreshing-the-database" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Refreshing the database collation version silences the warning about the pg_database version mismatch, but it does nothing about mismatched versions in pg_collation. So why are there so many fewer warnings after refreshing only the database? Could it be that the pg_collation versions are simply never loaded?&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.coll,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;unnest&lt;/span&gt;(indcollation) coll &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_index ) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.coll;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; coll &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;950&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;37&lt;/span&gt; &lt;span style="color:#75715e"&gt;--C
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2841&lt;/span&gt; &lt;span style="color:#75715e"&gt;--No collation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;723&lt;/span&gt; &lt;span style="color:#75715e"&gt;--default&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
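&lt;p&gt;In real environments, default is by far the most common collation on indexed columns that have one (OID 0 means the column has no collation at all). Hardly anyone specifies a collation explicitly; an unspecified collation is default, and default resolves to the database&amp;rsquo;s collation.&lt;/p&gt;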
&lt;p&gt;Here we need to revisit the &lt;code&gt;pg_newlocale_from_collation&lt;/code&gt; function. The function starts like this:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;pg_locale_t&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;pg_newlocale_from_collation&lt;/span&gt;(Oid collid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	collation_cache_entry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;cache_entry;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Callers must pass a valid OID */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;OidIsValid&lt;/span&gt;(collid));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Return 0 for &amp;#34;default&amp;#34; collation, just in case caller forgets */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (collid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; DEFAULT_COLLATION_OID)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;pg_locale_t&lt;/span&gt;) &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When &lt;code&gt;collid == DEFAULT_COLLATION_OID&lt;/code&gt; (OID 100), the function returns immediately and never reaches the version check below, so no warning is emitted. This is reasonable: the database collation version was already checked by &lt;code&gt;CheckMyDatabase&lt;/code&gt; when the session connected, and any mismatch would have produced a warning at that point.&lt;/p&gt;
&lt;p&gt;Furthermore, even if another OID from the query above is passed in, such as collid=950 (C), the C collation has no version concept, so there is nothing to compare.&lt;/p&gt;
&lt;p&gt;Therefore, after refreshing the database, the vast majority of workloads emit no warning, as long as they sort with the database&amp;rsquo;s default collation rather than a collation named explicitly in an expression or an index definition.&lt;/p&gt;
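&lt;p&gt;The remaining exposure can be estimated by looking for indexes built on a versioned, non-default collation whose recorded version is stale. This is a minimal sketch that extends the &lt;code&gt;unnest(indcollation)&lt;/code&gt; query above; the join and the filter are assumptions made here for illustration.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Indexes that use a collation whose pg_collation version no longer matches the OS.
-- OID 0 (no collation) drops out of the join; default and C record no version
-- in pg_collation, so the is not null filter removes them.
select i.indexrelid::regclass as index_name,
       c.collname,
       c.collversion,
       pg_collation_actual_version(c.oid) as actual_version
from pg_index i
     cross join lateral unnest(i.indcollation) as u(coll)
     join pg_collation c on c.oid = u.coll
where c.collversion is not null
  and c.collversion is distinct from pg_collation_actual_version(c.oid);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This only covers index definitions; a query that names a collation in an expression can still reach the check in &lt;code&gt;pg_newlocale_from_collation&lt;/code&gt;.&lt;/p&gt;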

&lt;h3 class="relative group"&gt;Testing
 &lt;div id="testing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#testing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;These tests only check whether the collation version warning appears; they do not cover index corruption or database crashes.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Check libc version&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;getconf GNU_LIBC_VERSION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Source host: glibc 2.17&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Target host: glibc 2.28&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# PostgreSQL version: pg15+&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;Test: Refresh db without refreshing pg_collation, only db coll version changes
 &lt;div id="test-refresh-db-without-refreshing-pg_collation-only-db-coll-version-changes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-refresh-db-without-refreshing-pg_collation-only-db-coll-version-changes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; datname,datlocprovider,datcollate,datctype,datcollversion &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_database 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datlocprovider &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datcollate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datctype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datcollversion 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+----------------+-------------+-------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; collname,collprovider,collversion,pg_collation_actual_version(oid) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; collname &lt;span style="color:#f92672"&gt;~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;en_US.utf8&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collprovider &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collversion &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_collation_actual_version 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+--------------+-------------+-----------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; lzldb refresh &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;NOTICE: &lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;: changing &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: AlterDatabaseRefreshColl, dbcommands.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2399&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DATABASE&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Check pg_collation and pg_database again:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collprovider &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collversion &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_collation_actual_version 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+--------------+-------------+-----------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datlocprovider &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datcollate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datctype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datcollversion 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+----------------+-------------+-------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
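&lt;p&gt;This matches the official documentation: &lt;code&gt;ALTER DATABASE ... REFRESH COLLATION VERSION&lt;/code&gt; only updates the version recorded for the database&amp;rsquo;s default collation (&lt;code&gt;pg_database.datcollversion&lt;/code&gt;); the entries in &lt;code&gt;pg_collation&lt;/code&gt; are left unchanged.&lt;/p&gt;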
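&lt;p&gt;To see both sides of this at once, the recorded versions can be compared with what the OS currently provides. The following is a minimal sketch, assuming PostgreSQL 15 or later (where &lt;code&gt;pg_database_collation_actual_version()&lt;/code&gt; is available) and reusing the database name &lt;code&gt;lzldb&lt;/code&gt; from this test; the column alias &lt;code&gt;actual_version&lt;/code&gt; is only illustrative:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Database default collation: recorded version vs. version provided by the OS
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT datname,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       datcollversion,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       pg_database_collation_actual_version(oid) AS actual_version
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_database
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE datname = &amp;#39;lzldb&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Named collations whose recorded version differs from what the OS provides
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT collname,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       collversion,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       pg_collation_actual_version(oid) AS actual_version
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_collation
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE collversion &amp;lt;&amp;gt; pg_collation_actual_version(oid);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After a database-only refresh, the first query should show matching versions, while the second is expected to keep listing collations such as zh_CN.utf8 until they are refreshed individually.&lt;/p&gt;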

&lt;h4 class="relative group"&gt;Test: Refresh db without refreshing pg_collation, specifying expression sort reports warning
 &lt;div id="test-refresh-db-without-refreshing-pg_collation-specifying-expression-sort-reports-warning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-refresh-db-without-refreshing-pg_collation-specifying-expression-sort-reports-warning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;As analyzed at the beginning, sorting with an expression-level collation reports a warning; the output is omitted here.&lt;/p&gt;
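&lt;p&gt;For reference, a statement of the following shape is the kind of query this test refers to. It is only a sketch, reusing table &lt;code&gt;tt&lt;/code&gt; and column &lt;code&gt;a&lt;/code&gt; from the other tests; in a fresh session where zh_CN.utf8 has not been refreshed, it would be expected to raise the same &amp;ldquo;version mismatch&amp;rdquo; warning that appears in the index tests:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- An explicit COLLATE clause makes the named collation load, which triggers its version check
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT a FROM tt ORDER BY a COLLATE &amp;#34;zh_CN.utf8&amp;#34; LIMIT 10;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;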

&lt;h4 class="relative group"&gt;Test: Refresh db without refreshing pg_collation, creating a new index with specified collation reports warning
 &lt;div id="test-refresh-db-without-refreshing-pg_collation-creating-a-new-index-with-specified-collation-reports-warning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-refresh-db-without-refreshing-pg_collation-creating-a-new-index-with-specified-collation-reports-warning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Test 1: Specify collation when creating index&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collversion &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_collation_actual_version 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------------+-----------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx11 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tt(a &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WARNING: &lt;span style="color:#ae81ff"&gt;01000&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; has &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; mismatch
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: The &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; was created &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;, but the operating &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt; provides &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Rebuild &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt; objects affected &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; this &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; run &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATION&lt;/span&gt; pg_catalog.&lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; REFRESH &lt;span style="color:#66d9ef"&gt;VERSION&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; build PostgreSQL &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;right&lt;/span&gt; library &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: pg_newlocale_from_collation, pg_locale.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1664&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Test 2: Specify the column collation when creating the table, and do not specify one when creating the index&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; lzldb &lt;span style="color:#75715e"&gt;-- Reconnect a session
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;You &lt;span style="color:#66d9ef"&gt;are&lt;/span&gt; now connected &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;postgres&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; ttt(a varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxttt &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; ttt(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WARNING: &lt;span style="color:#ae81ff"&gt;01000&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; has &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; mismatch
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: The &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; was created &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;, but the operating &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt; provides &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Rebuild &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt; objects affected &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; this &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; run &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATION&lt;/span&gt; pg_catalog.&lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; REFRESH &lt;span style="color:#66d9ef"&gt;VERSION&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; build PostgreSQL &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;right&lt;/span&gt; library &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: pg_newlocale_from_collation, pg_locale.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1664&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;904&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A column-level collation and a collation specified in the index definition amount to the same thing: both determine the collation of the index. Either one can trigger the warning.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Test: Refresh db without refreshing pg_collation, existing index with specified collation does not report warning
 &lt;div id="test-refresh-db-without-refreshing-pg_collation-existing-index-with-specified-collation-does-not-report-warning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-refresh-db-without-refreshing-pg_collation-existing-index-with-specified-collation-does-not-report-warning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Scenario: the source database already has an index that specifies collation zh_CN.utf8, which differs from the database default. Refreshing the database collation version does not cover it, yet after migrating to the new database host the provider&amp;rsquo;s collation version has definitely changed.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; collname,collprovider,collversion,pg_collation_actual_version(oid) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; collname &lt;span style="color:#f92672"&gt;~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.utf8&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collprovider &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collversion &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_collation_actual_version 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+--------------+-------------+-----------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Without an expression-level collation in the query, the index can still be scanned, but its ordering cannot be used, so the plan adds an explicit Sort node:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_seqscan &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXPLAIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ANALYZE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; tt &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;LIMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6667&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6670&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;928&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;145&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6667&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6892&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;81&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;90004&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;926&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;021&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Method&lt;/span&gt;: top&lt;span style="color:#f92672"&gt;-&lt;/span&gt;N heapsort Memory: &lt;span style="color:#ae81ff"&gt;127&lt;/span&gt;kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxtt &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tt (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1732&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;98&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;90004&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;029&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;434&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;90004&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Heap Fetches: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Existing indexes with specified collation do not report warnings when used.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary of This Problem
 &lt;div id="summary-of-this-problem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary-of-this-problem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The database-level and collation-level version mismatch warnings are session-scoped: within a session, each is reported only once per database or per collation.&lt;/p&gt;
&lt;p&gt;After refreshing only the database, warnings will most likely not appear again, but creating an index with an explicitly specified collation or running SQL with an expression-level collation can still trigger them.&lt;/p&gt;
&lt;p&gt;The collation version stored in the data dictionary exists only so that the database can track whether the collation provider&amp;rsquo;s version has changed. Without it, the database could not even return a warning saying &amp;ldquo;your collation provider has been upgraded, your data ordering may be affected, and you need to check it&amp;rdquo; (and of course the impact is not limited to sorting).&lt;/p&gt;

&lt;h2 class="relative group"&gt;Solutions for This Problem
 &lt;div id="solutions-for-this-problem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#solutions-for-this-problem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The corrupt indexes have already been rebuilt and the database has been refreshed; only the collations have not been refreshed. A collation version mismatch in the data dictionary is not a big problem in itself, since it only produces a warning. For other hidden and strange pitfalls, see the More section below.&lt;/p&gt;
&lt;p&gt;Solution for this problem:&lt;/p&gt;
&lt;p&gt;Step 1: Check if there are still dependencies&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; pg_describe_object(refclassid, refobjid, refobjsubid) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;Collation&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_describe_object(classid, objid, objsubid) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;Object&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_depend d &lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; refclassid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;pg_collation&amp;#39;&lt;/span&gt;::regclass &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; refobjid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collversion &lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt; pg_collation_actual_version(&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the query returns rows, it is best to rebuild the dependent objects first; if it returns nothing, proceed to step 2 and choose one of the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Solution 1: Do nothing. If there are not many warnings, leaving them alone is fine.&lt;/li&gt;
&lt;li&gt;Solution 2: Refresh only the zh_CN.utf8 collation, and fix other collations as they come up.&lt;/li&gt;
&lt;li&gt;Solution 3: Refresh all collations. Even if the application later starts using expression-level collations or indexes with an explicitly specified collation, no warnings will be reported.&lt;/li&gt;
&lt;/ul&gt;
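&lt;p&gt;The options above could be carried out roughly as follows. This is a sketch under stated assumptions, not a verified procedure: &lt;code&gt;idx11&lt;/code&gt; stands in for whatever indexes step 1 returns, the &lt;code&gt;ALTER COLLATION&lt;/code&gt; statement is the one suggested by the warning&amp;rsquo;s HINT, and the &lt;code&gt;DO&lt;/code&gt; loop is an illustrative way to apply it to every outdated collation. It assumes the session user owns the collations (typically a superuser) and, because pg_collation is a per-database catalog, it would need to be run in each database.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- If step 1 returned dependent indexes, rebuild them first (idx11 is an example name)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REINDEX INDEX CONCURRENTLY idx11;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Solution 2: refresh a single collation
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ALTER COLLATION pg_catalog.&amp;#34;zh_CN.utf8&amp;#34; REFRESH VERSION;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Solution 3: refresh every collation whose recorded version is out of date
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DO $$
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DECLARE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    r record;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;BEGIN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    FOR r IN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        SELECT n.nspname, c.collname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        FROM pg_collation c
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        JOIN pg_namespace n ON n.oid = c.collnamespace
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        WHERE c.collversion &amp;lt;&amp;gt; pg_collation_actual_version(c.oid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;          -- only collations usable with the current database encoding
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;          AND c.collencoding IN (-1, (SELECT encoding FROM pg_database
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                                      WHERE datname = current_database()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    LOOP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        EXECUTE format(&amp;#39;ALTER COLLATION %I.%I REFRESH VERSION&amp;#39;, r.nspname, r.collname);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    END LOOP;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;END
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$$;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Refreshing a version only updates the recorded value and silences the warning; it does not repair any index built under the old ordering, which is why the dependency check in step 1 comes first.&lt;/p&gt;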

&lt;h2 class="relative group"&gt;More
 &lt;div id="more" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#more" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Key Summary of glibc Upgrade Related Issues
 &lt;div id="key-summary-of-glibc-upgrade-related-issues" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#key-summary-of-glibc-upgrade-related-issues" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Locale is a very tricky area, and glibc upgrades cause many collation-related problems. Based on the reference materials listed at the end, here&amp;rsquo;s a summary of the important points:&lt;/p&gt;
&lt;p&gt;The entries in pg_collation are populated from the output of the OS command &lt;code&gt;locale -a&lt;/code&gt;; the provider is almost always glibc, so the glibc version is what matters.&lt;/p&gt;
&lt;p&gt;In pg_collation, &amp;ldquo;C&amp;rdquo; and &amp;ldquo;POSIX&amp;rdquo; have collprovider &lt;code&gt;c&lt;/code&gt;, the same as &amp;ldquo;C.UTF8&amp;rdquo; and others, but they are not the same kind of collation. &amp;ldquo;C.UTF8&amp;rdquo; is provided by glibc, &lt;strong&gt;has a version, and generally uses Unicode codepoint sorting or Unicode semantic sorting&lt;/strong&gt;; &amp;ldquo;C&amp;rdquo; and &amp;ldquo;POSIX&amp;rdquo; are equivalent: the most basic locale defined by the POSIX standard, implemented by libc, not listed by &lt;code&gt;locale -a&lt;/code&gt;, &lt;strong&gt;with no version, sorting directly by byte order&lt;/strong&gt;.&lt;/p&gt;
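&lt;p&gt;To see the difference on a running instance, a catalog query along these lines may help. This is a minimal sketch: the collation names in the filter (&lt;code&gt;c.utf8&lt;/code&gt;, &lt;code&gt;en_us.utf8&lt;/code&gt;) are assumptions and depend on what &lt;code&gt;locale -a&lt;/code&gt; reported when the cluster was initialized.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- collprovider: c = libc, i = icu, d = database default, b = builtin (PG 17+)
-- collversion is typically NULL for unversioned collations such as "C" and "POSIX"
SELECT collname,
       collprovider,
       collversion,
       pg_collation_actual_version(oid) AS actual_version
  FROM pg_collation
 WHERE lower(collname) IN ('c', 'posix', 'c.utf8', 'en_us.utf8')  -- assumed names
 ORDER BY collname;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A row where &lt;code&gt;collversion&lt;/code&gt; and &lt;code&gt;actual_version&lt;/code&gt; differ is one whose locale data has changed underneath the database.&lt;/p&gt;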
&lt;p&gt;Root cause of collation problems: the database requires that locale definitions never change during its lifetime, but OS vendors, and the GNU C library in particular, change locale data in every minor version, which is entirely legitimate on their side.&lt;/p&gt;
&lt;p&gt;The version most prone to problems in practice is &lt;strong&gt;glibc 2.28&lt;/strong&gt;, because 2.28 pulled in a major locale data update in sync with &lt;strong&gt;Unicode 9.0.0&lt;/strong&gt; (&lt;a href="https://sourceware.org/glibc/wiki/Release/2.28" target="_blank" rel="noreferrer"&gt;has been updated to a new upstream version from ISO which is in sync with Unicode 9.0.0&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pg has no way to detect compatibility issues caused by glibc upgrades&lt;/strong&gt;. Index corruption checks are not exhaustive, and indexes are only one of the affected areas. After physical replication or an upgrade, even if indexes are rebuilt, you cannot rule out that the database will one day crash because of a collation version issue.&lt;/p&gt;
&lt;p&gt;Possible data anomalies include duplicate primary keys, violated sort-dependent constraints, rows written to the wrong range partition, and wrong results from merge joins and other sort-based operations.&lt;/p&gt;
&lt;p&gt;Character types depend on collation. Data types and index types that don&amp;rsquo;t depend on collation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;bytea&lt;/li&gt;
&lt;li&gt;tsvector gin indexes&lt;/li&gt;
&lt;li&gt;pg_trgm indexes&lt;/li&gt;
&lt;li&gt;numeric data types: int, bigint, numeric, float, &amp;hellip;&lt;/li&gt;
&lt;li&gt;custom data types like geometry (PostGIS)&lt;/li&gt;
&lt;li&gt;timestamp&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;ASCII (byte-order) sorting is common but does not match how people expect text to be ordered, i.e., it is not semantic. International standards for semantic sorting are generally Unicode standards.&lt;/p&gt;
&lt;p&gt;Unicode-based sorting rules fall into two types: codepoint sorting and UCA (Unicode Collation Algorithm) sorting.&lt;/p&gt;
&lt;p&gt;UCA is based on DUCET (Default Unicode Collation Element Table), and the DUCET table itself may change sort order between versions. For example, en_US.UTF8 uses UCA sorting, equivalent to semantic sorting, so a version upgrade can change its sorting rules. C.UTF8 uses codepoint sorting; once a codepoint is assigned it does not change, so the sorting rules do not change either.&lt;/p&gt;
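&lt;p&gt;The two kinds of ordering can be compared directly with an explicit &lt;code&gt;COLLATE&lt;/code&gt; clause. This is only a sketch: the sample strings are illustrative, and the second query assumes a collation named &lt;code&gt;en_US.utf8&lt;/code&gt; exists in pg_collation.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Byte-order sorting: does not depend on the glibc locale data
SELECT s
  FROM (VALUES ('a'), ('B'), ('-'), ('1-1'), ('11')) AS v(s)
 ORDER BY s COLLATE "C";

-- Linguistic sorting through libc: the order may differ between glibc versions
-- (assumption: a collation called "en_US.utf8" is available)
SELECT s
  FROM (VALUES ('a'), ('B'), ('-'), ('1-1'), ('11')) AS v(s)
 ORDER BY s COLLATE "en_US.utf8";&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Running the second query on the old and the new host is a quick way to see whether the locale data really changed for the strings you care about.&lt;/p&gt;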
&lt;p&gt;PG 17+ provides a very safe locale provider, builtin, which no longer depends on external providers such as glibc or ICU. Example command to enable it:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;initdb &lt;span style="color:#75715e"&gt;--locale-provider=builtin --bultin-locale=C.UTF-8 dbname1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PG 17 supports only C and C.UTF-8: C is byte-order sorting (approximately ASCII sorting) and C.UTF-8 is Unicode codepoint sorting. PG 18 adds PG_UNICODE_FAST, also Unicode codepoint sorting, with &lt;a href="https://www.postgresql.org/docs/18/locale.html#LOCALE-PROVIDERS" target="_blank" rel="noreferrer"&gt;slight differences&lt;/a&gt; from C.UTF-8.&lt;/p&gt;
&lt;p&gt;Because the database must keep its sort order stable, application-specific sorting is best pushed to the application layer. For example, sorting by an explicit expression is semantically clear and doesn&amp;rsquo;t affect the database&amp;rsquo;s own choice of collation. If pg one day supports a built-in en_US.utf8, built-in semantic sorting can be reconsidered.&lt;/p&gt;
&lt;p&gt;During Xinchuang (domestic IT stack replacement) migrations, the target hosts generally run a newer glibc than the old Intel servers, likely crossing version 2.28. With tight deadlines, KPI pressure, insufficient manpower, and large databases, physical migration is often unavoidable. Xinchuang physical migrations therefore need to pay close attention to glibc versions and the many anomalies that collation changes can cause.&lt;/p&gt;

&lt;h3 class="relative group"&gt;What to Do After Physical Migration
 &lt;div id="what-to-do-after-physical-migration" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-to-do-after-physical-migration" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Assuming the database uses en_US.utf8 with the libc provider (collprovider &lt;code&gt;c&lt;/code&gt;), and a physical migration across libc versions has already been done, the following operations should be performed:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I. Official Required Solution&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;At minimum, rebuild the problematic indexes. To find them, install the amcheck extension and use the bt_index_check function:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; bt_index_check(&lt;span style="color:#e6db74"&gt;&amp;#39;idx1&amp;#39;&lt;/span&gt;::regclass, &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Refresh the database collation version (pg15+):&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DATABASE&lt;/span&gt; name REFRESH &lt;span style="color:#66d9ef"&gt;COLLATION&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VERSION&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Check if there are other &lt;a href="https://www.postgresql.org/docs/18/sql-altercollation.html#SQL-ALTERCOLLATION-NOTES" target="_blank" rel="noreferrer"&gt;dependent objects&lt;/a&gt;. If there are, handle them accordingly:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; pg_describe_object(refclassid, refobjid, refobjsubid) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;Collation&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_describe_object(classid, objid, objsubid) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;Object&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_depend d &lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; refclassid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;pg_collation&amp;#39;&lt;/span&gt;::regclass &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; refobjid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collversion &lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt; pg_collation_actual_version(&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After handling, then:&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Refresh the version of each affected collation (pg10+):&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATION&lt;/span&gt; name REFRESH &lt;span style="color:#66d9ef"&gt;VERSION&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;II. Unofficial Workaround Solutions&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I haven&amp;rsquo;t worked out a complete solution here; these are just some thoughts.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Handling partition table data written to wrong partition:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If the partition key is int/bigint/float, it has no relation to collation and can be ignored.&lt;/p&gt;
&lt;p&gt;If the table is partitioned by time and the key is a timestamp, it can be ignored. If the key is stored as varchar or another character type, it depends on the situation.&lt;/p&gt;
&lt;p&gt;If the partition key is a character type, refer to the sorting of &amp;ldquo;a&amp;rdquo; and &amp;ldquo;-&amp;rdquo; (pgconf Collation Challenges Sorting It Out). Note the following points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If querying data, don&amp;rsquo;t query from the parent table; it might crash or fail to return results.&lt;/li&gt;
&lt;li&gt;There&amp;rsquo;s no simple detection solution.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="2"&gt;
&lt;li&gt;
&lt;p&gt;Handling primary key/unique key conflicts.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Handling fdw sort range anomaly issues.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Unknown problems.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;ref
 &lt;div id="ref" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ref" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Locale_data_changes" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Locale_data_changes&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Collations" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Collations&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;pgconf Collation Challenges Sorting It Out&lt;/p&gt;
&lt;p&gt;pgconf Collations from A to Z&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.unicode.org/reports/tr10/tr10-34.html" target="_blank" rel="noreferrer"&gt;http://www.unicode.org/reports/tr10/tr10-34.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://sourceware.org/glibc/wiki/Release/2.28" target="_blank" rel="noreferrer"&gt;https://sourceware.org/glibc/wiki/Release/2.28&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/18/sql-altercollation.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/18/sql-altercollation.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/18/sql-alterdatabase.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/18/sql-alterdatabase.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/17/locale.html#LOCALE-PROVIDERS" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/17/locale.html#LOCALE-PROVIDERS&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;</content:encoded></item><item><title>A Brief Review of Logical Replication in Oracle, MySQL, and PostgreSQL</title><link>https://lastdba.com/en/2025/11/30/a-brief-review-of-logical-replication-in-oracle-mysql-and-postgresql/</link><pubDate>Sun, 30 Nov 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/11/30/a-brief-review-of-logical-replication-in-oracle-mysql-and-postgresql/</guid><description>&lt;h3 class="relative group"&gt;PostgreSQL Logical Replication
 &lt;div id="postgresql-logical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgresql-logical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
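&lt;p&gt;
&lt;img src="https://lastdba.com/img/csdn/64e1d30f2123.png" alt="Insert image description here" /&gt;
(&lt;a href="https://www.pgconf.asia/JA/2017/wp-content/uploads/sites/2/2017/12/D2-A7-EN.pdf" target="_blank" rel="noreferrer"&gt;https://www.pgconf.asia/JA/2017/wp-content/uploads/sites/2/2017/12/D2-A7-EN.pdf&lt;/a&gt;)&lt;/p&gt;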
&lt;p&gt;PostgreSQL places all logical decoding related matters entirely within the database&amp;rsquo;s replication slots for management — an all-inclusive approach. Early versions had somewhat limited logical replication support, but in recent major versions, logical replication has been one of the primary functional improvements.&lt;/p&gt;
&lt;p&gt;Advantages of the PG approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Very flexible: it exposes the logical decoding interface to users, with multiple types of decoding methods available.&lt;/li&gt;
&lt;li&gt;Users can subscribe to only the data they need based on their requirements.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Disadvantages of the PG approach:&lt;/p&gt;</description><content:encoded>
&lt;h3 class="relative group"&gt;PostgreSQL Logical Replication
 &lt;div id="postgresql-logical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgresql-logical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
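&lt;p&gt;
&lt;img src="https://lastdba.com/img/csdn/64e1d30f2123.png" alt="Insert image description here" /&gt;
(&lt;a href="https://www.pgconf.asia/JA/2017/wp-content/uploads/sites/2/2017/12/D2-A7-EN.pdf" target="_blank" rel="noreferrer"&gt;https://www.pgconf.asia/JA/2017/wp-content/uploads/sites/2/2017/12/D2-A7-EN.pdf&lt;/a&gt;)&lt;/p&gt;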
&lt;p&gt;PostgreSQL places all logical decoding related matters entirely within the database&amp;rsquo;s replication slots for management — an all-inclusive approach. Early versions had somewhat limited logical replication support, but in recent major versions, logical replication has been one of the primary functional improvements.&lt;/p&gt;
&lt;p&gt;Advantages of the PG approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Very flexible: it exposes the logical decoding interface to users, with multiple types of decoding methods available.&lt;/li&gt;
&lt;li&gt;Users can subscribe to only the data they need based on their requirements.&lt;/li&gt;
&lt;/ul&gt;
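&lt;p&gt;As an illustration of subscribing only to the data you need, a single-table publication and a matching subscription might look like the sketch below. All object names, the table, and the connection string are hypothetical.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- On the publisher (requires wal_level = logical): publish a single table
CREATE PUBLICATION pub_orders FOR TABLE orders;

-- On the subscriber: the table must already exist with a matching definition
CREATE SUBSCRIPTION sub_orders
    CONNECTION 'host=publisher.example dbname=appdb user=repl_user'
    PUBLICATION pub_orders;

-- Back on the publisher: the subscription created a replication slot,
-- decoded with the pgoutput output plugin and served by a walsender
SELECT slot_name, plugin, slot_type, active
  FROM pg_replication_slots;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;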
&lt;p&gt;Disadvantages of the PG approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;There are more concepts to learn, and the learning cost is higher than with MySQL. Take just the basic concepts (publication, subscription, walsender, replication slots, output plugins, and so on): I believe many people haven&amp;rsquo;t fully grasped their definitions and relationships.&lt;/li&gt;
&lt;li&gt;It does the hardest work and takes the hardest hits. All logical decoding problems surface inside the database: WAL backlog, large transactions, long transactions, transaction reordering, privilege issues, and streaming transmission are all problems PG has to deal with itself.&lt;/li&gt;
&lt;/ul&gt;
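&lt;p&gt;WAL backlog is the easiest of these problems to observe from SQL. The query below is a minimal sketch, meant to be run on the primary, that shows how much WAL each replication slot is holding back:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- WAL retained by each slot, largest first; an inactive slot with a growing
-- value is the usual cause of a WAL backlog
SELECT slot_name,
       plugin,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
  FROM pg_replication_slots
 ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC NULLS LAST;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;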

&lt;h3 class="relative group"&gt;MySQL&amp;rsquo;s binlog
 &lt;div id="mysqls-binlog" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#mysqls-binlog" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/668c1dc8ce20.png" alt="在这里插入图片描述" /&gt;
(&lt;a href="https://blog.fasterinfo.top/6243.html" target="_blank" rel="noreferrer"&gt;https://blog.fasterinfo.top/6243.html&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;MySQL places all decoded logical data locally — in binlog files. The approach is simple. &lt;em&gt;MySQL&amp;rsquo;s binlog is roughly equivalent to PostgreSQL with full-table logical replication enabled and written locally.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Advantages of the MySQL approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Simple and straightforward: MySQL doesn&amp;rsquo;t expose the logical decoding interface directly to users. Instead, it provides already-decoded files directly to users, who don&amp;rsquo;t need to care about how parsing works — just read the binlog files.&lt;/li&gt;
&lt;li&gt;Mature ecosystem. I personally believe MySQL&amp;rsquo;s mature ecosystem is closely tied to binlog. During the internet era, PG&amp;rsquo;s logical replication was still weak, while binlog was extremely simple. Downstream parsing of binlog to put data onto other platforms became a common pattern.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Disadvantages of the MySQL approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;All data must be decoded; no customizable subscription. Poor flexibility.&lt;/li&gt;
&lt;li&gt;Two-phase commit. Because MySQL&amp;rsquo;s primary-standby replication heavily depends on binlog, binlog data must be fully flushed to binlog files at commit time. A single commit must write two (or two kinds of) logs — binlog and redolog. Dual log writes are one of MySQL&amp;rsquo;s eternal pain points.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Oracle Logical Replication
 &lt;div id="oracle-logical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#oracle-logical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8978c46a1452.png" alt="在这里插入图片描述" /&gt;
（https://www.oracle-scn.com/oracle-goldengate-integrated-capture/）&lt;/p&gt;
&lt;p&gt;Oracle itself does have logical Data Guard functionality, but virtually no one uses it. Here we&amp;rsquo;ll only discuss LogMiner. The Oracle database itself provides an interface like LogMiner for parsing logs (e.g., OGG integrated capture mode), but has zero replication link management itself — it relies on third-party tools to create and manage replication links.&lt;/p&gt;
&lt;p&gt;Advantages of the Oracle approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Only provides a parsing interface, no replication link management. For the database itself, this is very hassle-free.&lt;/li&gt;
&lt;li&gt;Pay and you get a solution. Just buy the powerful OGG directly. Don&amp;rsquo;t say Oracle hasn&amp;rsquo;t provided a logical replication solution — we not only have one, it&amp;rsquo;s powerful and highly recognized.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Disadvantages of the Oracle approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Relies on third-party software to manage replication links.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In summary, PG&amp;rsquo;s logical replication is an all-in-one, do-everything approach — very much in the open-source, technical spirit. MySQL&amp;rsquo;s approach is simple, crude, but effective — somewhat &amp;ldquo;one-step-to-finish.&amp;rdquo; Oracle&amp;rsquo;s approach is: provide an interface and leave everything else to third parties, but from the customer&amp;rsquo;s perspective, there is a mature solution available.&lt;/p&gt;</content:encoded></item><item><title>Query Conflicts: From a Static Table Conflict to Its Root Cause</title><link>https://lastdba.com/en/2025/09/13/query-conflicts-from-a-static-table-conflict-to-its-root-cause/</link><pubDate>Sat, 13 Sep 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/09/13/query-conflicts-from-a-static-table-conflict-to-its-root-cause/</guid><description>&lt;h2 class="relative group"&gt;Problem Symptoms
 &lt;div id="problem-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;The Symptom
 &lt;div id="the-symptom" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-symptom" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A static historical table with no updates whatsoever — yet queries on the same-city standby consistently hit query conflicts:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;40001&lt;/span&gt;: canceling &lt;span style="color:#66d9ef"&gt;statement&lt;/span&gt; due &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; conflict &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; recovery
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: &lt;span style="color:#66d9ef"&gt;User&lt;/span&gt; query might have needed &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; see &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; versions that must be removed.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: ProcessInterrupts, postgres.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3197&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;30534&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;973&lt;/span&gt; ms (&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;535&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Why a Query Conflict on a Static Table Matters
 &lt;div id="why-a-query-conflict-on-a-static-table-matters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-a-query-conflict-on-a-static-table-matters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;My understanding was that a static table should never experience conflicts (this understanding was wrong — I&amp;rsquo;ll explain later).&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Symptoms
 &lt;div id="problem-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;The Symptom
 &lt;div id="the-symptom" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-symptom" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A static historical table with no updates whatsoever — yet queries on the same-city standby consistently hit query conflicts:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;40001&lt;/span&gt;: canceling &lt;span style="color:#66d9ef"&gt;statement&lt;/span&gt; due &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; conflict &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; recovery
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: &lt;span style="color:#66d9ef"&gt;User&lt;/span&gt; query might have needed &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; see &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; versions that must be removed.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: ProcessInterrupts, postgres.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3197&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;30534&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;973&lt;/span&gt; ms (&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;535&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Why a Query Conflict on a Static Table Matters
 &lt;div id="why-a-query-conflict-on-a-static-table-matters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-a-query-conflict-on-a-static-table-matters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;My understanding was that a static table should never experience conflicts (this understanding was wrong — I&amp;rsquo;ll explain later).&lt;/p&gt;
&lt;p&gt;The official documentation lists the conflict cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Access Exclusive locks taken on the primary server, including both explicit &lt;code&gt;LOCK&lt;/code&gt; commands and various DDL actions, conflict with table accesses in standby queries.&lt;/li&gt;
&lt;li&gt;Dropping a tablespace on the primary conflicts with standby queries using that tablespace for temporary work files.&lt;/li&gt;
&lt;li&gt;Dropping a database on the primary conflicts with sessions connected to that database on the standby.&lt;/li&gt;
&lt;li&gt;Application of a vacuum cleanup record from WAL conflicts with standby transactions whose snapshots can still &amp;ldquo;see&amp;rdquo; any of the rows to be removed.&lt;/li&gt;
&lt;li&gt;Application of a vacuum cleanup record from WAL conflicts with queries accessing the target page on the standby, whether or not the data to be removed is visible.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;LOCK, DDL, drop tablespace, drop database — definitely none of those.&lt;/p&gt;
&lt;p&gt;Vacuum — none either, confirmed by &lt;code&gt;pg_stat_all_tables.last_autovacuum&lt;/code&gt; and WAL vacuum records.&lt;/p&gt;
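&lt;p&gt;A check of this kind, run on the primary, might look like the following sketch; &lt;code&gt;history_table&lt;/code&gt; is a hypothetical name standing in for the static table.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- When was the table last vacuumed, manually or by autovacuum?
SELECT schemaname, relname, last_vacuum, last_autovacuum, n_dead_tup
  FROM pg_stat_all_tables
 WHERE relname = 'history_table';  -- hypothetical table name&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;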
&lt;p&gt;The official documentation&amp;rsquo;s explanation stops there. I carefully verified that none of the above applied.&lt;/p&gt;
&lt;p&gt;Extrapolating from existing knowledge, &lt;em&gt;perhaps&lt;/em&gt; other scenarios could kill the xmin held by a standby query&amp;rsquo;s snapshot. For example, in-page pruning removes xmin from rows on a page — if the standby query&amp;rsquo;s snapshot still depends on those xmins, theoretically a conflict could occur. But a page belongs to a specific table, and querying only one table holds only snapshots and xmins on that table. So, &lt;em&gt;theoretically&lt;/em&gt;, in-page pruning on table A &lt;strong&gt;should&lt;/strong&gt; not cause a query conflict on table B (this understanding was also wrong — I&amp;rsquo;ll explain later).&lt;/p&gt;
&lt;p&gt;PG&amp;rsquo;s official documentation on query conflict scenarios is fairly vague and doesn&amp;rsquo;t explain well why a static table can experience conflicts. Even combining it with my own extrapolations, there shouldn&amp;rsquo;t be a conflict. But I noticed this pattern seemed to exist on many instances, so it was worth investigating.&lt;/p&gt;
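&lt;p&gt;To check whether other instances show the same pattern, the standby&amp;rsquo;s conflict counters and delay settings are a reasonable starting point. A minimal sketch, to be run on the standby:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Cumulative number of cancelled queries per conflict type
SELECT datname, confl_tablespace, confl_lock, confl_snapshot,
       confl_bufferpin, confl_deadlock
  FROM pg_stat_database_conflicts
 WHERE datname = current_database();

-- Settings that decide how long recovery waits before cancelling a query
SHOW max_standby_streaming_delay;
SHOW max_standby_archive_delay;
SHOW hot_standby_feedback;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &amp;ldquo;row versions that must be removed&amp;rdquo; message corresponds to the &lt;code&gt;confl_snapshot&lt;/code&gt; counter. The roughly 30-second elapsed time in the error above would also be consistent with the default &lt;code&gt;max_standby_streaming_delay&lt;/code&gt; of 30 seconds, though that depends on how the instance is configured.&lt;/p&gt;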

&lt;h2 class="relative group"&gt;Root Cause Analysis
 &lt;div id="root-cause-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#root-cause-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Since the startup process kills the query, checking the startup process&amp;rsquo;s pstack should reveal the conflict function:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pstack &lt;span style="color:#ae81ff"&gt;212012&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00002b283f63d783 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; __select_nocancel () &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; &lt;span style="color:#f92672"&gt;/&lt;/span&gt;lib64&lt;span style="color:#f92672"&gt;/&lt;/span&gt;libc.so.&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00000000008fcf5a &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; pg_usleep (microsec&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; pgsleep.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000000000787905 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; WaitExceedsMaxStandbyDelay (wait_event_info&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;134217762&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; standby.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;208&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; ResolveRecoveryConflictWithVirtualXIDs (waitlist&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2398a50, reason&lt;span style="color:#f92672"&gt;=&lt;/span&gt;reason&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, wait_event_info&lt;span style="color:#f92672"&gt;=&lt;/span&gt;wait_event_info&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;134217762&lt;/span&gt;, report_waiting&lt;span style="color:#f92672"&gt;=&lt;/span&gt;report_waiting&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; standby.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;276&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000000000787b33 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ResolveRecoveryConflictWithVirtualXIDs (report_waiting&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;, wait_event_info&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;134217762&lt;/span&gt;, reason&lt;span style="color:#f92672"&gt;=&lt;/span&gt;PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, waitlist&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; standby.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;333&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; ResolveRecoveryConflictWithSnapshot (latestRemovedXid&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, node&lt;span style="color:#f92672"&gt;=&lt;/span&gt;...) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; standby.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;329&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00000000004c8ffe &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; heap_xlog_clean (record&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2366978) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; heapam.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;7764&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; heap2_redo (record&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2366978) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; heapam.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;8917&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000000000519e55 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; StartupXLOG () &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; xlog.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;7411&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000072f211 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; StartupProcessMain () &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; startup.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;204&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00000000005286b1 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; AuxiliaryProcessMain (argc&lt;span style="color:#f92672"&gt;=&lt;/span&gt;argc&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;, argv&lt;span style="color:#f92672"&gt;=&lt;/span&gt;argv&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x7ffeb7e39d70) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; bootstrap.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000072c369 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; StartChildProcess (&lt;span style="color:#66d9ef"&gt;type&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;StartupProcess) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; postmaster.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;5494&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000072eb54 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; PostmasterMain (argc&lt;span style="color:#f92672"&gt;=&lt;/span&gt;argc&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, argv&lt;span style="color:#f92672"&gt;=&lt;/span&gt;argv&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x232edb0) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; postmaster.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1407&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00000000004892cf &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; main (argc&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, argv&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x232edb0) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; main.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;210&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;XLOG_HEAP2_CLEAN
 &lt;div id="xlog_heap2_clean" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#xlog_heap2_clean" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;heap2_redo&lt;/span&gt;(XLogReaderState &lt;span style="color:#f92672"&gt;*&lt;/span&gt;record)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	uint8		info &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;XLogRecGetInfo&lt;/span&gt;(record) &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; &lt;span style="color:#f92672"&gt;~&lt;/span&gt;XLR_INFO_MASK;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (info &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; XLOG_HEAP_OPMASK)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; XLOG_HEAP2_CLEAN:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;heap_xlog_clean&lt;/span&gt;(record);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;heap2_redo&lt;/code&gt; dispatches to &lt;code&gt;heap_xlog_clean&lt;/code&gt; only when the record&amp;rsquo;s opcode is &lt;code&gt;XLOG_HEAP2_CLEAN&lt;/code&gt;.&lt;/p&gt;
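&lt;p&gt;To make the dispatch concrete, here is a minimal standalone sketch of how the opcode is extracted from the &lt;code&gt;xl_info&lt;/code&gt; byte. The macro values are copied from PG13&amp;rsquo;s headers as I understand them; &lt;code&gt;heap2_opcode_name&lt;/code&gt; is a hypothetical helper written for illustration and is not part of PostgreSQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;#include &amp;lt;stdint.h&amp;gt;

/* Values as found in PG13 (xlogrecord.h and heapam_xlog.h). */
#define XLR_INFO_MASK			0x0F
#define XLOG_HEAP_OPMASK		0x70
#define XLOG_HEAP2_CLEAN		0x10
#define XLOG_HEAP2_FREEZE_PAGE	0x20
#define XLOG_HEAP2_CLEANUP_INFO	0x30

/* Hypothetical helper: name the Heap2 opcode carried in an xl_info byte. */
const char *
heap2_opcode_name(uint8_t xl_info)
{
	/* Drop the low four bits, which are not part of the rmgr opcode. */
	uint8_t		info = xl_info &amp;amp; ~XLR_INFO_MASK;

	/* Keep only the opcode bits, exactly as heap2_redo does. */
	switch (info &amp;amp; XLOG_HEAP_OPMASK)
	{
		case XLOG_HEAP2_CLEAN:
			return &amp;#34;CLEAN&amp;#34;;
		case XLOG_HEAP2_FREEZE_PAGE:
			return &amp;#34;FREEZE_PAGE&amp;#34;;
		case XLOG_HEAP2_CLEANUP_INFO:
			return &amp;#34;CLEANUP_INFO&amp;#34;;
		default:
			return &amp;#34;other&amp;#34;;
	}
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With these values, an &lt;code&gt;xl_info&lt;/code&gt; of &lt;code&gt;0x10&lt;/code&gt; maps to &lt;code&gt;CLEAN&lt;/code&gt; regardless of the low four bits, because &lt;code&gt;XLR_INFO_MASK&lt;/code&gt; strips them before the switch.&lt;/p&gt;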
&lt;p&gt;PG 18 no longer has &lt;code&gt;XLOG_HEAP2_CLEAN&lt;/code&gt; (the name was dropped in PG14, when the record became &lt;code&gt;XLOG_HEAP2_PRUNE&lt;/code&gt;; this article only looks at versions 13 and 18), but the same opcode values are still defined in heapam_xlog.h under new names:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;//pg13
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLOG_HEAP2_CLEAN		0x10
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLOG_HEAP2_FREEZE_PAGE	0x20
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLOG_HEAP2_CLEANUP_INFO 0x30&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;//pg18
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; There&lt;span style="color:#960050;background-color:#1e0010"&gt;&amp;#39;&lt;/span&gt;s no difference between XLOG_HEAP2_PRUNE_ON_ACCESS,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; XLOG_HEAP2_PRUNE_VACUUM_SCAN and XLOG_HEAP2_PRUNE_VACUUM_CLEANUP records.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; They have separate opcodes just &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; debugging and analysis purposes, to
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; indicate why the WAL record was emitted.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;*/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I pulled out PG18&amp;rsquo;s source because PG13 (our production version) has no comments explaining these xl_info macros, which makes them hard to understand. PG18 renamed the macros to something more intuitive and added comments, so we can use PG18&amp;rsquo;s source as a guide to what PG13&amp;rsquo;s &lt;code&gt;CLEAN&lt;/code&gt; WAL record does.&lt;/p&gt;
&lt;p&gt;All three PG18 opcodes are PRUNE WAL records. From the names, PRUNE_ON_ACCESS is pruning triggered by ordinary page access, while the other two are emitted by VACUUM. Only the &lt;code&gt;0x10&lt;/code&gt; value lines up with PG13&amp;rsquo;s &lt;code&gt;CLEAN&lt;/code&gt;; the PG13 macros at &lt;code&gt;0x20&lt;/code&gt; and &lt;code&gt;0x30&lt;/code&gt; are different record types.&lt;/p&gt;
&lt;p&gt;When checking with &lt;code&gt;pg_waldump&lt;/code&gt;, &lt;code&gt;rmgr: Heap2 CLEAN remxid&lt;/code&gt; records appear every few seconds, with highly varied filenodes and no relation to the static table:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pg_waldump &lt;span style="color:#ae81ff"&gt;00000001000012F&lt;/span&gt;E00000001 &lt;span style="color:#f92672"&gt;|&lt;/span&gt;tail &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt;egrep &lt;span style="color:#f92672"&gt;-&lt;/span&gt;i heap2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_waldump: fatal: error in WAL record at &lt;span style="color:#ae81ff"&gt;12F&lt;/span&gt;E&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F34F138: invalid resource manager ID &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; at &lt;span style="color:#ae81ff"&gt;12F&lt;/span&gt;E&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F34F168
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap2 &lt;span style="color:#a6e22e"&gt;len&lt;/span&gt; (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot)&lt;span style="color:#f92672"&gt;:&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3520&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;12F&lt;/span&gt;E&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0F&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;346&lt;/span&gt;ED0, prev &lt;span style="color:#ae81ff"&gt;12F&lt;/span&gt;E&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0F&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;346&lt;/span&gt;EA0, desc: CLEAN remxid &lt;span style="color:#ae81ff"&gt;1983744188&lt;/span&gt;, blkref &lt;span style="color:#960050;background-color:#1e0010"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt; rel &lt;span style="color:#ae81ff"&gt;1663&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;88121&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1083807&lt;/span&gt; blk &lt;span style="color:#ae81ff"&gt;617606&lt;/span&gt; FPW
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap2 &lt;span style="color:#a6e22e"&gt;len&lt;/span&gt; (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot)&lt;span style="color:#f92672"&gt;:&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;12F&lt;/span&gt;E&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0F&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;BC60, prev &lt;span style="color:#ae81ff"&gt;12F&lt;/span&gt;E&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0F&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;BC30, desc: CLEAN remxid &lt;span style="color:#ae81ff"&gt;1984090598&lt;/span&gt;, blkref &lt;span style="color:#960050;background-color:#1e0010"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt; rel &lt;span style="color:#ae81ff"&gt;1663&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;88121&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;504681&lt;/span&gt; blk &lt;span style="color:#ae81ff"&gt;1447147&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This matches our symptom pattern: no vacuum activity, but PRUNE is happening, leading into &lt;code&gt;heap_xlog_clean&lt;/code&gt; → &lt;code&gt;ResolveRecoveryConflictWithSnapshot&lt;/code&gt; and the rest of the conflict machinery.&lt;/p&gt;
&lt;p&gt;The PRUNE action producing &lt;code&gt;rmgr: Heap2 CLEAN remxid&lt;/code&gt; WAL records will be demonstrated later via testing.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s finish the source code analysis first.&lt;/p&gt;

&lt;h3 class="relative group"&gt;ResolveRecoveryConflictWithSnapshot
 &lt;div id="resolverecoveryconflictwithsnapshot" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#resolverecoveryconflictwithsnapshot" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ResolveRecoveryConflictWithSnapshot&lt;/span&gt;(TransactionId latestRemovedXid, RelFileNode node)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	VirtualTransactionId &lt;span style="color:#f92672"&gt;*&lt;/span&gt;backends;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * This can happen when replaying already-applied WAL records after a
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * record that marks as frozen a page which was already all-visible. It&amp;#39;s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * also quite common with records generated during index deletion
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * (original execution of the deletion can reason that a recovery conflict
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * which is sufficient for the deletion operation must take place before
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * replay of the deletion record itself).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(latestRemovedXid))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	backends &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetConflictingVirtualXIDs&lt;/span&gt;(latestRemovedXid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 node.dbNode);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ResolveRecoveryConflictWithVirtualXIDs&lt;/span&gt;(backends,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
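&lt;p&gt;There are several types of query conflicts. As its name says, &lt;code&gt;ResolveRecoveryConflictWithSnapshot&lt;/code&gt; handles the snapshot type. If &lt;code&gt;latestRemovedXid&lt;/code&gt; is invalid, it returns immediately and nothing is cancelled.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GetConflictingVirtualXIDs&lt;/code&gt; collects the backends whose snapshots conflict with &lt;code&gt;latestRemovedXid&lt;/code&gt;. Note that it is passed only &lt;code&gt;node.dbNode&lt;/code&gt;, the database OID, and not the relation. &lt;code&gt;ResolveRecoveryConflictWithVirtualXIDs&lt;/code&gt; then waits for those backends and cancels them once the standby delay limit is exceeded (the &lt;code&gt;WaitExceedsMaxStandbyDelay&lt;/code&gt; frame in the stack above).&lt;/p&gt;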

&lt;h3 class="relative group"&gt;GetConflictingVirtualXIDs
 &lt;div id="getconflictingvirtualxids" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#getconflictingvirtualxids" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;GetConflictingVirtualXIDs&lt;/code&gt; is the key function that determines whether a backend&amp;rsquo;s virtual transaction ID triggers a query conflict. It requires a bit of brainpower.&lt;/p&gt;
&lt;p&gt;Prerequisite knowledge for understanding this function:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;limitXmin&lt;/code&gt; is &lt;code&gt;latestRemovedXid&lt;/code&gt;, i.e. the &lt;code&gt;remxid&lt;/code&gt; printed in the &lt;code&gt;CLEAN&lt;/code&gt; WAL record: the newest xid whose row versions the record removes (I read remxid as &amp;ldquo;removed xid&amp;rdquo;). The source comment says: &lt;code&gt;/*limitXmin is supplied as either latestRemovedXid, or InvalidTransactionId*/&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PGPROC&lt;/code&gt; contains current process info: backend id, database id, lock info, and much more&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PGXACT&lt;/code&gt; contains the transaction info for the snapshot held by the current process. It&amp;rsquo;s lighter — the key field is xmin, the lowest xid the current process considers still running&lt;/li&gt;
&lt;li&gt;C&amp;rsquo;s &lt;code&gt;||&lt;/code&gt; rule: if either operand is true (non-zero), the result is true (1)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;TransactionIdIsValid&lt;/code&gt; means &lt;code&gt;xid != 0&lt;/code&gt;; xid 0 is &lt;code&gt;InvalidTransactionId&lt;/code&gt;, meaning &amp;ldquo;no xid&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
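&lt;p&gt;The comparison at the heart of the function can be sketched in isolation. This is a simplified standalone version under my own assumptions: &lt;code&gt;xid_follows&lt;/code&gt; copies the logic of PostgreSQL&amp;rsquo;s &lt;code&gt;TransactionIdFollows&lt;/code&gt;, while &lt;code&gt;snapshot_conflicts&lt;/code&gt; is a hypothetical helper that exists only in this example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;#include &amp;lt;stdbool.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Same logic as TransactionIdFollows(): is id1 logically newer than id2? */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	/* Special xids (0, 1, 2) are compared as plain unsigned integers. */
	if (id1 &amp;lt; FirstNormalTransactionId || id2 &amp;lt; FirstNormalTransactionId)
		return id1 &amp;gt; id2;

	/* Normal xids are compared modulo 2^32, so wraparound is handled. */
	return (int32_t) (id1 - id2) &amp;gt; 0;
}

/* Hypothetical helper mirroring the test inside GetConflictingVirtualXIDs. */
bool
snapshot_conflicts(TransactionId pxmin, TransactionId limit_xmin)
{
	/* No usable xid from the caller: every backend in scope conflicts. */
	if (limit_xmin == InvalidTransactionId)
		return true;

	/* pxmin == 0 means the backend holds no snapshot, so it cannot conflict. */
	return pxmin != InvalidTransactionId &amp;amp;&amp;amp; !xid_follows(pxmin, limit_xmin);
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In this sketch, a backend whose xmin is 100 conflicts with a remxid of 200, one whose xmin is 201 does not, and a backend with xmin 0 (no snapshot) never conflicts on this path.&lt;/p&gt;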
&lt;p&gt;Key function &lt;code&gt;GetConflictingVirtualXIDs&lt;/code&gt; explained:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VirtualTransactionId &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;GetConflictingVirtualXIDs&lt;/span&gt;(TransactionId limitXmin, Oid dbOid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (index &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; index &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; arrayP&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;numProcs; index&lt;span style="color:#f92672"&gt;++&lt;/span&gt;) &lt;span style="color:#75715e"&gt;// iterate all local processes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			pgprocno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; arrayP&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pgprocnos[index];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		PGPROC	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;proc &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;allProcs[pgprocno]; &lt;span style="color:#75715e"&gt;// process&amp;#39;s PGPROC
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		PGXACT	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;pgxact &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;allPgXact[pgprocno]; &lt;span style="color:#75715e"&gt;// process&amp;#39;s PGXACT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Exclude prepared transactions */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (proc&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#75715e"&gt;// prepared transactions have no owning process — can&amp;#39;t handle
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;OidIsValid&lt;/span&gt;(dbOid) &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#75715e"&gt;// global tables have dbOid=0 which is invalid — satisfies condition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			proc&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;databaseId &lt;span style="color:#f92672"&gt;==&lt;/span&gt; dbOid) &lt;span style="color:#75715e"&gt;// only process current database. Cross-db is different — no transaction conflict at all.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Fetch xmin just once - can&amp;#39;t change on us, but good coding */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			TransactionId pxmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;UINT32_ACCESS_ONCE&lt;/span&gt;(pgxact&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xmin); &lt;span style="color:#75715e"&gt;// pgxact-&amp;gt;xmin is the minimum xid of transactions held by this process. UINT32_ACCESS_ONCE is just for atomic access protection — the xmin logic is unchanged
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * We ignore an invalid pxmin because this means that backend has
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * no snapshot currently. We hold a Share lock to avoid contention
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * with users taking snapshots. That is not a problem because the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * current xmin is always at least one higher than the latest
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * removed xid, so any new snapshot would never conflict with the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * test here.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(limitXmin) &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#75715e"&gt;// limitXmin=0 possible? At least latestRemovedXid can&amp;#39;t be — I can&amp;#39;t think of a scenario where WAL would log an invalid xid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(pxmin) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdFollows&lt;/span&gt;(pxmin, limitXmin))) &lt;span style="color:#75715e"&gt;// TransactionIdIsValid(pxmin) skips backends that hold no snapshot; without it an invalid pxmin (0) would pass the next test. !TransactionIdFollows(pxmin, limitXmin) means pxmin &amp;lt;= limitXmin
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				VirtualTransactionId vxid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;GET_VXID_FROM_PGPROC&lt;/span&gt;(vxid, &lt;span style="color:#f92672"&gt;*&lt;/span&gt;proc);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;VirtualTransactionIdIsValid&lt;/span&gt;(vxid))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					vxids[count&lt;span style="color:#f92672"&gt;++&lt;/span&gt;] &lt;span style="color:#f92672"&gt;=&lt;/span&gt; vxid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The critical line is &lt;code&gt;!TransactionIdFollows(pxmin, limitXmin)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So the core logic for determining query conflicts is:&lt;/p&gt;
&lt;ol&gt;
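&lt;li&gt;&lt;strong&gt;The xid removed on the primary (&lt;code&gt;latestRemovedXid&lt;/code&gt;, passed in as &lt;code&gt;limitXmin&lt;/code&gt;) &amp;gt;= the snapshot xmin held by the standby query&lt;/strong&gt; → conflict.&lt;/li&gt;
&lt;li&gt;Only backends connected to the database named in the WAL record are candidates; when the record names no database (shared system catalogs), backends in every database are candidates.&lt;/li&gt;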
&lt;/ol&gt;
&lt;p&gt;This means: &lt;strong&gt;even if the pruned table on the primary has nothing to do with the table being queried on the standby, a conflict CAN occur!!!&lt;/strong&gt;&lt;/p&gt;
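&lt;p&gt;The comparison can be tried outside the server. The following is a minimal standalone sketch in C, not the PostgreSQL source: &lt;code&gt;xid_is_valid&lt;/code&gt;, &lt;code&gt;xid_follows&lt;/code&gt; and &lt;code&gt;snapshot_conflicts&lt;/code&gt; are hypothetical names that imitate &lt;code&gt;TransactionIdIsValid&lt;/code&gt;, &lt;code&gt;TransactionIdFollows&lt;/code&gt; and the &lt;code&gt;if&lt;/code&gt; condition shown above, assuming a 32-bit &lt;code&gt;unsigned int&lt;/code&gt; and the usual wrapping conversion to &lt;code&gt;int&lt;/code&gt;.&lt;/p&gt;

```c
/* Sketch only: the names imitate PostgreSQL, but this is not the server source. */
typedef unsigned int TransactionId;      /* assumption: unsigned int is 32 bits wide */

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* An xid of 0 means "no xid": here, a backend that holds no snapshot. */
int xid_is_valid(TransactionId xid)
{
    return xid != InvalidTransactionId;
}

/* Circular comparison: is a logically newer than b? */
int xid_follows(TransactionId a, TransactionId b)
{
    /* Special xids (0..2) are compared as plain unsigned integers. */
    if (FirstNormalTransactionId > a || FirstNormalTransactionId > b)
        return a > b;
    /* Normal xids wrap around, so compare the signed 32-bit difference. */
    return (int) (a - b) > 0;
}

/* Would a standby backend whose snapshot xmin is pxmin be reported as conflicting? */
int snapshot_conflicts(TransactionId pxmin, TransactionId limit_xmin)
{
    if (!xid_is_valid(limit_xmin))
        return 1;   /* no limit given: every backend is reported */
    if (!xid_is_valid(pxmin))
        return 0;   /* no snapshot held, nothing to conflict with */
    return !xid_follows(pxmin, limit_xmin);   /* pxmin is at or before limit_xmin */
}
```

&lt;p&gt;For example, &lt;code&gt;snapshot_conflicts(99, 100)&lt;/code&gt; and &lt;code&gt;snapshot_conflicts(100, 100)&lt;/code&gt; report a conflict, while &lt;code&gt;snapshot_conflicts(101, 100)&lt;/code&gt; does not. Nothing in the test refers to a table, which is why a query on an unrelated table can be cancelled.&lt;/p&gt;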

&lt;h3 class="relative group"&gt;In-Page Pruning
 &lt;div id="in-page-pruning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#in-page-pruning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Now that the conflict logic is clear, we still need to understand where the WAL CLEAN records come from. That requires looking at how PRUNE is triggered.&lt;/p&gt;
&lt;p&gt;From &lt;code&gt;README.HOT&lt;/code&gt; on when pruning and defragmentation occur — &amp;ldquo;When can/should we prune or defragment?&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The currently planned heuristic is to prune and defrag when first accessing a page that potentially has prunable tuples&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Prune and defragment are indeed two distinct concepts, but they often happen together.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Prune: updates line pointers to shorten HOT chains; by itself it does not free space&lt;/li&gt;
&lt;li&gt;Defragment: reclaims the space left by dead line pointers and tuples after pruning&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;&lt;p&gt;We cannot prune or defragment unless we can get a &amp;ldquo;buffer cleanup lock&amp;rdquo; on the target page; otherwise, pruning might destroy line pointers that other backends have live references to, and defragmenting might move tuples that other backends have live pointers to&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;The page must be under a &amp;ldquo;buffer cleanup lock&amp;rdquo; for prune or defragment to occur.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The worst-case consequence of this is only that an UPDATE cannot be made HOT but has to link to a new tuple version placed on some other page, for lack of centralized space on the original page.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;The typical fallout: an UPDATE that could have been HOT lands on another page instead (easy to test).&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;space reclamation happens during tuple retrieval when the page is nearly full (&amp;lt;10% free) and a buffer cleanup lock can be acquired. This means that UPDATE, DELETE, and SELECT can trigger space reclamation, but often not during INSERT &amp;hellip; VALUES because it does not retrieve a row.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;SELECT/UPDATE/DELETE that scan rows can trigger space reclamation. INSERT typically won&amp;rsquo;t, since it doesn&amp;rsquo;t retrieve rows.&lt;/p&gt;
&lt;p&gt;Clearly, once prune or defragment has run, the xids of the removed row versions have been cleaned up. From the README we can see that HOT updates alone can trigger prune/defragment and therefore generate CLEAN WAL records. See &lt;a href="#test-update-produces-in-page-pruning"&gt;Test: UPDATE Produces In-Page Pruning&lt;/a&gt;.&lt;/p&gt;
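&lt;p&gt;The trigger condition quoted above can be written down as a small predicate. This is a simplified sketch under stated assumptions, not the actual check in the server: &lt;code&gt;page_wants_prune&lt;/code&gt;, &lt;code&gt;SKETCH_BLCKSZ&lt;/code&gt; and the three parameters are made-up names, the block size is assumed to be the default 8 kB, and only the three conditions from the quotes (a page that potentially has prunable tuples, less than 10% free space, a cleanup lock that can be acquired) are modelled.&lt;/p&gt;

```c
/* Hypothetical helper for illustration; not a PostgreSQL function. */
#define SKETCH_BLCKSZ 8192   /* assumption: default 8 kB block size */

/*
 * Returns 1 when a page access would attempt pruning, 0 otherwise.
 *   maybe_prunable         - the page is flagged as potentially having prunable tuples
 *   free_bytes             - free space currently left on the page
 *   cleanup_lock_available - a buffer cleanup lock can be taken right now
 */
int page_wants_prune(int maybe_prunable, int free_bytes, int cleanup_lock_available)
{
    int min_free = SKETCH_BLCKSZ / 10;   /* "nearly full" threshold: 10% of the page */

    if (!maybe_prunable)
        return 0;                        /* nothing to gain from pruning */
    if (free_bytes >= min_free)
        return 0;                        /* still enough room, leave the page alone */
    return cleanup_lock_available != 0;  /* prune only if the lock is obtainable */
}
```

&lt;p&gt;Any statement that reads the page can reach this point, which is why a plain SELECT or UPDATE on the primary is enough to emit a CLEAN record.&lt;/p&gt;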

&lt;h2 class="relative group"&gt;Testing
 &lt;div id="testing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#testing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The tests below only observe whether conflicts occur, whether CLEAN WAL records appear, or whether page line pointers are updated — without distinguishing prune vs. defragment. In many cases both are triggered together; distinguishing them is tedious and maybe best left for later. The focus here is whether CLEAN WAL records appear.&lt;/p&gt;
&lt;p&gt;Helper SQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--sql for test
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--heap_page_items
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0:LP_UNUSED&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags,&lt;span style="color:#66d9ef"&gt;substring&lt;/span&gt;(t_data,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)) item,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--heap header
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; page_header(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--bt_page_items
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idxlzl&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--create table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl(a char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxlzl &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;md5(random()::text); &lt;span style="color:#75715e"&gt;-- non-hot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- hot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--force index scan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_seqscan &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_indexonlyscan&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--open an RR transaction to hold a snapshot for observation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ISOLATION&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LEVEL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;REPEATABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;READ&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Test: Cross-Table Query Conflict
 &lt;div id="test-cross-table-query-conflict" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-cross-table-query-conflict" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;primary&lt;/th&gt;
 &lt;th&gt;standby&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;create table lzl(a bigint primary key);&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;insert into lzl values(1);&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;select 1;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;update lzl set a=2;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;no blocking&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;vacuum lzl;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;#3 ResolveRecoveryConflictWithVirtualXIDs (waitlist=0x277c340, reason=reason@entry=PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, wait_event_info=wait_event_info@entry=134217762, report_waiting=report_waiting@entry=true) at standby.c:276&lt;br/&gt;#4 0x0000000000787b33 in ResolveRecoveryConflictWithVirtualXIDs (report_waiting=true, wait_event_info=134217762, reason=PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, waitlist=&amp;lt;optimized out&amp;gt;) at standby.c:333&lt;br/&gt;#5 ResolveRecoveryConflictWithSnapshot (latestRemovedXid=&amp;lt;optimized out&amp;gt;, node=&amp;hellip;) at standby.c:329&lt;br/&gt;#6 0x00000000004c8ffe in heap_xlog_clean (record=0x273a258) at heapam.c:7764&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Conclusion: As long as a query exists, it has a snapshot, and a snapshot has a snapshot xmin. Even if the queried table is completely unrelated, a query conflict CAN occur.&lt;/strong&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Test: Vacuum Produces In-Page Pruning
 &lt;div id="test-vacuum-produces-in-page-pruning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-vacuum-produces-in-page-pruning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Pruning occurs, conflicts occur. Example omitted — not relevant to this case.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Test: UPDATE Produces In-Page Pruning
 &lt;div id="test-update-produces-in-page-pruning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-update-produces-in-page-pruning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--HOT: an update that finds the page full triggers prune/defragment
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--An 8 kB heap page holds anywhere from 4 rows of this width to a couple hundred narrow rows. char(2000) is sized so that exactly 4 row versions fit and the updates stay HOT; the next update finds the page full
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl(a char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxlzl &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;--hot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;--hot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;--hot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--heap page: 4 rows, all HOT:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+----------+----------+-------+----------------------------------------------------------------------------------------------------------+----------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954161&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954162&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954162&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954163&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954163&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954164&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954164&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--index: only one entry:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------+---------+-------+------+------+-------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;48&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--One more update: the page is full, so pruning runs first
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;--page was full; after pruning the new version still fits on page 0 (HEAP_ONLY_TUPLE at lp 2 below)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--HOT chain pruned, line pointers changed: lp 1 is now LP_REDIRECT, lp 3 is LP_UNUSED
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-------------+----------+----------+--------+--------------------------------------------------------------------------------------+----------------+---------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_REDIRECT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954165&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a20202020202020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:LP_UNUSED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954164&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954165&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a20202020202020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--index: still only one entry, unchanged:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------+---------+-------+------+------+-------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;48&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The next update does not move to a new page. In-page pruning runs first and frees space on the same page, so the new row version is written there, which avoids touching a second page.&lt;/p&gt;
&lt;p&gt;The WAL contains a Heap2 &lt;code&gt;CLEAN&lt;/code&gt; record carrying a &lt;code&gt;remxid&lt;/code&gt;, confirming that a query conflict can occur on the standby:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap2 len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 62/ 62, tx: 0, lsn: 3DB/F8017348, prev 3DB/F8017310, desc: CLEAN remxid 34954177, blkref &lt;span style="color:#75715e"&gt;#0: rel 1663/5893914/5893920 blk 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 2070/ 2070, tx: 34954178, lsn: 3DB/F8017388, prev 3DB/F8017348, desc: HOT_UPDATE off &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; xmax &lt;span style="color:#ae81ff"&gt;34954178&lt;/span&gt; flags 0x10 ; new off &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; xmax 0, blkref &lt;span style="color:#75715e"&gt;#0: rel 1663/5893914/5893920 blk 0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Conclusion: UPDATE statements can produce in-page pruning and can cause query conflicts.&lt;/strong&gt;&lt;/p&gt;
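&lt;p&gt;One way to check whether such a record actually cancelled anything is to look at the per-database conflict counters. This is a minimal sketch, assuming it is run on the standby (the view is only populated there); &lt;code&gt;confl_snapshot&lt;/code&gt; is the counter for queries cancelled because of old snapshots:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Run on the standby: conflict counters are only maintained there.
SELECT datname, confl_snapshot, confl_lock, confl_bufferpin
FROM pg_stat_database_conflicts
ORDER BY confl_snapshot DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Comparing the counter before and after the UPDATE on the primary shows whether the standby query was cancelled by the cleanup record or by something else.&lt;/p&gt;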

&lt;h3 class="relative group"&gt;Test: Hint-Bit Writeback Producing In-Page Pruning?
 &lt;div id="test-hint-bit-writeback-producing-in-page-pruning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-hint-bit-writeback-producing-in-page-pruning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;primary&lt;/th&gt;
 &lt;th&gt;standby&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;wal_log_hints=on&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;truncate table lzl;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;insert into lzl values(&amp;lsquo;z&amp;rsquo;);&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;select * from lzl;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;delete from lzl where a=&amp;lsquo;z&amp;rsquo;;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;checkpoint;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;select * from lzl;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;-- WAL contains FPI_FOR_HINT&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;-- no query conflict&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Standby pageinspect:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;substring&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+----------+----------+-------+------------------------------------------------------------------------------+----------------+-------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954229&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954230&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_KEYS_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a202020202020202020202020202020202020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Conclusion: with &lt;code&gt;wal_log_hints=on&lt;/code&gt;, the WAL only carries the updated hint bits (as an FPI_FOR_HINT page image) and leaves xmin/xmax untouched. No CLEAN or similar cleanup record is produced, so hint-bit writeback does NOT cause query conflicts.&lt;/strong&gt;&lt;/p&gt;
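&lt;p&gt;FPI_FOR_HINT records only show up when hint-bit changes are WAL-logged, so it is worth confirming the settings before repeating this test. A small sketch for the primary; to my knowledge, enabling data checksums has the same effect as &lt;code&gt;wal_log_hints=on&lt;/code&gt;, which is why both are listed:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Either setting being on means hint-bit changes can be WAL-logged.
SELECT name, setting
FROM pg_settings
WHERE name IN (&amp;#39;wal_log_hints&amp;#39;, &amp;#39;data_checksums&amp;#39;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;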

&lt;h3 class="relative group"&gt;Test: SELECT Produces In-Page Pruning
 &lt;div id="test-select-produces-in-page-pruning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-select-produces-in-page-pruning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;SELECT normally doesn&amp;rsquo;t cause pruning, but it does when the page is nearly full: &lt;a href="https://www.modb.pro/db/1683648157451362304" target="_blank" rel="noreferrer"&gt;https://www.modb.pro/db/1683648157451362304&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Testing pruning on a full page:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Same table as before, 4 HOT rows, nearly full
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--page at this point:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+----------+----------+-------+----------------------------------------------------------------------------------------------------------+----------------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954232&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954233&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a20202020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954233&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954234&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a20202020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954234&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954235&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a20202020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954235&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a20202020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- A SELECT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--page now shows in-page pruning:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; sub
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-------------+----------+--------+--------+---------------------------------------------------------------------------------------+----------------+---------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_REDIRECT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:LP_UNUSED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:LP_UNUSED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954235&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a20202020202020202020202020&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Conclusion: SELECT can produce in-page pruning and can cause query conflicts.&lt;/strong&gt;&lt;/p&gt;
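&lt;p&gt;Whether a SELECT prunes depends on how full the page is and whether the page carries a prune hint. The sketch below uses pageinspect&amp;rsquo;s &lt;code&gt;page_header&lt;/code&gt; to look at both before running the SELECT. The table &lt;code&gt;lzl&lt;/code&gt; and block 0 come from the test above; the exact free-space threshold PostgreSQL applies is not shown here:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Requires the pageinspect extension and superuser privileges.
SELECT h.lower,
       h.upper,
       h.upper - h.lower AS free_bytes,  -- gap between line pointers and tuple data
       h.prune_xid                       -- non-zero when the page may hold prunable tuples
FROM page_header(get_raw_page(&amp;#39;lzl&amp;#39;, 0)) AS h;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A small &lt;code&gt;free_bytes&lt;/code&gt; together with a non-zero &lt;code&gt;prune_xid&lt;/code&gt; is the situation in which the SELECT in this test ended up pruning.&lt;/p&gt;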

&lt;h3 class="relative group"&gt;Test: Shared Table Cross-Database Query Conflict
 &lt;div id="test-shared-table-cross-database-query-conflict" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-shared-table-cross-database-query-conflict" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
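&lt;p&gt;Shared tables are cluster-wide rather than per-database. Earlier, in &lt;code&gt;GetConflictingVirtualXIDs&lt;/code&gt;, we saw that for a shared relation the database filter is dropped, so conflicting queries are cancelled no matter which database they are connected to. Let&amp;rsquo;s test.&lt;/p&gt;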
&lt;p&gt;Shared table info:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Source&lt;/span&gt; definition: IsSharedRelation
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Source&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;check&lt;/span&gt;: shared &lt;span style="color:#f92672"&gt;?&lt;/span&gt; InvalidOid : MyDatabaseId;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt;: pg_class.relisshared
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Directory: &lt;span style="color:#66d9ef"&gt;global&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Querying &lt;code&gt;pg_class.relisshared&lt;/code&gt; directly is easier:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,relkind,relisshared &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relisshared &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; relkind&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;r&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relkind &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relisshared
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------+---------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_authid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_subscription &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_database &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_db_role_setting &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_tablespace &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_auth_members &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_shdepend &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_shdescription &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_replication_origin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_shseclabel &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;pg_authid&lt;/code&gt; stores role/user info. Testing with a password change:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Test: on the primary, in a non-business database
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; lzl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; password &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;--run several times&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A &lt;code&gt;CLEAN&lt;/code&gt; record with a &lt;code&gt;remxid&lt;/code&gt; appears in the WAL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap len (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot): &lt;span style="color:#ae81ff"&gt;76&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;76&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;34954264&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;DB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F808D0F8, prev &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;DB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F808D0B8, &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt;: HOT_UPDATE &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;67&lt;/span&gt; xmax &lt;span style="color:#ae81ff"&gt;34954264&lt;/span&gt; flags &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x20 ; &lt;span style="color:#66d9ef"&gt;new&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66&lt;/span&gt; xmax &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, blkref &lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;: rel &lt;span style="color:#ae81ff"&gt;1664&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1260&lt;/span&gt; blk &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: &lt;span style="color:#66d9ef"&gt;Transaction&lt;/span&gt; len (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot): &lt;span style="color:#ae81ff"&gt;82&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;82&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;34954264&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;DB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F808D148, prev &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;DB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F808D0F8, &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;680782&lt;/span&gt; CST; inval msgs: catcache &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap2 len (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot): &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;DB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F808D1A0, prev &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;DB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F808D148, &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt;: CLEAN remxid &lt;span style="color:#ae81ff"&gt;34954264&lt;/span&gt;, blkref &lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;: rel &lt;span style="color:#ae81ff"&gt;1664&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1260&lt;/span&gt; blk &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap2 len (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot): &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;34954265&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;DB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F808D1E0,&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The standby business database&amp;rsquo;s &lt;code&gt;select 1&lt;/code&gt; query was killed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conclusion: Shared tables can cause cross-database query conflicts.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That said, these shared system tables rarely see heavy updates in normal operations.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Conclusions
 &lt;div id="conclusions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#conclusions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Developer Perspective
 &lt;div id="developer-perspective" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#developer-perspective" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Query conflicts can be completely unrelated to the table being queried — meaning a fully static table CAN experience conflicts.&lt;/p&gt;
&lt;p&gt;Different databases usually hold different business domains and data, and activity in one database normally does NOT cause query conflicts in another. The one exception is shared tables, but these are just a handful of system catalogs that rarely see updates.&lt;/p&gt;
&lt;p&gt;For developers, focus on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Retry on failure&lt;/strong&gt;: Standby queries can be killed — retrying is essential, and retries may succeed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Query duration&lt;/strong&gt;: Longer queries are more likely to be killed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Alternative standbys&lt;/strong&gt;: Consider using a different standby with lower disaster-recovery requirements&lt;/li&gt;
&lt;/ul&gt;
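&lt;p&gt;The retry advice above can be sketched roughly as follows. This is a minimal illustration, not a driver-specific implementation: &lt;code&gt;QueryCanceled&lt;/code&gt; is a hypothetical stand-in for whatever exception your driver raises, and &lt;code&gt;run_with_retry&lt;/code&gt; and its parameters are made-up names. It assumes the standby reports recovery conflicts with SQLSTATE &lt;code&gt;40001&lt;/code&gt;; check what your driver actually exposes.&lt;/p&gt;

```python
import time


class QueryCanceled(Exception):
    """Hypothetical driver error carrying the SQLSTATE reported by the server."""

    def __init__(self, sqlstate):
        super().__init__(f"query failed with SQLSTATE {sqlstate}")
        self.sqlstate = sqlstate


# Assumption: "canceling statement due to conflict with recovery"
# arrives with SQLSTATE 40001 on the standby.
RETRYABLE_SQLSTATES = {"40001"}


def run_with_retry(run_query, max_attempts=3, base_delay=0.05):
    """Call run_query(); retry with exponential backoff on recovery conflicts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except QueryCanceled as exc:
            out_of_attempts = attempt == max_attempts
            if exc.sqlstate not in RETRYABLE_SQLSTATES or out_of_attempts:
                raise
            # Back off a little longer on every retry.
            time.sleep(base_delay * 2 ** (attempt - 1))
```

&lt;p&gt;Keeping the retried unit short matters as much as the retry itself: a query that takes minutes will probably hit the same conflict again on the next attempt.&lt;/p&gt;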

&lt;h3 class="relative group"&gt;Operations Perspective
 &lt;div id="operations-perspective" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#operations-perspective" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since query conflicts can come from &amp;ldquo;all directions,&amp;rdquo; a simple long-running single-table query can be killed by in-page pruning on a completely different, frequently-updated table. You can increase &lt;code&gt;max_standby_streaming_delay&lt;/code&gt; to reduce conflict probability.&lt;/p&gt;
&lt;p&gt;However, &lt;code&gt;max_standby_streaming_delay&lt;/code&gt; trades query survival against WAL apply: while a conflicting query is allowed to keep running, WAL replay is paused. The value is effectively the maximum apply lag that conflicting queries can add on the standby; it does not cap lag from the network or other factors.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Query freshness&lt;/strong&gt;: Prolonged WAL apply pauses mean the standby data lags significantly (WAL may already be on the standby&amp;rsquo;s disk), affecting data freshness requirements for other standby queries.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RTO&lt;/strong&gt;: If the primary suffers a disaster and failover is needed, the standby must apply accumulated WAL. If apply delay stretches to hours, it may violate minute-level RTO SLAs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So tuning &lt;code&gt;max_standby_streaming_delay&lt;/code&gt; is a delicate exercise requiring consideration of the standby&amp;rsquo;s role, query freshness requirements, and even geography.&lt;/p&gt;</content:encoded></item><item><title>PostgreSQL DDL Pitfalls and Clever Solutions</title><link>https://lastdba.com/en/2025/07/19/postgresql-ddl-pitfalls-and-clever-solutions/</link><pubDate>Sat, 19 Jul 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/07/19/postgresql-ddl-pitfalls-and-clever-solutions/</guid><description>&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5f610ac9b703.png" alt="DDL Pitfalls and Solutions" /&gt;&lt;/p&gt;
&lt;p&gt;Save it, use it freely, no need to ask.&lt;/p&gt;
&lt;p&gt;May be updated, may not be.&lt;/p&gt;
&lt;p&gt;Feedback welcome — pick it apart if you can.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This article was originally published in Chinese on &lt;a href="https://lastdba.com" target="_blank" rel="noreferrer"&gt;lastdba.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</description><content:encoded>&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5f610ac9b703.png" alt="DDL Pitfalls and Solutions" /&gt;&lt;/p&gt;
&lt;p&gt;Save it, use it freely, no need to ask.&lt;/p&gt;
&lt;p&gt;May be updated, may not be.&lt;/p&gt;
&lt;p&gt;Feedback welcome — pick it apart if you can.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This article was originally published in Chinese on &lt;a href="https://lastdba.com" target="_blank" rel="noreferrer"&gt;lastdba.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content:encoded></item><item><title>Linux Memory Advanced</title><link>https://lastdba.com/en/2025/06/19/linux-memory-advanced/</link><pubDate>Thu, 19 Jun 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/06/19/linux-memory-advanced/</guid><description>&lt;p&gt;(For memory basics, refer to &lt;a href="https://blog.csdn.net/qq_40687433/article/details/135492312?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;Linux Memory Analysis&lt;/a&gt;; this article covers memory knowledge above that foundation)&lt;/p&gt;

&lt;h2 class="relative group"&gt;Memory Basic Concepts
 &lt;div id="memory-basic-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-basic-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;buddy
 &lt;div id="buddy" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#buddy" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The process of buddy system allocating and merging pages is omitted.&lt;/p&gt;
&lt;p&gt;Easily overlooked knowledge points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The prerequisite for buddy merging two blocks of the same size is that their &lt;strong&gt;physical addresses are contiguous&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;The merge algorithm is iterative: after merging at the current level, it automatically tries to merge the resulting block with its buddy at the next level up. This means kcompactd is not strictly required for merging&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;page table &amp;amp; PTE
 &lt;div id="page-table--pte" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#page-table--pte" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;page table and PTE are actually two different concepts, and they are easily confused because both generally refer to page tables. Below is relevant knowledge about page table and PTE[^ 《深入理解Linux内核》 (Understanding the Linux Kernel)]&lt;/p&gt;</description><content:encoded>&lt;p&gt;(For memory basics, refer to &lt;a href="https://blog.csdn.net/qq_40687433/article/details/135492312?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;Linux Memory Analysis&lt;/a&gt;; this article covers memory knowledge above that foundation)&lt;/p&gt;

&lt;h2 class="relative group"&gt;Memory Basic Concepts
 &lt;div id="memory-basic-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-basic-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;buddy
 &lt;div id="buddy" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#buddy" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The process of buddy system allocating and merging pages is omitted.&lt;/p&gt;
&lt;p&gt;Easily overlooked knowledge points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The prerequisite for buddy merging two blocks of the same size is that their &lt;strong&gt;physical addresses are contiguous&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;The merge algorithm is iterative: after merging at the current level, it automatically tries to merge the resulting block with its buddy at the next level up. This means kcompactd is not strictly required for merging&lt;/li&gt;
&lt;/ul&gt;
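&lt;p&gt;The two points above can be made concrete with a small sketch. It assumes the usual buddy rule that a block of order &lt;code&gt;n&lt;/code&gt; starts on a multiple of 2^n pages and that its buddy is found by flipping bit &lt;code&gt;n&lt;/code&gt; of the page frame number. The function names, the set-based free list, and &lt;code&gt;max_order=10&lt;/code&gt; are illustrative choices, not kernel code.&lt;/p&gt;

```python
def buddy_pfn(pfn, order):
    """Return the page frame number of the buddy of the block starting at pfn."""
    block_pages = 2 ** order
    if pfn % block_pages != 0:
        raise ValueError("a block of this order must start on a multiple of its size")
    # Flipping bit `order` yields the only block this one may merge with.
    return pfn ^ block_pages


def merge(free_blocks, pfn, order, max_order=10):
    """Free a block and keep merging upward while its buddy is also free.

    free_blocks is a set of (pfn, order) tuples and is updated in place.
    """
    while order != max_order and (buddy_pfn(pfn, order), order) in free_blocks:
        buddy = buddy_pfn(pfn, order)
        free_blocks.remove((buddy, order))
        pfn = min(pfn, buddy)  # the merged block starts at the lower PFN
        order += 1
    free_blocks.add((pfn, order))
    return pfn, order
```

&lt;p&gt;Note that being physically adjacent is necessary but not sufficient: the order-1 blocks at PFN 2 and PFN 4 touch each other, yet &lt;code&gt;buddy_pfn(2, 1)&lt;/code&gt; is 0, so they cannot merge with each other.&lt;/p&gt;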

&lt;h3 class="relative group"&gt;page table &amp;amp; PTE
 &lt;div id="page-table--pte" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#page-table--pte" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&amp;ldquo;Page table&amp;rdquo; and &amp;ldquo;PTE&amp;rdquo; are two different concepts that are easily confused because both are loosely used to mean page tables. Some relevant points about page tables and PTEs[^ 《深入理解Linux内核》 (Understanding the Linux Kernel)]:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PTE stores the physical address of the page frame&lt;/li&gt;
&lt;li&gt;&amp;ldquo;page table&amp;rdquo; and &amp;ldquo;Page Table&amp;rdquo; are different concepts: lowercase &amp;ldquo;page table&amp;rdquo; refers generically to the pages that maintain the mapping between linear addresses and physical addresses, while capitalized &amp;ldquo;Page Table&amp;rdquo; refers to one specific level of that hierarchy, the lowest one, whose entries (PTEs) point to page frames&lt;/li&gt;
&lt;li&gt;pte_t, pmd_t, pud_t, pgd_t describe Page Table Entry, Page Middle Directory entry, Page Upper Directory entry, and Page Global Directory entry respectively&lt;/li&gt;
&lt;li&gt;PTE is Page Table Entry&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you only care about the total size of the page tables that the MMU walks to translate virtual addresses to physical addresses, confusing &amp;ldquo;page table&amp;rdquo; with PTE doesn&amp;rsquo;t make much difference. However, if you need to go deep into the page table directories, you need to keep the two concepts separate.&lt;/p&gt;

&lt;h3 class="relative group"&gt;TLB
 &lt;div id="tlb" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#tlb" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Each level of the page table is stored in memory. With four-level paging, a single virtual-to-physical address translation has to locate the entry at each of the four levels for the current virtual address. &lt;strong&gt;This means a single memory access requires 4 additional page table lookups in memory just for virtual-to-physical address translation&lt;/strong&gt;. Translation Lookaside Buffers (TLBs) are caches specifically designed to accelerate virtual-to-physical address translation.&lt;/p&gt;
&lt;p&gt;Regarding the TLB&amp;rsquo;s location, it is usually in the L1 cache (some say it&amp;rsquo;s in registers or L2, which likely depends on the CPU architecture; for now, just consider it as CPU cache, distinct from main memory)&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0a897b5be8a9.png" alt="image.png" /&gt;
In modern processors, the first-level TLB is typically split into a data TLB (dTLB) and an instruction TLB (iTLB). Frequently modifying page tables forces the CPU to flush TLB entries frequently, which in turn increases page table accesses in main memory[^ 《深入理解Linux内核》 (Understanding the Linux Kernel)]. The TLB also has a finite size; improving the TLB hit rate reduces accesses to the page tables in main memory. Using huge pages cuts the number of page table entries by a factor of 512 with 2 MB pages (and far more with 1 GB pages), greatly reducing TLB misses[^ 《深入理解Linux进程和内存》 (Understanding Linux Processes and Memory)].&lt;/p&gt;
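&lt;p&gt;To get a feel for the effect of page size on the number of last-level entries, here is a small back-of-the-envelope calculation. It assumes x86-64 with 8-byte entries and counts only the last level of the hierarchy; the helper names are made up for this sketch.&lt;/p&gt;

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024
ENTRY_BYTES = 8  # assumed size of one page table entry on x86-64


def leaf_entries(mapped_bytes, page_size):
    """Number of last-level entries needed to map mapped_bytes (rounded up)."""
    return -(-mapped_bytes // page_size)


def leaf_table_bytes(mapped_bytes, page_size):
    """Memory consumed by those last-level entries, per process."""
    return leaf_entries(mapped_bytes, page_size) * ENTRY_BYTES


GIB = 1024 ** 3
print(leaf_entries(64 * GIB, PAGE_4K), leaf_table_bytes(64 * GIB, PAGE_4K))
print(leaf_entries(64 * GIB, PAGE_2M), leaf_table_bytes(64 * GIB, PAGE_2M))
```

&lt;p&gt;For a 64 GiB mapping this gives 16,777,216 entries (128 MiB) with 4 KB pages versus 32,768 entries (256 KiB) with 2 MB pages. Since every process that touches the whole region carries its own page tables, the saving multiplies with the number of processes.&lt;/p&gt;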
&lt;p&gt;TLB information:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#cpuid -l&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; L1 TLB/cache information: 2M/4M pages &amp;amp; L1 TLB &lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x80000005/eax&lt;span style="color:#f92672"&gt;)&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; L1 TLB/cache information: 4K pages &amp;amp; L1 TLB &lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x80000005/ebx&lt;span style="color:#f92672"&gt;)&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; L2 TLB/cache information: 2M/4M pages &amp;amp; L2 TLB &lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x80000006/eax&lt;span style="color:#f92672"&gt;)&lt;/span&gt;:&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Observing TLB hit rate:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;perf stat -e dTLB-loads,dTLB-load-misses,iTLB-loads,iTLB-load-misses -I &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; -p $PM_PID &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;During memory reclamation, TLB misses do increase, but it&amp;rsquo;s hard to establish a causal relationship. TLB miss is just one observation metric for the MMU — TLB is part of MMU.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Reverse Mapping
 &lt;div id="reverse-mapping" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reverse-mapping" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The general principles of PFRA (Page Frame Reclaiming Algorithm)[^ 《深入理解Linux内核》 (Understanding the Linux Kernel)]:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Free &amp;ldquo;harmless&amp;rdquo; pages first: start with pages in the page cache that are not referenced by any process&lt;/li&gt;
&lt;li&gt;Make all pages of user-mode processes reclaimable: PFRA can gradually take page frames away from user-mode processes that have been sleeping for a long time&lt;/li&gt;
&lt;li&gt;Reclaim a shared page frame by unmapping, in one go, all page table entries that reference it&lt;/li&gt;
&lt;li&gt;Only reclaim &amp;ldquo;unused&amp;rdquo; pages&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;One of PFRA&amp;rsquo;s goals is to be able to release shared page frames. The process of quickly locating all page table entries pointing to the same page frame is called reverse mapping.&lt;/p&gt;
&lt;p&gt;Reverse mapping covers two kinds of shared pages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Anonymous pages&lt;/li&gt;
&lt;li&gt;File-mapping pages&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Basic tricks of page frame reclaiming:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LRU lists&lt;/li&gt;
&lt;li&gt;Free cheapest pages first&lt;/li&gt;
&lt;li&gt;Unmap all at once&lt;/li&gt;
&lt;li&gt;Etc&lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;
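&lt;p&gt;The idea of reverse mapping and &amp;ldquo;unmap all at once&amp;rdquo; can be illustrated with a toy model. This is only a conceptual sketch: the class and its dictionaries are invented for illustration and bear no resemblance to the kernel&amp;rsquo;s actual data structures.&lt;/p&gt;

```python
from collections import defaultdict


class ToyRmap:
    """Toy reverse map: page frame to the set of (pid, virtual address) mappings."""

    def __init__(self):
        self.page_tables = {}         # forward map: (pid, vaddr) to frame
        self.rmap = defaultdict(set)  # reverse map: frame to {(pid, vaddr), ...}

    def map_page(self, pid, vaddr, frame):
        self.page_tables[(pid, vaddr)] = frame
        self.rmap[frame].add((pid, vaddr))

    def reclaim(self, frame):
        """Unmap every entry pointing at frame in one pass; return how many."""
        owners = self.rmap.pop(frame, set())
        for key in owners:
            del self.page_tables[key]
        return len(owners)
```

&lt;p&gt;Without the reverse map, reclaiming a shared frame would mean scanning the page tables of every process to find the entries that point at it.&lt;/p&gt;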

&lt;h3 class="relative group"&gt;Huge Pages
 &lt;div id="huge-pages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#huge-pages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Enabling huge pages provides certain performance improvements for specific application workloads. In PostgreSQL, enabling huge pages on large-memory instances also offers some performance gains and even some stability benefits.&lt;/p&gt;
&lt;p&gt;Why are huge pages better?&lt;sup id="fnref:3"&gt;&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref"&gt;3&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reduced TLB pressure&lt;/li&gt;
&lt;li&gt;Reduced pagetable size in main memory&lt;/li&gt;
&lt;li&gt;Huge pages are physically contiguous. Contiguous physical memory access is better than non-contiguous physical memory access&lt;/li&gt;
&lt;li&gt;When using these kinds of larger pages, higher level pages can directly map them, with no need to use lower level page entries[^ kernel.org,mm pagetables]&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, using huge pages brings management challenges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Huge pages need to be pre-allocated&lt;/li&gt;
&lt;li&gt;The number of huge pages must be calculated in advance to avoid wasting memory&lt;/li&gt;
&lt;/ul&gt;
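&lt;p&gt;Sizing the pool is simple ceiling arithmetic. The sketch below assumes you already know how many bytes of shared memory must be backed by huge pages and that the huge page size is read from the &lt;code&gt;Hugepagesize:&lt;/code&gt; line of &lt;code&gt;/proc/meminfo&lt;/code&gt;; both function names are made up for illustration.&lt;/p&gt;

```python
def parse_hugepagesize(meminfo_text):
    """Read the 'Hugepagesize:' line of /proc/meminfo text and return bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            value, unit = line.split()[1:3]
            if unit != "kB":
                raise ValueError(f"unexpected unit {unit!r}")
            return int(value) * 1024
    raise ValueError("Hugepagesize not found")


def huge_pages_needed(shared_bytes, huge_page_bytes):
    """Smallest number of huge pages that covers shared_bytes."""
    return -(-shared_bytes // huge_page_bytes)  # ceiling division
```

&lt;p&gt;The result is what you would reserve through &lt;code&gt;vm.nr_hugepages&lt;/code&gt;. Reserving noticeably more than this simply takes that memory away from everything else on the host.&lt;/p&gt;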
&lt;p&gt;Two ways for processes to use huge pages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The first is to use &lt;code&gt;shmget()&lt;/code&gt; to set up a shared region backed by huge pages&lt;/li&gt;
&lt;li&gt;The second is to call &lt;code&gt;mmap()&lt;/code&gt; on a file opened in the huge page filesystem&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;C Library and System Calls
 &lt;div id="c-library-and-system-calls" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#c-library-and-system-calls" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The system call layer sits between kernel space and user space. Application Programming Interfaces (APIs) and system calls are different things: applications program against APIs implemented in user space rather than executing system calls directly. In the UNIX world, the most common API standard is POSIX (Portable Operating System Interface). POSIX specifies APIs, not system calls. On Linux, the API is typically provided by the C standard library, such as libc, which implements most of the POSIX APIs.[^《奔跑吧 Linux内核 入门篇（第2版）》 (Running Linux Kernel: Beginner&amp;rsquo;s Guide 2nd Edition)]&lt;/p&gt;
&lt;p&gt;C app-&amp;gt;C lib-&amp;gt;system calls-&amp;gt;OS-&amp;gt;hardware&lt;sup id="fnref:4"&gt;&lt;a href="#fn:4" class="footnote-ref" role="doc-noteref"&gt;4&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/72d91b350d7d.png" alt="image.png" /&gt;
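Common C library functions and system calls:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;malloc&lt;/code&gt;, &lt;code&gt;free&lt;/code&gt; =&amp;gt; C library&lt;/p&gt;
&lt;p&gt;&lt;code&gt;mmap&lt;/code&gt;, &lt;code&gt;brk&lt;/code&gt;, &lt;code&gt;munmap&lt;/code&gt; =&amp;gt; system calls&lt;/p&gt;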

&lt;h3 class="relative group"&gt;Page Fault Exception
 &lt;div id="page-fault-exception" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#page-fault-exception" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Page fault exceptions (or page fault interrupts) fall into two cases: faults caused by programming errors, and faults that trigger physical page allocation because the process touched a virtual address whose page frame has not been allocated yet.[^ 《深入理解Linux内核》 (Understanding the Linux Kernel)]&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Exceptional page fault (segmentation fault): each virtual memory area has associated permissions. If a process accesses memory outside its valid range, or accesses a memory area in a way its permissions do not allow, the processor reports a page fault exception. In severe cases, the kernel reports a &amp;ldquo;segmentation fault&amp;rdquo; and terminates the process[^《奔跑吧 Linux内核 入门篇（第2版）》 (Running Linux Kernel: Beginner&amp;rsquo;s Guide 2nd Edition)].&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Normal page fault: System calls like mmap and brk manage virtual memory; they don&amp;rsquo;t directly allocate physical memory. Virtual memory system call functions only establish the process address space. Virtual memory is visible in user space, but no mapping between virtual memory and physical memory has been established. When a process accesses virtual memory where no mapping has been established, a page fault interrupt is triggered.[^《奔跑吧 Linux内核 入门篇（第2版）》 (Running Linux Kernel: Beginner&amp;rsquo;s Guide 2nd Edition)]&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Page faults are also divided into two types:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;minor fault: the page fault was handled without blocking the current process, and a page frame was allocated&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;major fault: the page fault forced the current process to sleep (likely because filling the page frame with data from disk took time). A page fault that blocks the current process is a major fault[^ 《深入理解Linux内核》 (Understanding the Linux Kernel)]&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
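&lt;p&gt;Normal page faults are easy to observe from user space. The sketch below maps anonymous memory, touches each page once, and reads the process&amp;rsquo;s minor-fault counter before and after via &lt;code&gt;getrusage&lt;/code&gt;. It assumes a Unix-like system; the function name is made up, and the exact delta depends on the kernel, transparent huge page settings, and whatever else the interpreter does in between.&lt;/p&gt;

```python
import mmap
import resource


def minor_faults_from_touching(num_pages):
    """Map anonymous memory, touch each page once, return the minor-fault delta."""
    page = mmap.PAGESIZE
    region = mmap.mmap(-1, num_pages * page)  # sets up address space only
    before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    for offset in range(0, num_pages * page, page):
        region[offset] = 1                    # first touch of a page faults it in
    after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    region.close()
    return after - before


print(minor_faults_from_touching(1024))
```

&lt;p&gt;On a typical Linux system the printed number is close to the number of pages touched, which shows that the mapping call itself allocated no physical memory.&lt;/p&gt;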

&lt;h3 class="relative group"&gt;Copy-On-Write (COW)
 &lt;div id="copy-on-write-cow" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#copy-on-write-cow" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When the fork system call is executed, the child process and parent process have independent process address spaces but share physical memory resources, including process context, process stack, memory information, file descriptors, directories, resource limits, etc. Only the parent process&amp;rsquo;s page table needs to be copied to the child process. At this point, sharing is read-only. When writing is needed (when running their respective tasks), data is copied, giving the parent and child processes their own copies.[^《奔跑吧 Linux内核 入门篇（第2版）》 (Running Linux Kernel: Beginner&amp;rsquo;s Guide 2nd Edition)]&lt;/p&gt;
&lt;p&gt;For PostgreSQL&amp;rsquo;s multi-process model, fork itself isn&amp;rsquo;t heavy — you may only need to worry about page tables — but the various tasks that come after fork will trigger copy-on-write to create the child process&amp;rsquo;s own resource copies.&lt;/p&gt;
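&lt;p&gt;&lt;strong&gt;Note the distinction between copy-on-write and ordinary page fault exceptions: copy-on-write means that at fork time the child receives no private copy of the parent&amp;rsquo;s pages, and a page is copied only when one side first writes to it (the copy itself is triggered by a write-protection page fault); ordinary demand-paging page faults allocate physical memory the first time a process touches a virtual address, whether or not fork was involved.&lt;/strong&gt;&lt;/p&gt;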
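&lt;p&gt;The visible effect of copy-on-write is that parent and child start from the same data yet never see each other&amp;rsquo;s writes. The following sketch shows that effect with &lt;code&gt;os.fork()&lt;/code&gt;; it assumes a Unix-like system, the function name is made up, and it demonstrates the resulting isolation rather than the kernel mechanism itself.&lt;/p&gt;

```python
import os


def fork_isolation_demo():
    """Return (value seen by parent, value seen by child) after the child writes."""
    data = bytearray(b"parent")
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the write lands in the child's own copy of the page;
        # the parent's page is left untouched.
        data[:] = b"child!"
        os.write(write_fd, bytes(data))
        os._exit(0)
    os.close(write_fd)
    child_view = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return bytes(data), child_view


print(fork_isolation_demo())
```

&lt;p&gt;The parent still sees &lt;code&gt;b"parent"&lt;/code&gt; while the child reports &lt;code&gt;b"child!"&lt;/code&gt;. Pages that neither side writes to stay shared for the lifetime of both processes.&lt;/p&gt;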

&lt;h3 class="relative group"&gt;mmap, brk &amp;amp; Shared Memory Mapping Area, Heap Area
 &lt;div id="mmap-brk--shared-memory-mapping-area-heap-area" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#mmap-brk--shared-memory-mapping-area-heap-area" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;In broad terms, mmap and brk serve different purposes and work on different address regions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;mmap&lt;/code&gt; is used to manage shared memory, corresponding to the shared memory mapping area&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;brk&lt;/code&gt; is used to manage private memory, corresponding to the heap area&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What each region of the linear address space is used for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;mmap: The mapping area expands top-down. The mmap mapping area and heap expand toward each other until they exhaust the remaining space in the virtual address space. This structure facilitates the C runtime library&amp;rsquo;s use of the mmap mapping area and heap for memory allocation.&lt;/li&gt;
&lt;li&gt;Stack: Stores local variables and function parameters during program execution, grows from high addresses to low addresses&lt;/li&gt;
&lt;li&gt;Heap: Dynamic memory allocation area, managed through functions like malloc, new, free, and delete&lt;/li&gt;
&lt;li&gt;BSS (Uninitialized Variables): Stores uninitialized global variables and static variables&lt;/li&gt;
&lt;li&gt;Data: Stores global variables and static variables with predefined values in source code&lt;/li&gt;
&lt;li&gt;Text (Code): Stores read-only program execution code, i.e., machine instructions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Shared memory mapping area and heap area&lt;sup id="fnref:5"&gt;&lt;a href="#fn:5" class="footnote-ref" role="doc-noteref"&gt;5&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/974eb641977f.png" alt="image.png" /&gt;
Real postmaster heap and shared memory mapping:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/1063005/smaps |grep -E &lt;span style="color:#e6db74"&gt;&amp;#34;\-s|heap&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;022a4000-022ee000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;heap&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fef6019e000-7fef601a5000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:17 &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt; /dev/shm/PostgreSQL.1291978332
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fef601a5000-7fef6098b000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:01 &lt;span style="color:#ae81ff"&gt;1052&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#75715e"&gt;#this is shared buffers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fef6e238000-7fef6e239000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:01 &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; /SYSV0011f702 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You can see that the addresses roughly match the layout in the diagram: the heap sits at low addresses, while the shared memory mappings sit at high addresses.&lt;/p&gt;
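&lt;p&gt;The same check can be scripted. The sketch below parses text in the &lt;code&gt;/proc/PID/maps&lt;/code&gt; or &lt;code&gt;smaps&lt;/code&gt; format and separates the heap from shared mappings by looking at the permission field. The function name is made up, and the sample lines are taken from the output above.&lt;/p&gt;

```python
def classify_mappings(maps_text):
    """Split /proc/PID/maps-style lines into heap and shared mappings."""
    heap, shared = [], []
    for line in maps_text.splitlines():
        fields = line.split()
        # Skip smaps detail lines such as "Size: 4 kB"; keep address-range headers.
        if not (len(fields) >= 2 and "-" in fields[0]):
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        perms = fields[1]
        name = " ".join(fields[5:])
        if name == "[heap]":
            heap.append((start, end))
        elif perms.endswith("s"):  # 's' means shared, 'p' means private
            shared.append((start, end, name))
    return heap, shared


sample = """022a4000-022ee000 rw-p 00000000 00:00 0 [heap]
7fef601a5000-7fef6098b000 rw-s 00000000 00:01 1052 /dev/zero (deleted)
Size:               8088 kB"""
heap, shared = classify_mappings(sample)
print(heap, shared)
```

&lt;p&gt;To inspect a live process, pass the contents of its &lt;code&gt;/proc/PID/maps&lt;/code&gt; file instead of the sample string.&lt;/p&gt;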

&lt;h2 class="relative group"&gt;VM
 &lt;div id="vm" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vm" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Linux kernel virtual memory subsystem&lt;/p&gt;
&lt;p&gt;Directory: &lt;code&gt;cd /proc/sys/vm/&lt;/code&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;compact
 &lt;div id="compact" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#compact" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;concept &amp;amp; param
 &lt;div id="concept--param" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#concept--param" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Memory compaction is a mechanism in the Linux kernel for solving memory fragmentation problems. It makes large contiguous allocations more likely to succeed by migrating movable pages so that free pages coalesce into larger contiguous blocks.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter&lt;/th&gt;
 &lt;th&gt;Function&lt;/th&gt;
 &lt;th&gt;Default/Range&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;compact_memory&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Manually trigger a global memory compaction operation&lt;/td&gt;
 &lt;td&gt;Write 1 to trigger&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;compaction_proactiveness&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Controls the frequency of proactive compaction&lt;/td&gt;
 &lt;td&gt;Parameter available since kernel 5.9. Range 0-100 (default 20)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;compact_unevictable_allowed&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Whether to allow compaction of unreclaimable pages (e.g., &lt;code&gt;mlock&lt;/code&gt; locked memory)&lt;/td&gt;
 &lt;td&gt;Parameter available since 4.x. 0 (disable) or 1 (allow)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;defrag_mode&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Controls the trigger strategy for memory defragmentation&lt;/td&gt;
 &lt;td&gt;Only present in recent kernels, so check the documentation for your version. In the upstream kernel documentation it is a 0/1 switch: 1 makes the page allocator try harder to avoid fragmentation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;extfrag_threshold&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Threshold for triggering compaction when large memory blocks are insufficient&lt;/td&gt;
 &lt;td&gt;0-1000 (default 500)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;There are 3 compaction modes (depending on kernel version support):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Passive compaction: &lt;code&gt;extfrag_threshold&lt;/code&gt; addresses &amp;ldquo;already occurred&amp;rdquo; fragmentation problems — triggered when a process requests large memory blocks and finds them insufficient.&lt;/li&gt;
&lt;li&gt;Proactive compaction: &lt;code&gt;compaction_proactiveness&lt;/code&gt; proactively controls compaction aggressiveness, optimizing &amp;ldquo;not yet occurred&amp;rdquo; but potential fragmentation risks.&lt;/li&gt;
&lt;li&gt;Manual compaction: &lt;code&gt;compact_memory&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;extfrag_threshold&lt;/code&gt; is the Linux kernel parameter controlling passive compaction. When the kernel fails to allocate high-order contiguous physical memory (e.g., huge pages), it determines the failure cause via the fragmentation index:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;-1&lt;/code&gt;: Allocation succeeded (watermark satisfied)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;0&lt;/code&gt;: Failed due to insufficient memory&lt;/li&gt;
&lt;li&gt;&lt;code&gt;1000&lt;/code&gt;: Failed due to fragmentation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;View the current values via &lt;code&gt;/sys/kernel/debug/extfrag/extfrag_index&lt;/code&gt;. The file prints the index as a decimal fraction for each allocation order (e.g., &lt;code&gt;0.854&lt;/code&gt;), while &lt;code&gt;extfrag_threshold&lt;/code&gt; uses the same index multiplied by 1000, so &lt;code&gt;0.854&lt;/code&gt; corresponds to 854:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /sys/kernel/debug/extfrag/extfrag_index |grep Normal
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 0.995 0.998 &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
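&lt;p&gt;If &lt;code&gt;extfrag_threshold=600&lt;/code&gt;, the kernel compacts a zone only when its fragmentation index is above 600; at or below that value the failure is attributed to low memory rather than fragmentation. &lt;code&gt;extfrag_index&lt;/code&gt; is a useful complement to &lt;code&gt;/proc/buddyinfo&lt;/code&gt; when investigating fragmentation issues.&lt;/p&gt;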
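&lt;p&gt;The threshold comparison can be sketched as a small shell helper. This is only an illustration, not the kernel&amp;rsquo;s logic: the function name &lt;code&gt;frag_orders_over&lt;/code&gt; is hypothetical, and it assumes the line layout of the sample output (four label fields followed by one index per allocation order, starting at order 0).&lt;/p&gt;

```shell
# frag_orders_over THRESHOLD LINE
# Prints the allocation orders whose fragmentation index, scaled by 1000,
# is above THRESHOLD (the same scale as vm.extfrag_threshold).
frag_orders_over() {
    printf '%s\n' "$2" | awk -v t="$1" '{
        out = ""
        sep = ""
        # Fields 1-4 are "Node 0, zone Normal"; field 5 is order 0.
        for (i = 5; NF >= i; i++) {
            if ($i * 1000 > t) {
                out = out sep (i - 5)
                sep = " "
            }
        }
        print out
    }'
}

# Usage on a live system (debugfs must be mounted, usually needs root):
# frag_orders_over "$(cat /proc/sys/vm/extfrag_threshold)" \
#     "$(grep Normal /sys/kernel/debug/extfrag/extfrag_index | head -n 1)"
```

&lt;p&gt;Fed with the sample line and a threshold of 500, it prints &lt;code&gt;9 10&lt;/code&gt;: only the two highest orders are failing because of fragmentation.&lt;/p&gt;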

&lt;h3 class="relative group"&gt;dirty
 &lt;div id="dirty" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#dirty" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;concept &amp;amp; param
 &lt;div id="concept--param-1" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#concept--param-1" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Dirty page flushing is somewhat similar to memory reclamation and is also divided into asynchronous and synchronous:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Asynchronous flushing: performed by kernel flusher threads (pdflush on old kernels, per-device &lt;code&gt;flush-*&lt;/code&gt; threads or writeback kworkers on newer ones); application writes are not blocked&lt;/li&gt;
&lt;li&gt;Synchronous flushing: directly blocks the application process; the process that initiated the write operation flushes the dirty pages itself&lt;/li&gt;
&lt;/ul&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter Name&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;dirty_background_bytes&lt;/td&gt;
 &lt;td&gt;Background async flush threshold, in bytes&lt;/td&gt;
 &lt;td&gt;0 (disabled)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;dirty_background_ratio&lt;/td&gt;
 &lt;td&gt;Background async flush threshold, as a percentage of available memory (free + reclaimable pages)&lt;/td&gt;
 &lt;td&gt;10%&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;dirty_bytes&lt;/td&gt;
 &lt;td&gt;Synchronous flush threshold, in bytes&lt;/td&gt;
 &lt;td&gt;0 (disabled)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;dirty_ratio&lt;/td&gt;
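 &lt;td&gt;Synchronous flush threshold, as a percentage of available memory (free + reclaimable pages)&lt;/td&gt;
 &lt;td&gt;20% (kernel default; some distributions and tuning profiles use 30-40%)&lt;/td&gt;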
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;dirty_expire_centisecs&lt;/td&gt;
 &lt;td&gt;Age after which dirty data becomes eligible for writeback by the flusher threads&lt;/td&gt;
 &lt;td&gt;3000 (30s)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;dirty_writeback_centisecs&lt;/td&gt;
 &lt;td&gt;Interval at which the kernel flusher threads wake up to write back old dirty pages&lt;/td&gt;
 &lt;td&gt;500 (5s)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
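&lt;p&gt;Each &lt;code&gt;xxx_bytes&lt;/code&gt; parameter and its &lt;code&gt;xxx_ratio&lt;/code&gt; counterpart are mutually exclusive: writing one makes the other read back as 0.&lt;/p&gt;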
&lt;p&gt;Example parameters and flowchart:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dirty_background_bytes &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dirty_background_ratio &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dirty_bytes &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dirty_ratio &lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dirty_expire_centisecs &lt;span style="color:#ae81ff"&gt;3000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dirty_writeback_centisecs &lt;span style="color:#ae81ff"&gt;500&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-mermaid" data-lang="mermaid"&gt;%% Dirty page flushing flow diagram integrating time parameters
graph TD
 A[App writes generate dirty pages] --&amp;gt; B{Check interval reached?&amp;lt;br&amp;gt;dirty_writeback_centisecs every 5s}
 B -- No --&amp;gt; D{Expired dirty pages exist?&amp;lt;br&amp;gt; dirty_expire_centisecs&amp;gt;30s}
 B -- Yes --&amp;gt; C{Dirty page threshold check}
 C --&amp;gt; E[Dirty page ratio? dirty_background_ratio&amp;gt;10% ]
 C --&amp;gt; F[Dirty page ratio? dirty_ratio&amp;gt; 40%]
 E -- Trigger --&amp;gt; G[Background async flush]
 F -- Trigger --&amp;gt; H[Synchronous flush]
 D -- Trigger --&amp;gt; G
 G --&amp;gt; I[Dirty pages written to disk]
 H --&amp;gt; I[Dirty pages written to disk] 
 I --&amp;gt; J[Free memory space]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The configuration principles for these parameters are basically the same as for PostgreSQL&amp;rsquo;s own dirty page flush parameters. Setting the thresholds too low causes overly frequent flushing: the same dirty page may be written to disk multiple times, wasting IO. Setting them too high lets dirty pages pile up and may cause IO storms when they are finally flushed.&lt;/p&gt;
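&lt;p&gt;To get a feel for what a ratio means in absolute terms, the threshold selection can be sketched as follows. This is a simplified illustration under stated assumptions: the function name &lt;code&gt;dirty_threshold_kb&lt;/code&gt; is hypothetical, and the caller must supply the available memory (free + reclaimable pages, which is less than total RAM). The values the kernel actually computed are exposed as &lt;code&gt;nr_dirty_threshold&lt;/code&gt; and &lt;code&gt;nr_dirty_background_threshold&lt;/code&gt; in &lt;code&gt;/proc/vmstat&lt;/code&gt;.&lt;/p&gt;

```shell
# dirty_threshold_kb AVAILABLE_KB RATIO BYTES
# Prints the effective threshold in KB: the *_bytes value wins when it is
# non-zero, otherwise the *_ratio percentage of available memory is used.
dirty_threshold_kb() {
    if [ "$3" -gt 0 ]; then
        echo $(( $3 / 1024 ))
    else
        echo $(( $1 * $2 / 100 ))
    fi
}
```

&lt;p&gt;For example, with 64 GB of available memory (67108864 KB), &lt;code&gt;dirty_threshold_kb 67108864 10 0&lt;/code&gt; prints 6710886, so background flushing would start at roughly 6.4 GB of dirty data.&lt;/p&gt;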

&lt;h4 class="relative group"&gt;Observing Dirty Pages
 &lt;div id="observing-dirty-pages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#observing-dirty-pages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Monitoring dirty pages:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ps -eo pid,%cpu,%mem,wchan,args,state|grep kdmflush|grep -E -w -v &lt;span style="color:#e6db74"&gt;&amp;#34;S&amp;#34;&lt;/span&gt; &lt;span style="color:#75715e"&gt;#Observe async flush process state&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/vmstat| grep -E -w &lt;span style="color:#e6db74"&gt;&amp;#34;nr_dirty|nr_writeback&amp;#34;&lt;/span&gt; &lt;span style="color:#75715e"&gt;#vmstat dirty, page count&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo |grep -i dirty &lt;span style="color:#75715e"&gt;#meminfo dirty, KB&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Testing dirty pages with dd:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grep -E &lt;span style="color:#e6db74"&gt;&amp;#34;nr_dirty_threshold|nr_dirty_background_threshold&amp;#34;&lt;/span&gt; /proc/vmstat | awk &lt;span style="color:#e6db74"&gt;&amp;#39;{printf &amp;#34;%s: %.2fGB\n&amp;#34;, $1, ($2*4)/1048576}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nr_dirty_threshold: 141.28GB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nr_dirty_background_threshold: 35.32GB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dd &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/dev/zero of&lt;span style="color:#f92672"&gt;=&lt;/span&gt;testfile bs&lt;span style="color:#f92672"&gt;=&lt;/span&gt;8k count&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;128000&lt;/span&gt; &lt;span style="color:#75715e"&gt;# cache io &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
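&lt;p&gt;The test did not behave as expected (same result across multiple runs):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No running kdmflush process was observed. This is inconclusive on its own: kdmflush is a device-mapper worker, and on current kernels page cache writeback runs in kworker threads&lt;/li&gt;
&lt;li&gt;Dirty pages were flushed before reaching either the 35 GB background threshold or the 30 s expiry&lt;/li&gt;
&lt;/ul&gt;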
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Timestamp&lt;/th&gt;
 &lt;th&gt;nr_dirty&lt;/th&gt;
 &lt;th&gt;nr_dirty(GB)&lt;/th&gt;
 &lt;th&gt;Trend&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;17:00:18&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;2,757&lt;/td&gt;
 &lt;td&gt;0.01052&lt;/td&gt;
 &lt;td&gt;▍&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;17:00:19&lt;/td&gt;
 &lt;td&gt;336,199&lt;/td&gt;
 &lt;td&gt;1.282&lt;/td&gt;
 &lt;td&gt;████▌&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;17:00:25&lt;/td&gt;
 &lt;td&gt;1,984,867&lt;/td&gt;
 &lt;td&gt;7.574&lt;/td&gt;
 &lt;td&gt;██████████████▍&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;17:00:32&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;4,252,177&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;16.22&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;████████████████████&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;17:00:33&lt;/td&gt;
 &lt;td&gt;3,699,227&lt;/td&gt;
 &lt;td&gt;14.11&lt;/td&gt;
 &lt;td&gt;█████████████████▊&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;17:00:38&lt;/td&gt;
 &lt;td&gt;170,865&lt;/td&gt;
 &lt;td&gt;0.652&lt;/td&gt;
 &lt;td&gt;▎&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;17:00:46&lt;/td&gt;
 &lt;td&gt;2,865,814&lt;/td&gt;
 &lt;td&gt;10.93&lt;/td&gt;
 &lt;td&gt;█████████▋&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;17:00:54&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;4,721,827&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;18.01&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;██████████████████████&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;17:00:55&lt;/td&gt;
 &lt;td&gt;3,876,509&lt;/td&gt;
 &lt;td&gt;14.79&lt;/td&gt;
 &lt;td&gt;██████████████████&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;17:01:03&lt;/td&gt;
 &lt;td&gt;835,097&lt;/td&gt;
 &lt;td&gt;3.186&lt;/td&gt;
 &lt;td&gt;██▊&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4 class="relative group"&gt;os dirty != pg dirty
 &lt;div id="os-dirty--pg-dirty" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#os-dirty--pg-dirty" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
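&lt;p&gt;PostgreSQL uses buffered IO: data is first written into the OS page cache and only reaches disk when it is synced (with &lt;code&gt;fsync=on&lt;/code&gt;, PostgreSQL forces this where durability requires it). PostgreSQL has its own dirty pages in shared buffers, and the OS has dirty pages in the page cache. What&amp;rsquo;s the relationship between the two?&lt;/p&gt;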
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Observation commands&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo |grep -E -w &lt;span style="color:#e6db74"&gt;&amp;#34;Dirty&amp;#34;&lt;/span&gt; &lt;span style="color:#75715e"&gt;# OS dirty pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; isdirty,pinning_backends,count&lt;span style="color:#f92672"&gt;(&lt;/span&gt;*&lt;span style="color:#f92672"&gt;)&lt;/span&gt; from pg_buffercache where isdirty is true group by isdirty,pinning_backends; &lt;span style="color:#75715e"&gt;# PG dirty pages&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Observe
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1000000&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Observe
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Observe
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Observe&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Test results:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;stage&lt;/th&gt;
 &lt;th&gt;dirty in pg&lt;/th&gt;
 &lt;th&gt;OS dirty&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Clean state&lt;/td&gt;
 &lt;td&gt;0&lt;/td&gt;
 &lt;td&gt;0.02-2M fluctuating&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;After insert completion&lt;/td&gt;
 &lt;td&gt;200M&lt;/td&gt;
 &lt;td&gt;Rose to 1.7G, then dropped to 20KB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;After commit&lt;/td&gt;
 &lt;td&gt;200M&lt;/td&gt;
 &lt;td&gt;0.02-2M fluctuating&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;After checkpoint flush&lt;/td&gt;
 &lt;td&gt;0&lt;/td&gt;
 &lt;td&gt;0.02-2M fluctuating&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;When a larger amount of data is inserted, OS dirty rises to the GB level during the insert and then fluctuates.&lt;/p&gt;
&lt;p&gt;PG dirty and OS dirty are related but not tightly correlated. OS dirty does rise while PG inserts data, but after the OS has flushed its own dirty pages, PG&amp;rsquo;s dirty pages are still dirty. Preliminary judgment: pages that are dirty in shared buffers are not counted as OS dirty pages. Where the OS dirty increase comes from is yet to be determined; since the OS counter tracks file-backed pages, the files PostgreSQL writes during the insert (WAL, data pages written out when buffers are evicted) are a more likely source than PG&amp;rsquo;s private memory.&lt;/p&gt;

&lt;h3 class="relative group"&gt;swappiness
 &lt;div id="swappiness" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#swappiness" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;swappiness&lt;/code&gt; controls the kernel&amp;rsquo;s bias between reclaiming anonymous memory (by swapping it out) and reclaiming page cache. Essentially, it expresses whether swapping anonymous pages or dropping file pages is cheaper for the application on top. For example, compute-oriented applications that rely on dynamically allocated or private memory should use a lower swappiness; data-oriented applications that depend on cached file data should use a higher swappiness, so that fewer file pages are reclaimed and data access suffers less. All of this depends on the relative efficiency of swap IO and filesystem IO&lt;sup id="fnref:6"&gt;&lt;a href="#fn:6" class="footnote-ref" role="doc-noteref"&gt;6&lt;/a&gt;&lt;/sup&gt;. It sounds ideal in theory, but in practice, once swapping occurs, performance has very likely already degraded.&lt;/p&gt;

&lt;h4 class="relative group"&gt;swappiness=0
 &lt;div id="swappiness0" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#swappiness0" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;When &lt;code&gt;swappiness=0&lt;/code&gt;, the kernel does not start swapping until the amount of free and file-backed pages in a zone falls below the high watermark&lt;sup id="fnref:7"&gt;&lt;a href="#fn:7" class="footnote-ref" role="doc-noteref"&gt;7&lt;/a&gt;&lt;/sup&gt;. The exact strategy also depends on the kernel version and NUMA. What is certain is that &lt;code&gt;swappiness=0&lt;/code&gt; does not disable swap; &lt;code&gt;swapoff -a&lt;/code&gt; is what disables it.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Check if swap is enabled&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;swapon --show
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;free -h |grep Swap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/swaps
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;grep -E &lt;span style="color:#e6db74"&gt;&amp;#39;swap|none&amp;#39;&lt;/span&gt; /etc/fstab
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo|grep Swap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Monitor whether swapping is occurring&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/vmstat|grep swp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sar -W &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;inconsistent swap behavior
 &lt;div id="inconsistent-swap-behavior" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#inconsistent-swap-behavior" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;On cgroups v1 systems, the OS-level &lt;code&gt;/proc/sys/vm/swappiness&lt;/code&gt; has little-to-no effect on the swap behavior of processes running inside cgroups. This can lead to inconsistent swap behavior&lt;sup id="fnref:8"&gt;&lt;a href="#fn:8" class="footnote-ref" role="doc-noteref"&gt;8&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Occurrence conditions (all must be true):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;vm.swappiness != cgroups memory.swappiness&lt;/li&gt;
&lt;li&gt;cgroups v1&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Cause:&lt;/p&gt;
&lt;p&gt;systemd creates cgroups early during startup, before &lt;code&gt;sysctl.service&lt;/code&gt; loads &lt;code&gt;/etc/sysctl.conf&lt;/code&gt;, so those cgroups keep the swappiness that was in effect when they were created. A later change to vm.swappiness does not propagate to a cgroup&amp;rsquo;s memory.swappiness. When the two differ, the cgroup v1 value overrides vm.swappiness for the processes in that group.&lt;/p&gt;
&lt;p&gt;Solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For cgroup v1, set memory.swappiness of every cgroup to the same value as vm.swappiness&lt;/li&gt;
&lt;li&gt;For cgroup v1, several other workarounds are available, see &lt;a href="https://access.redhat.com/solutions/6785021" target="_blank" rel="noreferrer"&gt;https://access.redhat.com/solutions/6785021&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Use cgroup v2, which has no per-cgroup memory.swappiness, so vm.swappiness applies everywhere. On RHEL 8 with cgroup v1, the vm.force_cgroup_v2_swappiness parameter achieves the same by making the kernel ignore per-cgroup memory.swappiness&lt;/li&gt;
&lt;/ul&gt;
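&lt;p&gt;Checking the first occurrence condition can be sketched with a short shell function. This is a minimal sketch under assumptions: the function name &lt;code&gt;swappiness_mismatches&lt;/code&gt; is hypothetical, and &lt;code&gt;/sys/fs/cgroup/memory&lt;/code&gt; is only the common mount point of the cgroup v1 memory controller, which may differ on your system.&lt;/p&gt;

```shell
# swappiness_mismatches GLOBAL_VALUE CGROUP_ROOT
# Lists every memory.swappiness file under CGROUP_ROOT whose value differs
# from GLOBAL_VALUE, one "path value" pair per line.
swappiness_mismatches() {
    find "$2" -name memory.swappiness 2>/dev/null | sort | while read -r f; do
        v=$(cat "$f")
        if [ "$v" != "$1" ]; then
            echo "$f $v"
        fi
    done
}

# Usage on a cgroup v1 host (adjust the mount point if needed):
# swappiness_mismatches "$(cat /proc/sys/vm/swappiness)" /sys/fs/cgroup/memory
```

&lt;p&gt;Empty output means every cgroup already agrees with vm.swappiness.&lt;/p&gt;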

&lt;h3 class="relative group"&gt;memory overcommitment
 &lt;div id="memory-overcommitment" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-overcommitment" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;concept &amp;amp; param
 &lt;div id="concept--param-2" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#concept--param-2" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Linux does not reserve physical memory for every virtual address; physical pages are allocated only when they are actually touched. The overcommit policy limits how much virtual memory all processes together may request. When the total requested memory exceeds what physical memory (plus swap) can back, the system is overcommitted.&lt;/p&gt;
&lt;p&gt;There are three overcommit policy parameters: &lt;code&gt;overcommit_memory&lt;/code&gt;, &lt;code&gt;overcommit_ratio&lt;/code&gt;/&lt;code&gt;overcommit_kbytes&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;overcommit_memory&lt;/code&gt; parameter controls the overcommitment policy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;0&lt;/code&gt; (default): Heuristic overcommit policy. Obvious overcommits of address space are refused, everything else is allowed. &lt;code&gt;CommitLimit&lt;/code&gt; is still reported but not enforced.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;1&lt;/code&gt;: No overcommit check&lt;/li&gt;
&lt;li&gt;&lt;code&gt;2&lt;/code&gt;: Strict limit, prohibits exceeding &lt;code&gt;CommitLimit&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-mermaid" data-lang="mermaid"&gt;graph TD
 A[Memory allocation request] --&amp;gt; B{Overcommit mode}
 B --&amp;gt;|Mode 0: Heuristic| C[&amp;#34;Allow moderate virtual memory overcommit&amp;#34;]
 B --&amp;gt;|Mode 1: Unlimited| D[&amp;#34;Virtual memory commits unconstrained&amp;#34;]
 B --&amp;gt;|Mode 2: Strict| E[&amp;#34;Virtual memory total ≤ CommitLimit&amp;#34;]
 C --&amp;gt; F[Allocate physical pages on demand at runtime]
 D --&amp;gt; G[May exhaust physical memory + Swap]
 E --&amp;gt; H[Enforce virtual memory total control]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When &lt;code&gt;overcommit_memory=2&lt;/code&gt;, only one of the &lt;code&gt;overcommit_ratio&lt;/code&gt; and &lt;code&gt;overcommit_kbytes&lt;/code&gt; parameters takes effect (setting one makes the other read back as 0). The &lt;code&gt;CommitLimit&lt;/code&gt; is calculated as follows:
$$
\text{CommitLimit} = (\text{RAM} - \text{huge page memory}) \times \frac{\text{overcommit\_ratio}}{100} + \text{SwapTotal}
$$
or, when &lt;code&gt;overcommit_kbytes&lt;/code&gt; is set:
$$
\text{CommitLimit} = \text{overcommit\_kbytes} + \text{SwapTotal}
$$
The kernel&amp;rsquo;s overcommit accounting notes&lt;sup id="fnref:9"&gt;&lt;a href="#fn:9" class="footnote-ref" role="doc-noteref"&gt;9&lt;/a&gt;&lt;/sup&gt; are interesting: mmap, brk, and fork are all accounted for, which clearly affects PostgreSQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Status
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	We account mmap memory mappings
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	We account mprotect changes in commit
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	We account mremap changes in size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	We account brk
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	We account munmap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	We report the commit status in /proc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	Account and check on fork
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	Review stack handling/building on exec
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	SHMfs accounting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	Implement actual limit enforcement&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
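&lt;p&gt;The &lt;code&gt;CommitLimit&lt;/code&gt; calculation for &lt;code&gt;overcommit_memory=2&lt;/code&gt; can be sketched in shell arithmetic. This is an approximation under stated assumptions: the function name &lt;code&gt;commit_limit_kb&lt;/code&gt; is hypothetical, all inputs are in KB, and a non-zero &lt;code&gt;overcommit_kbytes&lt;/code&gt; replaces the ratio-based term. The kernel does this calculation in pages, so the value in &lt;code&gt;/proc/meminfo&lt;/code&gt; may differ from this sketch by a few KB.&lt;/p&gt;

```shell
# commit_limit_kb MEMTOTAL_KB HUGETLB_KB SWAPTOTAL_KB RATIO KBYTES
# Prints CommitLimit in KB. A non-zero overcommit_kbytes replaces the
# ratio-based term; swap is added in both cases.
commit_limit_kb() {
    if [ "$5" -gt 0 ]; then
        echo $(( $5 + $3 ))
    else
        echo $(( ($1 - $2) * $4 / 100 + $3 ))
    fi
}
```

&lt;p&gt;For an illustrative host with 16 GB RAM, no huge pages, 2 GB swap, and &lt;code&gt;overcommit_ratio=50&lt;/code&gt;, &lt;code&gt;commit_limit_kb 16777216 0 2097152 50 0&lt;/code&gt; prints 10485760 (10 GB). A PostgreSQL host with large &lt;code&gt;shared_buffers&lt;/code&gt; can hit such a limit quickly, which is why the ratio usually has to be raised in strict mode.&lt;/p&gt;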

&lt;h4 class="relative group"&gt;Reserve Memory and Overcommit
 &lt;div id="reserve-memory-and-overcommit" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reserve-memory-and-overcommit" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;user_reserve_kbytes&lt;/code&gt;: When overcommit_memory=2, free memory reserved for ordinary user processes. When system memory is severely insufficient, it ensures ordinary users can still perform basic operations (like starting new processes or handling memory allocation requests). Default is min(3% of the current process size, 128MB). When set to 0, a single process can allocate all free memory minus &lt;code&gt;admin_reserve_kbytes&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;admin_reserve_kbytes&lt;/code&gt;: Free memory reserved for users with the &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt; capability (typically root), so that the system administrator can still log in and execute recovery commands. Default is min(3% of free pages, 8MB). When using strict overcommit mode, it&amp;rsquo;s best to increase this parameter.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat user_reserve_kbytes 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;131072&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat admin_reserve_kbytes 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;Observing Overcommit
 &lt;div id="observing-overcommit" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#observing-overcommit" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;grep -E &lt;span style="color:#e6db74"&gt;&amp;#39;CommitLimit|Committed_AS&amp;#39;&lt;/span&gt; /proc/meminfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sar -r &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ grep -E &lt;span style="color:#e6db74"&gt;&amp;#39;CommitLimit|Committed_AS&amp;#39;&lt;/span&gt; /proc/meminfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CommitLimit: &lt;span style="color:#ae81ff"&gt;203103492&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Committed_AS: &lt;span style="color:#ae81ff"&gt;252170700&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ sar -r &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;07:32:35 PM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;07:32:37 PM &lt;span style="color:#ae81ff"&gt;25472180&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;370249056&lt;/span&gt; 93.56 &lt;span style="color:#ae81ff"&gt;14588&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;274485956&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;252242936&lt;/span&gt; 62.91 &lt;span style="color:#ae81ff"&gt;233866528&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;103568816&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12924&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;07:32:38 PM &lt;span style="color:#ae81ff"&gt;25471904&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;370249332&lt;/span&gt; 93.56 &lt;span style="color:#ae81ff"&gt;14588&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;274487888&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;252242740&lt;/span&gt; 62.91 &lt;span style="color:#ae81ff"&gt;233851748&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;103570136&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11180&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Metric meanings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;meminfo CommitLimit: the commit limit calculated from physical memory, swap, and the overcommit parameters; it is only enforced when &lt;code&gt;vm.overcommit_memory&lt;/code&gt; is 2&lt;/li&gt;
&lt;li&gt;meminfo Committed_AS: Total virtual memory currently requested by all processes&lt;/li&gt;
&lt;li&gt;sar -r kbcommit = Committed_AS&lt;/li&gt;
&lt;li&gt;sar -r %commit = kbcommit / (total physical memory + swap)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;smaps or status can also show total requested virtual memory, but directly summing smaps/status total virtual memory double-counts shared library files and mapped files (like mmap), while &lt;code&gt;Committed_AS&lt;/code&gt; only counts memory requested via mmap, brk, fork, etc., and does not double-count shared memory. The two have different calculation scopes. For total virtual memory, just look at Committed_AS or kbcommit.&lt;/p&gt;
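&lt;p&gt;As a quick check of the &lt;code&gt;%commit&lt;/code&gt; column, here is a minimal sketch that divides kbcommit by RAM plus swap. The function name &lt;code&gt;commit_pct&lt;/code&gt; is hypothetical; the inputs are the kbcommit value from the sar sample above and the Mem/Swap totals from the &lt;code&gt;free -k&lt;/code&gt; output shown later in this article.&lt;/p&gt;

```shell
# commit_pct KBCOMMIT MEMTOTAL_KB SWAPTOTAL_KB
# Prints kbcommit as a percentage of RAM plus swap, with two decimals.
# (Hypothetical helper, for illustration only.)
commit_pct() {
    awk -v c="$1" -v m="$2" -v s="$3" 'BEGIN { printf "%.2f\n", 100 * c / (m + s) }'
}

# kbcommit from the sar sample; Mem and Swap totals from free -k.
commit_pct 252242936 395721236 5242876
```

&lt;p&gt;With these inputs the result is 62.91, the same figure sar reports in the &lt;code&gt;%commit&lt;/code&gt; column.&lt;/p&gt;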

&lt;h3 class="relative group"&gt;watermark
 &lt;div id="watermark" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#watermark" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter Name&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;th&gt;Introduced&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;th&gt;Unit/Range&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;min_free_kbytes&lt;/td&gt;
 &lt;td&gt;Defines the minimum amount of free memory the system keeps in reserve. It directly determines &lt;code&gt;watermark[min]&lt;/code&gt;, so that the system retains enough memory for critical operations when memory is tight&lt;/td&gt;
 &lt;td&gt;Early kernel versions&lt;/td&gt;
 &lt;td&gt;Computed at boot from the amount of memory&lt;/td&gt;
 &lt;td&gt;KB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;watermark_scale_factor&lt;/td&gt;
 &lt;td&gt;Globally adjusts the memory watermark gap (&lt;code&gt;high-low&lt;/code&gt; and &lt;code&gt;low-min&lt;/code&gt;)&lt;/td&gt;
 &lt;td&gt;Linux kernel 4.6&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;10&lt;/code&gt; (0.1% physical memory)&lt;/td&gt;
 &lt;td&gt;Max &lt;code&gt;3000&lt;/code&gt; (30% physical memory)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;watermark_boost_factor&lt;/td&gt;
 &lt;td&gt;Temporarily raises the high watermark (&lt;code&gt;high&lt;/code&gt;), triggering aggressive memory reclamation to reduce fragmentation&lt;/td&gt;
 &lt;td&gt;Linux kernel 5.0&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;15000&lt;/code&gt; (i.e., 1.5x original high watermark)&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4 class="relative group"&gt;min_free_kbytes
 &lt;div id="min_free_kbytes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#min_free_kbytes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Calculate total min and other values from zoneinfo&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/zoneinfo | grep -E -w &lt;span style="color:#e6db74"&gt;&amp;#34;min|low|high&amp;#34;&lt;/span&gt;|grep -E -v &lt;span style="color:#e6db74"&gt;&amp;#34;high:&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;/min/ { total_min += $2 }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;/low/ { total_low += $2 }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;/high/ { total_high += $2 }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;END {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; printf &amp;#34;Total min: %d KB\nTotal low: %d KB\nTotal high: %d KB\n&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; total_min * 4, total_low * 4, total_high * 4;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Total min: &lt;span style="color:#ae81ff"&gt;15828844&lt;/span&gt; KB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Total low: &lt;span style="color:#ae81ff"&gt;19786048&lt;/span&gt; KB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Total high: &lt;span style="color:#ae81ff"&gt;23743260&lt;/span&gt; KB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Current system min value&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/sys/vm/min_free_kbytes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;15828849&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
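&lt;p&gt;min_free_kbytes is distributed across all zones, so the sum of the per-zone min values is approximately equal to min_free_kbytes (per-zone rounding accounts for the small difference). The Normal zone&amp;rsquo;s min is therefore slightly smaller than min_free_kbytes, but since it holds almost all of the memory, the Normal zone is the only one you need to focus on:&lt;/p&gt;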
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Normal zone min, low, high settings; page=4k&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/zoneinfo | grep -A &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; Normal | grep -E &lt;span style="color:#e6db74"&gt;&amp;#34;min|low|high&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; min &lt;span style="color:#ae81ff"&gt;3931615&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; low &lt;span style="color:#ae81ff"&gt;4914518&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; high &lt;span style="color:#ae81ff"&gt;5897422&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Before Linux kernel 4.6, min, low, and high had a fixed ratio, &lt;strong&gt;min:low:high = 1:1.25:1.5&lt;/strong&gt;, and the only way to change low and high was to set min_free_kbytes.&lt;/p&gt;
&lt;p&gt;Problems with the fixed ratio:&lt;/p&gt;
&lt;p&gt;Ideally, you would raise low so that kswapd starts asynchronous reclamation earlier, while keeping min small so that direct reclaim triggers less often. Before 4.6, low and high could only be changed indirectly through min, so min was effectively the knob for the size of kswapd&amp;rsquo;s working buffer. For example:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;&lt;/th&gt;
 &lt;th&gt;kswapd async reclamation working buffer (low-min)&lt;/th&gt;
 &lt;th&gt;kswapd async reclamation workload (high-low)&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;min=1GB, low=1.25GB, high=1.5GB&lt;/td&gt;
 &lt;td&gt;0.25GB&lt;/td&gt;
 &lt;td&gt;0.25GB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;min=10GB, low=12.5GB, high=15GB&lt;/td&gt;
 &lt;td&gt;2.5GB&lt;/td&gt;
 &lt;td&gt;2.5GB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In this scheme, the real purpose of raising min is to raise low and high.&lt;/p&gt;
&lt;p&gt;An excessively low min leaves kswapd too little time to reclaim memory asynchronously before direct reclaim kicks in. An excessively high min not only wastes memory but also causes more frequent reclamation activity, resulting in higher sys CPU usage. The default difference between low and min in Linux does seem a bit small.&lt;/p&gt;

&lt;h4 class="relative group"&gt;watermark_scale_factor
 &lt;div id="watermark_scale_factor" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#watermark_scale_factor" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Wouldn&amp;rsquo;t it be great if you could directly adjust min, low, and high? Sorry, the Linux kernel doesn&amp;rsquo;t support that (Android has extra_free_kbytes). But&amp;hellip;&lt;/p&gt;
&lt;p&gt;Linux kernel 4.6 added the watermark_scale_factor parameter, which adjusts the gaps between the watermarks, so the ratio is no longer fixed. Its default value is 10, corresponding to 0.1% of memory (10/10000), with a maximum of 3000. When set to 1000, the difference between &amp;ldquo;low&amp;rdquo; and &amp;ldquo;min&amp;rdquo;, and between &amp;ldquo;high&amp;rdquo; and &amp;ldquo;low&amp;rdquo;, will both be 10% of memory size (1000/10000).&lt;/p&gt;
&lt;p&gt;0.1% is clearly too small — for 1TB of memory, the scale is only 1GB.&lt;/p&gt;
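&lt;p&gt;The arithmetic above can be sketched in shell. This is a minimal sketch that assumes the rule used by kernels from 4.6 onward, where the gap between adjacent watermarks is the larger of min/4 and managed pages × watermark_scale_factor / 10000. The function names and the 1 TiB zone are hypothetical, chosen only for illustration.&lt;/p&gt;

```shell
# wmark_gap MIN_PAGES MANAGED_PAGES SCALE_FACTOR
# Gap between adjacent watermarks: the larger of min/4 and
# managed_pages * scale_factor / 10000 (integer page counts).
# Assumed rule for kernels 4.6 and later; hypothetical helper name.
wmark_gap() {
    quarter=$(( $1 / 4 ))
    scaled=$(( $2 * $3 / 10000 ))
    if [ "$scaled" -gt "$quarter" ]; then
        echo "$scaled"
    else
        echo "$quarter"
    fi
}

# wmarks MIN_PAGES MANAGED_PAGES SCALE_FACTOR
# Prints "min low high" in pages.
wmarks() {
    gap=$(wmark_gap "$1" "$2" "$3")
    echo "$1 $(( $1 + gap )) $(( $1 + 2 * gap ))"
}

# Hypothetical 1 TiB zone (268435456 pages of 4 KiB) with min = 16384 pages.
wmarks 16384 268435456 10      # default factor: 0.1% gaps
wmarks 16384 268435456 1000    # factor 1000: 10% gaps
```

&lt;p&gt;With the default factor the gap comes out at 268435 pages, roughly 1 GiB, which matches the estimate above.&lt;/p&gt;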

&lt;h4 class="relative group"&gt;watermark_boost_factor
 &lt;div id="watermark_boost_factor" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#watermark_boost_factor" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;watermark_boost_factor is used to reduce external memory fragmentation. It temporarily raises the zone&amp;rsquo;s watermarks through zone-&amp;gt;watermark_boost, which in turn raises the zone&amp;rsquo;s high watermark (WMARK_HIGH). kswapd then reclaims more memory, making it easier for memory compaction (the kcompactd kernel thread) to assemble large blocks of contiguous physical memory. The default value of watermark_boost_factor is 15000, meaning the boost can reach up to 150% of the original high watermark. Setting it to 0 disables the mechanism for temporarily raising zone watermarks.&lt;sup id="fnref:10"&gt;&lt;a href="#fn:10" class="footnote-ref" role="doc-noteref"&gt;10&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;oom
 &lt;div id="oom" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#oom" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The OOM Killer is a mechanism built into the kernel&amp;rsquo;s memory management code, not a process.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter Name&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;panic_on_oom&lt;/td&gt;
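 &lt;td&gt;Controls system behavior when OOM occurs: &lt;strong&gt;0: Don&amp;rsquo;t panic; invoke the OOM Killer&lt;/strong&gt; 1: Panic, unless the OOM is confined to a cpuset or mempolicy node set, in which case the OOM Killer runs instead 2: Always panic, even in those confined cases and for memory cgroup OOM&lt;/td&gt;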
 &lt;td&gt;0&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;oom_kill_allocating_task&lt;/td&gt;
 &lt;td&gt;Whether to preferentially kill the process that triggered OOM (rather than traversing the process tree to select the optimal target): 0: Disabled 1: Enabled&lt;/td&gt;
 &lt;td&gt;0&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;oom_dump_tasks&lt;/td&gt;
 &lt;td&gt;Whether to dump all task information when OOM occurs (for post-mortem analysis): 0: Disabled 1: Enabled&lt;/td&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4 class="relative group"&gt;oom_score
 &lt;div id="oom_score" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#oom_score" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;When OOM occurs, the system needs to decide which process to kill based on the OOM score. Each user process has 3 OOM score interface files:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw-r--r-- 1 postgres postgres 0 May 24 16:39 /proc/63766/oom_adj
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-r--r--r-- 1 postgres postgres 0 May 24 16:39 /proc/63766/oom_score
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw-r--r-- 1 postgres postgres 0 May 24 16:39 /proc/63766/oom_score_adj&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
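&lt;p&gt;oom_score is the OOM score the kernel computes dynamically. Current kernels base it mainly on the process&amp;rsquo;s memory footprint (RSS, swap, and page tables); the heuristics of older kernels also took into account at least:&lt;/p&gt;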
&lt;ul&gt;
&lt;li&gt;Many child processes: +points&lt;/li&gt;
&lt;li&gt;Long-running: -points&lt;/li&gt;
&lt;li&gt;High nice value: +points (the nice value represents the process&amp;rsquo;s CPU scheduling priority; a higher nice value means lower priority and a smaller share of CPU time)&lt;/li&gt;
&lt;li&gt;Direct hardware access: -points&lt;sup id="fnref:11"&gt;&lt;a href="#fn:11" class="footnote-ref" role="doc-noteref"&gt;11&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In addition to the Linux-calculated OOM score, adjustments (adj) can be manually applied. oom_adj is from earlier Linux kernel versions; it&amp;rsquo;s best to adjust OOM scores through the oom_score_adj interface file.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter/File&lt;/th&gt;
 &lt;th&gt;Purpose&lt;/th&gt;
 &lt;th&gt;Value Range&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;oom_score&lt;/td&gt;
 &lt;td&gt;Kernel-calculated raw score (dynamic)&lt;/td&gt;
 &lt;td&gt;0~1000&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;oom_score_adj&lt;/td&gt;
 &lt;td&gt;User-defined adjustment value, directly affects final score&lt;/td&gt;
 &lt;td&gt;-1000~1000; -1000 exempts the process from the OOM Killer&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;oom_adj (legacy)&lt;/td&gt;
 &lt;td&gt;Legacy adjustment parameter, superseded by oom_score_adj; -17 exempts the process from the OOM Killer&lt;/td&gt;
 &lt;td&gt;-17~15&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
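&lt;p&gt;The two adjustment interfaces are linked: a value written to oom_adj is rescaled onto the oom_score_adj range. The sketch below assumes the linear scaling I understand the kernel to apply (multiply by 1000 and divide by 17, with the maximum of 15 mapped straight to 1000); the function name is hypothetical.&lt;/p&gt;

```shell
# oom_adj_to_score_adj OOM_ADJ
# Scales the legacy -17..15 range onto -1000..1000.
# The legacy maximum (15) maps directly to 1000 so the top value stays reachable.
# Assumed scaling rule; hypothetical helper name.
oom_adj_to_score_adj() {
    if [ "$1" -eq 15 ]; then
        echo 1000
    else
        echo $(( $1 * 1000 / 17 ))
    fi
}

# Print a few legacy values next to their oom_score_adj equivalents.
for adj in -17 -8 0 8 15; do
    echo "$adj $(oom_adj_to_score_adj "$adj")"
done
```

&lt;p&gt;Under this assumption, -17 maps to -1000, which is why both values exempt a process from the OOM Killer.&lt;/p&gt;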

&lt;h3 class="relative group"&gt;lowmem_reserve_ratio
 &lt;div id="lowmem_reserve_ratio" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lowmem_reserve_ratio" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Besides &lt;code&gt;min_free_kbytes&lt;/code&gt;, there&amp;rsquo;s another minimum memory reserve parameter that can cause process memory allocation failures, but their functions differ significantly.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;lowmem_reserve_ratio&lt;/code&gt; is a key kernel parameter used to protect the lower zones (DMA, DMA32) from being excessively consumed by allocation requests that could have been served from higher zones. lowmem_reserve_ratio is only a coefficient, not a page count; the kernel derives the reserved page count for each zone from it.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Default values below&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/sys/vm/lowmem_reserve_ratio 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;256&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;256&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Memory zones are ordered from low to high: DMA → DMA32 → Normal → HighMem. An allocation that targets a higher zone can fall back to (&amp;ldquo;borrow&amp;rdquo; from) a lower zone, but each lower zone keeps a reserve that such fallback allocations may not consume, so that requests which can only be served from the lower zone still succeed.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/zoneinfo |grep -Ew &lt;span style="color:#e6db74"&gt;&amp;#34;Node 0|protection|free&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pages free &lt;span style="color:#ae81ff"&gt;3976&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; protection: &lt;span style="color:#f92672"&gt;(&lt;/span&gt;0, 2484, 386430, 386430&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA32
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pages free &lt;span style="color:#ae81ff"&gt;415741&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; protection: &lt;span style="color:#f92672"&gt;(&lt;/span&gt;0, 0, 383946, 383946&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pages free &lt;span style="color:#ae81ff"&gt;5658528&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; protection: &lt;span style="color:#f92672"&gt;(&lt;/span&gt;0, 0, 0, 0&lt;span style="color:#f92672"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For example, DMA&amp;rsquo;s protection indicates:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;0: Allocation from this zone, no cross-zone allocation restrictions&lt;/li&gt;
&lt;li&gt;2484: Pages DMA reserves for DMA32 zone allocations&lt;/li&gt;
&lt;li&gt;386430: Pages DMA reserves for Normal zone allocations&lt;/li&gt;
&lt;li&gt;386430: Pages DMA reserves for Movable zone allocations; the value equals the Normal entry because the Movable zone holds no pages on this system&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Based on these settings:&lt;/p&gt;
&lt;ul&gt;
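&lt;li&gt;When an allocation that targets the DMA32 zone falls back to the DMA zone, 3976 &amp;gt; 2484, so it may succeed (the zone watermark is checked on top of the reserve)&lt;/li&gt;
&lt;li&gt;When an allocation that targets the Normal zone falls back to the DMA zone, 3976 &amp;lt; 386430, so it will not succeed&lt;/li&gt;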
&lt;li&gt;Requests from lower zones to higher zones are not subject to this restriction&lt;/li&gt;
&lt;/ul&gt;
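&lt;p&gt;The protection values can be reproduced with simple arithmetic. The sketch below assumes that a zone&amp;rsquo;s reserve against a higher zone is the sum of the managed pages of all zones above it, up to and including that higher zone, divided by the lower zone&amp;rsquo;s ratio. The function name is hypothetical, and the managed page counts (DMA32 = 635904, Normal = 98290176) are hypothetical values chosen to match the output above, not figures read from the system.&lt;/p&gt;

```shell
# lowmem_protection RATIO MANAGED_PAGES...
# Sums the managed pages of the higher zones and divides by the ratio
# of the zone being protected. Assumed rule; hypothetical helper name.
lowmem_protection() {
    ratio=$1
    shift
    total=0
    for pages in "$@"; do
        total=$(( total + pages ))
    done
    echo $(( total / ratio ))
}

# Hypothetical managed page counts: DMA32 = 635904, Normal = 98290176.
lowmem_protection 256 635904             # DMA reserve against DMA32 requests
lowmem_protection 256 635904 98290176    # DMA reserve against Normal requests
lowmem_protection 256 98290176           # DMA32 reserve against Normal requests
```

&lt;p&gt;With these inputs the three calls print 2484, 386430, and 383946, the same numbers as the protection arrays shown above. A smaller ratio therefore means a larger reserve.&lt;/p&gt;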

&lt;h3 class="relative group"&gt;misc
 &lt;div id="misc" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#misc" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A few more related parameters; those with less relevance are not listed:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter&lt;/th&gt;
 &lt;th&gt;Purpose&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;nr_hugepages&lt;/td&gt;
 &lt;td&gt;Number of huge pages&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;del&gt;nr_overcommit_hugepages&lt;/del&gt;&lt;/td&gt;
 &lt;td&gt;Overcommit of huge pages; the maximum pool size is nr_hugepages + nr_overcommit_hugepages&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;del&gt;nr_hugepages_mempolicy&lt;/del&gt;&lt;/td&gt;
 &lt;td&gt;NUMA-localized huge page allocation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;del&gt;hugetlb_shm_group&lt;/del&gt;&lt;/td&gt;
 &lt;td&gt;Shared memory permission control&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;del&gt;hugetlb_optimize_vmemmap&lt;/del&gt;&lt;/td&gt;
 &lt;td&gt;Restructure huge page metadata management model, reducing memory usage of huge page metadata (struct page). Supported since Linux kernel 5.13&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;max_map_count&lt;/td&gt;
 &lt;td&gt;Limits the maximum number of memory mapping regions (VMA) a single process can have, default 65530&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;zone_reclaim_mode&lt;/td&gt;
 &lt;td&gt;Memory reclamation policy under NUMA: whether to reclaim memory on the local node first when it runs short, or to allocate from other nodes right away&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;stat_interval&lt;/td&gt;
 &lt;td&gt;VM stat refresh frequency, default 1 second&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;vfs_cache_pressure&lt;/td&gt;
 &lt;td&gt;Parameter for VFS (Virtual File System) cache reclamation pressure, mainly affecting the aggressiveness of kernel reclaiming dentry and inode caches&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;page-cluster&lt;/td&gt;
 &lt;td&gt;Swap readahead: reads multiple consecutive pages from swap in a single attempt. The value is a base-2 exponent; default 3, i.e., 8 pages at once&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 class="relative group"&gt;OS Memory Observation and Calculation
 &lt;div id="os-memory-observation-and-calculation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#os-memory-observation-and-calculation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;/proc/meminfo, /proc/vmstat, /proc/zoneinfo all contain memory information, much of it duplicative. I won&amp;rsquo;t list the differences — a glance tells you what&amp;rsquo;s what.&lt;/p&gt;

&lt;h3 class="relative group"&gt;free available Calculation (Unfinished)
 &lt;div id="free-available-calculation-unfinished" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#free-available-calculation-unfinished" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;General direction: (NR_FREE_PAGES + NR_FILE_PAGES - NR_SHMEM + NR_SWAP_PAGES + NR_SLAB_RECLAIMABLE - TOTALRESERVE_PAGES - root reserved memory)&lt;/p&gt;
&lt;p&gt;The kernel has its own estimated available memory. Directly calculating the available value using a formula is difficult to get exactly right:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Not very accurate, don&amp;#39;t use&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo |grep -Ew &lt;span style="color:#e6db74"&gt;&amp;#34;MemFree|Active\(file\)|Inactive\(file\)|SwapFree|SReclaimable|nr_shmem|Shmem&amp;#34;&lt;/span&gt; |awk &lt;span style="color:#e6db74"&gt;&amp;#39;NR==1 {a=$2} NR==2 {b=$2} NR==3 {c=$2} NR==4 {d=$2} NR==5 {e=$2} NR==6 {f=$2 ;print (a+b+c+d-e+f)}&amp;#39;&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo |grep -Ew &lt;span style="color:#e6db74"&gt;&amp;#34;MemAvailable&amp;#34;&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
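&lt;p&gt;For comparison, here is a minimal sketch of how I understand the kernel&amp;rsquo;s own MemAvailable estimate to work: free pages minus the total reserve, plus the file LRU pages and the reclaimable slab, each reduced by the smaller of half its size and the low watermark. Swap is not counted. The function names are hypothetical, and the total reserve and low watermark are inputs you would have to derive yourself (for example from /proc/zoneinfo), since /proc/meminfo does not expose them.&lt;/p&gt;

```shell
# min_of A B: prints the smaller of two integers.
min_of() {
    if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# mem_available FREE TOTALRESERVE WMARK_LOW ACTIVE_FILE INACTIVE_FILE RECLAIMABLE
# All arguments share one unit (pages or kB). Assumed formula; hypothetical name.
mem_available() {
    avail=$(( $1 - $2 ))

    # Page cache: keep back min(pagecache / 2, low watermark).
    pagecache=$(( $4 + $5 ))
    keep=$(min_of $(( pagecache / 2 )) "$3")
    avail=$(( avail + pagecache - keep ))

    # Reclaimable slab: keep back min(reclaimable / 2, low watermark).
    keep=$(min_of $(( $6 / 2 )) "$3")
    avail=$(( avail + $6 - keep ))

    # Never report a negative amount.
    if [ "$avail" -lt 0 ]; then avail=0; fi
    echo "$avail"
}

# Hypothetical numbers, for illustration only.
mem_available 1000 100 50 300 200 80
```

&lt;p&gt;The halving terms explain why a simple sum of counters drifts from MemAvailable: part of the page cache and slab is treated as not reclaimable without hurting the workload.&lt;/p&gt;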

&lt;h3 class="relative group"&gt;inactive_anon + active_anon != anon
 &lt;div id="inactive_anon--active_anon--anon" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#inactive_anon--active_anon--anon" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Why?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Primary: Shmem separately counts shared memory pages. nr_anon_pages does not include shared memory pages, while nr_inactive_anon and nr_active_anon include anonymous shared memory pages&lt;/li&gt;
&lt;li&gt;Secondary: anon includes some Unevictable pages (Mlocked is a subset of Unevictable)&lt;/li&gt;
&lt;li&gt;Other minor statistical differences have little impact&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A rough but relatively accurate formula: nr_inactive_anon + nr_active_anon + nr_unevictable - nr_shmem&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Applicable under huge pages; not applicable under NUMA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## /proc/meminfo, /proc/zoneinfo, /proc/vmstat can all be used for calculation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#/proc/vmstat&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;anon_computed : &amp;#34;&lt;/span&gt;;cat /proc/vmstat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;nr_inactive_anon|nr_active_anon|nr_unevictable|nr_shmem&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;NR==1 {a=$2} NR==2 {b=$2} NR==3 {c=$2} NR==4 {d=$2; print (a+b+c-d)}&amp;#39;&lt;/span&gt; ;&lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;anon_real : &amp;#34;&lt;/span&gt;;cat /proc/vmstat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;nr_anon_pages&amp;#34;&lt;/span&gt;|awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $2}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;anon_computed : &lt;span style="color:#ae81ff"&gt;15776924&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;anon_real : &lt;span style="color:#ae81ff"&gt;15772671&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;##/proc/zoneinfo Normal&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;anon_normal_computed : &amp;#34;&lt;/span&gt;; cat /proc/zoneinfo |grep Normal -A 50|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;nr_inactive_anon|nr_active_anon|nr_unevictable|nr_shmem&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;NR==1 {a=$2} NR==2 {b=$2} NR==3 {c=$2} NR==4 {d=$2; print (a+b+c-d)}&amp;#39;&lt;/span&gt; ;&lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;anon_normal_real : &amp;#34;&lt;/span&gt;; cat /proc/zoneinfo |grep Normal -A 50|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;nr_anon_pages&amp;#34;&lt;/span&gt;|awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $2}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;anon_normal_computed : &lt;span style="color:#ae81ff"&gt;15711170&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;anon_normal_real : &lt;span style="color:#ae81ff"&gt;15707402&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;cache Calculation
 &lt;div id="cache-calculation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cache-calculation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The buff/cache value shown by the free command can be computed either from the file page counters plus Shmem, or directly from Cached:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;filepage+shmem: &amp;#34;&lt;/span&gt;;cat /proc/meminfo |grep -Ew &lt;span style="color:#e6db74"&gt;&amp;#34;Buffers|Active\(file\)|Inactive\(file\)|Shmem|SReclaimable&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;NR==1 {a=$2} NR==2 {b=$2} NR==3 {c=$2} NR==4 {d=$2} NR==5 {e=$2 ;print (a+b+c+d+e)}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;cached: &amp;#34;&lt;/span&gt;;cat /proc/meminfo |grep -Ew &lt;span style="color:#e6db74"&gt;&amp;#34;Buffers|Cached|SReclaimable&amp;#34;&lt;/span&gt; | awk &lt;span style="color:#e6db74"&gt;&amp;#39;NR==1 {a=$2} NR==2 {b=$2} NR==3 {c=$2 ;print (a+b+c)}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;free -k;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Execution results:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;filepage+shmem: &lt;span style="color:#ae81ff"&gt;289417584&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cached: &lt;span style="color:#ae81ff"&gt;289419156&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; total used free shared buff/cache available
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Mem: &lt;span style="color:#ae81ff"&gt;395721236&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;79633516&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;26668564&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;84704912&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;289419156&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;178501152&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Swap: &lt;span style="color:#ae81ff"&gt;5242876&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5242876&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
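&lt;p&gt;The one-liners above assign values by line position, which gives wrong sums if a field is missing or the order changes. A minimal sketch that matches on field names instead is shown below. The function name &lt;code&gt;meminfo_buff_cache&lt;/code&gt; is a hypothetical helper, and the sketch assumes the usual &lt;code&gt;Name: value kB&lt;/code&gt; layout of &lt;code&gt;/proc/meminfo&lt;/code&gt;:&lt;/p&gt;

```shell
# Hypothetical helper: sum Buffers + Cached + SReclaimable (in kB)
# from a meminfo-style file. Defaults to /proc/meminfo.
meminfo_buff_cache() {
    awk '$1 == "Buffers:" || $1 == "Cached:" || $1 == "SReclaimable:" { sum += $2 }
         END { printf "%.0f\n", sum }' "${1:-/proc/meminfo}"
}
```

&lt;p&gt;Calling &lt;code&gt;meminfo_buff_cache&lt;/code&gt; with no argument reads the live &lt;code&gt;/proc/meminfo&lt;/code&gt;; the result can be compared with the buff/cache column of &lt;code&gt;free -k&lt;/code&gt;. Exact matching on the first column keeps SwapCached out of the sum.&lt;/p&gt;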

&lt;h3 class="relative group"&gt;Controversy: Does shmem Count as cache?
 &lt;div id="controversy-does-shmem-count-as-cache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#controversy-does-shmem-count-as-cache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The calculation above shows that shmem is counted in cache. In principle, shmem should not be part of cache.&lt;/p&gt;
&lt;p&gt;The kernel community has in fact discussed this in the thread &lt;a href="https://lore.kernel.org/all/YS0Eq&amp;#43;tNe4Pr7O0X@casper.infradead.org/T/" target="_blank" rel="noreferrer"&gt;Why is Shmem included in Cached in /proc/meminfo?&lt;/a&gt;, which proposed removing shared memory from Cached:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;	cached &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;global_node_page_state&lt;/span&gt;(NR_FILE_PAGES) &lt;span style="color:#f92672"&gt;-&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;			&lt;span style="color:#a6e22e"&gt;total_swapcache_pages&lt;/span&gt;() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; i.bufferram;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;+&lt;/span&gt;	cached &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;global_node_page_state&lt;/span&gt;(NR_FILE_PAGES) &lt;span style="color:#f92672"&gt;-&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;+&lt;/span&gt;			&lt;span style="color:#a6e22e"&gt;total_swapcache_pages&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;+&lt;/span&gt;			&lt;span style="color:#f92672"&gt;-&lt;/span&gt; i.bufferram &lt;span style="color:#f92672"&gt;-&lt;/span&gt; i.sharedram;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Changing this, however, raises backward-compatibility concerns. The question comes down to which matters more: backward compatibility or the accuracy of one reported value.&lt;/p&gt;
&lt;p&gt;So far there is no good resolution, and Cached still includes Shmem.&lt;/p&gt;
&lt;p&gt;The email thread also discusses some interesting things:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Another point of view is that everything in tmpfs is part of the page
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cache and can be written out to swap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;- Dirty: total amount of RAM used to buffer data to be written on
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;permanent storage (dirty). Gets converted to Cached when write is
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;complete. (Actually I would call this &amp;#34;Buffers&amp;#34; but Dirty is okay, too.)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;- Cached: total amount of RAM used to improve *performance* that can be
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;*immediately dropped* without any data-loss – note that this includes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;all untouched RAM backed by swap.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;- Shared: total amount of RAM shared between multiple process that
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cannot be freed even if any single process gets killed. (If this is even
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;possible to know - note that this would *only* contain COW pages in
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;practice. We already have Committed_AS which is about as good for real
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;world heuristics.)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Under the definition proposed in the thread, cache excludes dirty pages and can be dropped immediately without data loss&lt;/li&gt;
&lt;li&gt;tmpfs pages can be swapped out&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Shared memory can therefore be swapped out but not simply dropped, which clearly sets it apart from ordinary cache pages. PostgreSQL&amp;rsquo;s shared memory certainly cannot be dropped.&lt;/p&gt;
&lt;p&gt;So for PostgreSQL it matters that cache includes shared memory; do not assume by default that it is excluded.&lt;/p&gt;
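&lt;p&gt;Since Cached includes Shmem, a rough estimate of the file cache that is not shared memory can be obtained by subtracting one from the other. The sketch below does only that; &lt;code&gt;meminfo_cache_without_shmem&lt;/code&gt; is a hypothetical helper name, and the result is an approximation, not a kernel-defined metric:&lt;/p&gt;

```shell
# Hypothetical helper: Cached minus Shmem (in kB) from a meminfo-style file.
# Approximates the page cache that is not shared memory or tmpfs.
meminfo_cache_without_shmem() {
    awk '$1 == "Cached:" { cached = $2 }
         $1 == "Shmem:"  { shmem = $2 }
         END { printf "%.0f\n", cached - shmem }' "${1:-/proc/meminfo}"
}
```

&lt;p&gt;On a host running PostgreSQL with a large &lt;code&gt;shared_buffers&lt;/code&gt;, this value can be far smaller than the buff/cache column of &lt;code&gt;free&lt;/code&gt;.&lt;/p&gt;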

&lt;h3 class="relative group"&gt;Memory Page Statistics Often Don&amp;rsquo;t Add Up
 &lt;div id="memory-page-statistics-often-dont-add-up" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-page-statistics-often-dont-add-up" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When adding up memory page statistics, some totals do not match. The main reasons are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;shmem is counted in cache&lt;/li&gt;
&lt;li&gt;The split of shmem into file-mapped and anonymous-mapped pages is not visible&lt;/li&gt;
&lt;li&gt;nr_anon_pages does not include shared memory pages, while nr_inactive_anon and nr_active_anon include anonymous shared memory pages&lt;/li&gt;
&lt;li&gt;The VM and cgroup counters have slightly different statistical scopes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;cgroup v1
 &lt;div id="cgroup-v1" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cgroup-v1" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;cgroup Memory Management
 &lt;div id="cgroup-memory-management" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cgroup-memory-management" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The memory cgroup (memcg) can observe and limit the usage of anonymous pages, file pages, swap cache, and kernel memory. Each memcg has its own independent LRU; there is no global LRU.&lt;/p&gt;
&lt;p&gt;cgroup memory management differs from cgroup CPU management. A task that demands more CPU than its cgroup allows is throttled and simply takes longer to finish. Memory cannot be stretched over time in the same way: a task&amp;rsquo;s working memory occupies physical pages for as long as the task uses it.&lt;/p&gt;
&lt;p&gt;Key differences between cgroup CPU and memory management:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Memory is managed through reuse and reclamation: a task&amp;rsquo;s working memory is truly occupied and cannot be used by other tasks at the same time. CPU is managed through time allocation: when one task is not running, other tasks or cgroups can use the CPU.&lt;/li&gt;
&lt;li&gt;Memory must be available at the moment it is needed, whereas CPU is time-sliced and its use can be spread out over time.&lt;/li&gt;
&lt;li&gt;The core of CPU control is time allocation; the core of memory control is page counting.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;&lt;p&gt;The core of the design is a counter called the page_counter. The
page_counter tracks the current memory usage and limit of the group of
processes associated with the controller&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Because memory control is built on page counting, physical pages are not statically assigned to a cgroup. Memory that is freed after use and allocated again later will most likely not be backed by the same physical pages&lt;sup id="fnref:12"&gt;&lt;a href="#fn:12" class="footnote-ref" role="doc-noteref"&gt;12&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Physical pages know which cgroup they belong to:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                +--------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                |  mem_cgroup        |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                |  &lt;span style="color:#f92672"&gt;(&lt;/span&gt;page_counter&lt;span style="color:#f92672"&gt;)&lt;/span&gt;    |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                +--------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                 /            ^      &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                /             |       &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;           +---------------+  |        +---------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;           | mm_struct     |  |....    | mm_struct     |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;           |               |  |        |               |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;           +---------------+  |        +---------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                              |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                              + --------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;                                              |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;           +---------------+           +------+--------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;           | page          +----------&amp;gt;|  page_cgroup  |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;           |               |           |               |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;           +---------------+           +---------------+&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;mm_struct represents a process&amp;rsquo;s virtual address space. Each mm_struct knows which cgroup it belongs to, and each physical page points to a page_cgroup, so the page also knows which cgroup it is charged to&lt;sup id="fnref1:12"&gt;&lt;a href="#fn:12" class="footnote-ref" role="doc-noteref"&gt;12&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
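&lt;p&gt;The page-counting idea can be illustrated with a toy model. This is not kernel code: &lt;code&gt;limit&lt;/code&gt;, &lt;code&gt;usage&lt;/code&gt;, &lt;code&gt;failcnt&lt;/code&gt;, &lt;code&gt;charge&lt;/code&gt;, and &lt;code&gt;uncharge&lt;/code&gt; are illustrative names, and the real controller tries reclaim before giving up. The point is only that the group tracks a count of pages against a limit, not a fixed set of physical pages:&lt;/p&gt;

```shell
# Toy model of a page counter (illustrative only, not the kernel implementation).
limit=4      # maximum number of pages the group may hold
usage=0      # pages currently charged to the group
failcnt=0    # number of charge attempts that would have exceeded the limit

# charge N: account N pages to the group, or fail and bump failcnt.
charge() {
    if [ $((usage + $1)) -gt "$limit" ]; then
        failcnt=$((failcnt + 1))
        return 1
    fi
    usage=$((usage + $1))
}

# uncharge N: give N pages back; which physical pages they were is irrelevant.
uncharge() {
    usage=$((usage - $1))
}
```

&lt;p&gt;With a limit of 4, charging 3 pages succeeds, a further charge of 2 fails and increments &lt;code&gt;failcnt&lt;/code&gt;, and after uncharging 1 page the charge of 2 succeeds.&lt;/p&gt;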

&lt;h3 class="relative group"&gt;cgroup Parameters and Metrics
 &lt;div id="cgroup-parameters-and-metrics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cgroup-parameters-and-metrics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;cgroup uses interface files for configuration and viewing memory usage.&lt;/p&gt;
&lt;p&gt;Directory: &lt;code&gt;cd /sys/fs/cgroup/memory/xxx/&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Kernel memory and mem+swap each have their own set of interface files for limits and usage:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;memory.kmem.xxx #kernel mem
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;memory.memsw.xxx #mem+swap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Below, we only look at mem-related items.&lt;/p&gt;
&lt;p&gt;Interface files can be divided into three categories:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Read-only — show usage, permissions: &lt;code&gt;-r--r--r--&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Read-write — control parameters, permissions: &lt;code&gt;-rw-r--r--&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Other — special settings, permissions: other&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Specific meanings are as follows, with important parameters highlighted:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Type&lt;/th&gt;
 &lt;th&gt;Interface File&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-only&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;memory.numa_stat&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;NUMA-dimensional memory stats&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-only&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;&lt;code&gt;memory.stat&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Important&lt;/strong&gt;, the primary memory usage interface file with many metrics; analyzed separately below&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-only&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;memory.usage_in_bytes&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;An approximate value: the kernel uses an optimization for efficient access, so &lt;code&gt;usage_in_bytes&lt;/code&gt; does not show the exact memory usage. For a more exact figure, use rss+cache(+swap) from &lt;code&gt;memory.stat&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-only&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;&lt;code&gt;memory.failcnt&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Cumulative number of times memory usage hit &lt;code&gt;memory.limit_in_bytes&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;cgroup.clone_children&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Controls whether child cgroups inherit parent configuration&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;&lt;code&gt;cgroup.procs&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Used to manage process groups (process IDs, PIDs) in the current cgroup. &lt;strong&gt;For multi-process PostgreSQL, this means writing all PG processes, including management processes and backends, into the &lt;code&gt;procs&lt;/code&gt; file&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;tasks&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Used to manage threads (thread IDs, TIDs) in the current cgroup. When writing a process PID to &lt;code&gt;cgroup.procs&lt;/code&gt;, all its thread TIDs are automatically added to &lt;code&gt;tasks&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;notify_on_release&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Controls whether a release operation is triggered when the last task (process or thread) in the cgroup exits. Would only be enabled for container management; traditional cgroup management keeps it disabled by default. Cgroups should be preserved after database restart&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;memory.move_charge_at_immigrate&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Deprecated in v2. Rules for moving memory charges when a task migrates to another cgroup&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;memory.use_hierarchy&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Whether hierarchical accounting is enabled, so that a parent cgroup&amp;rsquo;s limit also applies to its child cgroups&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;&lt;code&gt;memory.limit_in_bytes&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;cgroup memory upper limit&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;&lt;code&gt;memory.soft_limit_in_bytes&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Soft limit: under memory pressure, the portion exceeding it is reclaimed first&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;memory.max_usage_in_bytes&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Peak recorded memory usage of the cgroup; an observation metric&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;&lt;code&gt;memory.oom_control&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;oom_kill_disable&lt;/code&gt;: 1 disables the OOM killer for this cgroup&lt;br&gt;&lt;code&gt;under_oom&lt;/code&gt;: whether the cgroup is currently in an OOM state (0 means it is not)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;&lt;code&gt;memory.swappiness&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;cgroup-level swappiness&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Other&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;memory.force_empty&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Write only; writing &lt;code&gt;0&lt;/code&gt; makes the kernel reclaim as many of the cgroup&amp;rsquo;s pages as it can&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Other&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;cgroup.event_control&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Event notification interface for events such as memory pressure; using it requires writing a program. Often used with &lt;code&gt;memory.pressure_level&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Other&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;memory.pressure_level&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Memory pressure notification level&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
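&lt;p&gt;To see at a glance whether a cgroup has been running into its limit, the limit, the usage peak, and the failure counter can be read together. The sketch below is a minimal example; &lt;code&gt;cg_mem_summary&lt;/code&gt; is a hypothetical helper name, and the cgroup directory must be supplied by the caller because the actual path depends on the deployment:&lt;/p&gt;

```shell
# Hypothetical helper: print limit, peak usage, and failcnt of a v1 memory cgroup.
# $1 is the cgroup directory, e.g. a subdirectory of /sys/fs/cgroup/memory/.
cg_mem_summary() {
    dir=$1
    for f in memory.limit_in_bytes memory.max_usage_in_bytes memory.failcnt; do
        printf '%s=%s\n' "$f" "$(cat "$dir/$f")"
    done
}
```

&lt;p&gt;A &lt;code&gt;memory.failcnt&lt;/code&gt; that keeps growing between two calls suggests that the cgroup is repeatedly hitting &lt;code&gt;memory.limit_in_bytes&lt;/code&gt;.&lt;/p&gt;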
&lt;p&gt;The metrics in memory.stat are explained below using a PG instance as an example.&lt;/p&gt;
&lt;p&gt;This PG instance is configured as:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_memory_type&lt;span style="color:#f92672"&gt;=&lt;/span&gt;mmap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_buffers&lt;span style="color:#f92672"&gt;=&lt;/span&gt;64GB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#approximately 800 clients, running&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat memory.stat
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cache &lt;span style="color:#ae81ff"&gt;345587761152&lt;/span&gt; 						 &lt;span style="color:#75715e"&gt;#page cache!!!&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rss &lt;span style="color:#ae81ff"&gt;27332608&lt;/span&gt; &lt;span style="color:#75715e"&gt;#Anonymous and swap cache memory size. Note: differs from OS process RSS; clearly doesn&amp;#39;t include PG shared memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rss_huge &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#75715e"&gt;#of bytes of anonymous transparent hugepages. Note: anonymous huge pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mapped_file &lt;span style="color:#ae81ff"&gt;61491769344&lt;/span&gt; &lt;span style="color:#75715e"&gt;#File shared memory size; includes PG shared memory here&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;swap &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#75715e"&gt;#On swap partition&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pgpgin &lt;span style="color:#ae81ff"&gt;389395357&lt;/span&gt; &lt;span style="color:#75715e"&gt;#rss+cache charged pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pgpgout &lt;span style="color:#ae81ff"&gt;305016672&lt;/span&gt; &lt;span style="color:#75715e"&gt;#rss+cache uncharged pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pgfault &lt;span style="color:#ae81ff"&gt;1954040341&lt;/span&gt; &lt;span style="color:#75715e"&gt;#Omitted&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pgmajfault &lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#75715e"&gt;#Omitted&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;inactive_anon &lt;span style="color:#ae81ff"&gt;165728256&lt;/span&gt; &lt;span style="color:#75715e"&gt;#anonymous and swap cache memory on inactive LRU&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;active_anon &lt;span style="color:#ae81ff"&gt;61549518848&lt;/span&gt; &lt;span style="color:#75715e"&gt;#anonymous and swap cache memory on active LRU list&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;inactive_file &lt;span style="color:#ae81ff"&gt;138240962560&lt;/span&gt; &lt;span style="color:#75715e"&gt;#file-backed on inactive LRU list&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;active_file &lt;span style="color:#ae81ff"&gt;145658613760&lt;/span&gt; &lt;span style="color:#75715e"&gt;#file-backed memory on active LRU list&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;unevictable &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#75715e"&gt;#Unreclaimable memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hierarchical_memory_limit &lt;span style="color:#ae81ff"&gt;408021893120&lt;/span&gt; &lt;span style="color:#75715e"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hierarchical_memsw_limit &lt;span style="color:#ae81ff"&gt;9223372036854771712&lt;/span&gt; &lt;span style="color:#75715e"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_xxx &lt;span style="color:#75715e"&gt;#hierarchical &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Roughly (ignoring swap), cache+rss = inactive_anon+active_anon+inactive_file+active_file.&lt;/p&gt;
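&lt;p&gt;This relation can be checked directly against a memory.stat file. The sketch below sums both sides by field name and prints the left side, the right side, and their difference in bytes; &lt;code&gt;cg_stat_check&lt;/code&gt; is a hypothetical helper name, and the path to memory.stat is passed in by the caller:&lt;/p&gt;

```shell
# Hypothetical helper: compare cache+rss with the sum of the four LRU counters.
# Prints: cache+rss, anon+file, and the difference (all in bytes).
cg_stat_check() {
    awk '$1 == "cache" || $1 == "rss" { lhs += $2 }
         $1 == "inactive_anon" || $1 == "active_anon" ||
         $1 == "inactive_file" || $1 == "active_file" { rhs += $2 }
         END { printf "%.0f %.0f %.0f\n", lhs, rhs, lhs - rhs }' "$1"
}
```

&lt;p&gt;For the memory.stat sample above, the two sides are 345615093760 and 345614823424 bytes, a difference of 270336 bytes (264 KiB), which is why the relation only holds roughly.&lt;/p&gt;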
&lt;p&gt;These values are quite convoluted. cache+rss doesn&amp;rsquo;t have a straightforward correspondence with [in]active_anon/file, and mapped_file (shared memory) is hard to categorize, making it easy to get confused. Combining various documentation and testing, I hand-rolled the following script:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#cginfo_lzl&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;shared_mem_mapped : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;mapped_file&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $2 / 1024 / 1024 /1024 }&amp;#39;&lt;/span&gt; ;&lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;shared_mem_anon : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;rss|inactive_anon|active_anon&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;NR==1 {a=$2} NR==2 {b=$2} NR==3 {c=$2; print (b + c -a)/1024/1024/1024}&amp;#39;&lt;/span&gt; ;&lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;pagecache : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;cache&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $2 / 1024 / 1024 /1024 }&amp;#39;&lt;/span&gt; ;&lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;pagecache_cache-share : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;cache|mapped_file&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;NR==1 {a=$2} NR==2 {b=$2; print (a - b)/1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;file_total : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;inactive_file|active_file&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum += $2} END {print sum /1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;anon_total : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;inactive_anon|active_anon&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum += $2} END {print sum /1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;total_used_rss+map : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;rss|mapped_file&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum += $2} END {print sum /1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;total_mem_file+rss+map : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;inactive_file|active_file|rss|mapped_file&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum += $2} END {print sum /1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;total_mem_rss+cache : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;rss|cache&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum += $2} END {print sum /1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;total_mem_anon+file : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;inactive_file|active_file|inactive_anon|active_anon&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum += $2} END {print sum /1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;total_memsw : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;rss|cache|swap&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum += $2} END {print sum /1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;hard_limit : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.limit_in_bytes| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $1 / 1024 / 1024 /1024 }&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Database with shared_buffers=2GB&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_mem_mapped : 1.69063
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_mem_anon : 1.69828
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pagecache : 5.94717
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pagecache_cache-share : 4.25654
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;file_cache : 4.24889
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;anon_cache : 3.23096
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_used_rss+map : 3.2233
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_file+rss+map : 7.47219
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_rss+cache : 7.47984
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_anon+file : 7.47984
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_memsw : 7.47984
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hard_limit : &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Differences Between cgroup RSS and Process RSS
 &lt;div id="differences-between-cgroup-rss-and-process-rss" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#differences-between-cgroup-rss-and-process-rss" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#shared_buffers= 64GB, all PG process RSS sorted&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ps -eo pid,ppid,rss,args |grep &lt;span style="color:#e6db74"&gt;`&lt;/span&gt;cat $PGDATA/postmaster.pid|head -1&lt;span style="color:#e6db74"&gt;`&lt;/span&gt;|sort -k3 -rn
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97632&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61103720&lt;/span&gt; postgres: lzlinst: checkpointer 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97633&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;59045152&lt;/span&gt; postgres: lzlinst: background writer 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2322820&lt;/span&gt; /paic/postgres/base/11.3/bin/postgres -D /paic/pg6888/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97637&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;85116&lt;/span&gt; postgres: lzlinst: pgsentinel 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97697&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19620&lt;/span&gt; postgres: lzlinst: dbmgr users &lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt; idle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97634&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17932&lt;/span&gt; postgres: lzlinst: walwriter 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;250063&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14508&lt;/span&gt; postgres: lzlinst: dbmon postgres &lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt; idle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97636&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13220&lt;/span&gt; postgres: lzlinst: stats collector 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;248777&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11576&lt;/span&gt; postgres: lzlinst: dbmon postgres &lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt; idle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97635&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2980&lt;/span&gt; postgres: lzlinst: autovacuum launcher 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97638&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2376&lt;/span&gt; postgres: lzlinst: logical replication launcher 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97630&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1592&lt;/span&gt; postgres: lzlinst: logger 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;250185&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;39130&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;972&lt;/span&gt; grep --color&lt;span style="color:#f92672"&gt;=&lt;/span&gt;auto &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Generally, the PG processes with the highest RSS are checkpointer and bgwriter. Process RSS counts every resident page the process has touched, shared memory included, and these two processes touch most of shared_buffers while flushing dirty pages. A backend that reads a lot of data can also show a high RSS, but that usually points to a data extract or a slow full-scan query.&lt;/p&gt;
&lt;p&gt;Why is postmaster&amp;rsquo;s RSS so small? Because postmaster itself barely touches shared_buffers; it only creates the shared memory mapping, which its forked child processes inherit.&lt;/p&gt;
&lt;p&gt;The postmaster (PM) and its child processes map shared memory at the same virtual address, but their RSS is not necessarily the same:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/97632/smaps |grep -A &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zero&amp;#34;&lt;/span&gt; &lt;span style="color:#75715e"&gt;#checkpointer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b4fd87cf000-2b60a2143000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;15925397&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;70411728&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;61087812&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;31429895&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/97633/smaps |grep -A &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zero&amp;#34;&lt;/span&gt; &lt;span style="color:#75715e"&gt;#bgwriter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b4fd87cf000-2b60a2143000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;15925397&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;70411728&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;59043388&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;29394787&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/97627/smaps |grep -A &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zero&amp;#34;&lt;/span&gt; &lt;span style="color:#75715e"&gt;#postmaster&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b4fd87cf000-2b60a2143000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;15925397&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;70411728&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;2318408&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;1741764&lt;/span&gt; kB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Above, checkpointer and bgwriter have the largest RSS, and nearly all of it is shared memory. Each has touched most of the segment (Rss), and by Pss, which divides each shared page among the processes mapping it, the two split the resident shared memory almost evenly, while postmaster accounts for very little. PM and all its forked child processes map shared memory at the same virtual address.&lt;/p&gt;
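&lt;p&gt;The per-process comparison above can be scripted. The following is a minimal sketch, not the method used to produce the output: &lt;code&gt;shm_rss_pss&lt;/code&gt; is a hypothetical helper name, and it assumes the shared segment appears in smaps as &lt;code&gt;/dev/zero (deleted)&lt;/code&gt;, as it does here with mmap shared memory.&lt;/p&gt;

```shell
# Sum Rss and Pss (kB) of the shared memory mapping found in
# smaps-format text on stdin, and print both in GiB.
# Assumption: the segment is named "/dev/zero (deleted)" as in the output above.
shm_rss_pss() {
  awk '
    # A line starting with an address range opens a new mapping.
    /^[0-9a-f]+-[0-9a-f]+ / { inseg = ($0 ~ /\/dev\/zero/) }
    inseg == 1 {
      if ($1 == "Rss:") rss += $2
      if ($1 == "Pss:") pss += $2
    }
    END { printf "rss_gib=%.2f pss_gib=%.2f\n", rss / 1048576, pss / 1048576 }
  '
}

# Usage (PID taken from the ps output above):
#   cat /proc/97632/smaps | shm_rss_pss
```

&lt;p&gt;Fed the checkpointer figures above, it reports roughly 58 GiB of Rss but only about 30 GiB of Pss for the segment.&lt;/p&gt;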
&lt;p&gt;But cgroup rss is only about 85 MB, far less than the process RSS:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /sys/fs/cgroup/memory/lzlinst/memory.stat |egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;rss|mapped_file&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rss &lt;span style="color:#ae81ff"&gt;88997888&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mapped_file &lt;span style="color:#ae81ff"&gt;52963262464&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PG shared memory is clearly absent from the cgroup rss counter: cgroup rss counts neither file pages nor shared memory pages. Here the shared memory shows up under mapped_file instead (about 49 GB).&lt;/p&gt;
&lt;p&gt;Linux kernel documentation&lt;sup id="fnref2:12"&gt;&lt;a href="#fn:12" class="footnote-ref" role="doc-noteref"&gt;12&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Only anonymous and swap cache memory is listed as part of &amp;lsquo;rss&amp;rsquo; stat. This should not be confused with the true &amp;lsquo;resident set size&amp;rsquo; or the amount of physical memory used by the cgroup.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Differences between process and cgroup memory statistics&lt;sup id="fnref:13"&gt;&lt;a href="#fn:13" class="footnote-ref" role="doc-noteref"&gt;13&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th style="text-align: left"&gt;Memory&lt;/th&gt;
 &lt;th style="text-align: left"&gt;Single Process&lt;/th&gt;
 &lt;th style="text-align: left"&gt;Process &lt;code&gt;cgroup(memcg)&lt;/code&gt;&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;&lt;code&gt;cache&lt;/code&gt;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;None&lt;/td&gt;
 &lt;td style="text-align: left"&gt;&lt;code&gt;PageCache&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;&lt;code&gt;mapped_file&lt;/code&gt;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;None&lt;/td&gt;
 &lt;td style="text-align: left"&gt;&lt;code&gt;file_rss + shmem_rss&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;&lt;code&gt;RSS&lt;/code&gt;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;&lt;code&gt;anon_rss + file_rss ＋ shmem_rss&lt;/code&gt;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;&lt;code&gt;anon_rss&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
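&lt;p&gt;The process-side breakdown in the table can be read directly on kernels that expose RssAnon, RssFile and RssShmem in /proc/PID/status (Linux 4.5 and later, which may not include the kernel used for the outputs in this article). A minimal sketch, with &lt;code&gt;rss_parts&lt;/code&gt; as a hypothetical helper name:&lt;/p&gt;

```shell
# Split a process RSS into the three components listed in the table,
# reading /proc/PID/status-format text on stdin (values are in kB).
rss_parts() {
  awk '
    $1 == "RssAnon:"  { anon = $2 }
    $1 == "RssFile:"  { file = $2 }
    $1 == "RssShmem:" { shmem = $2 }
    END {
      printf "anon=%d file=%d shmem=%d total=%d\n", anon, file, shmem, anon + file + shmem
    }
  '
}

# Usage:
#   cat /proc/97632/status | rss_parts
```

&lt;p&gt;Only the anon part corresponds to what the cgroup reports as rss; the file and shmem parts are charged to cache on the cgroup side.&lt;/p&gt;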
&lt;p&gt;For PostgreSQL, the rss in memory.stat does not include file-mapped shared memory. The official PG documentation for &lt;code&gt;shared_memory_type&lt;/code&gt; describes mmap as anonymous shared memory:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Possible values are &lt;code&gt;mmap&lt;/code&gt; (for anonymous shared memory allocated using &lt;code&gt;mmap&lt;/code&gt;), &lt;code&gt;sysv&lt;/code&gt; (for System V shared memory allocated via &lt;code&gt;shmget&lt;/code&gt;)&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;cgroup counts PG mmap memory as mapped_file.&lt;/p&gt;
&lt;p&gt;After also testing the sysv and huge page cases, the behavior of PG&amp;rsquo;s memory.stat metrics can be summarized as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;RSS in stat does not include file map shared memory. Observation shows that regardless of mmap or sysv, RSS does not contain PG shared memory&lt;/li&gt;
&lt;li&gt;Similarly, rss_huge also does not include file map shared huge page memory. Observation shows that even with huge pages enabled, stat does not contain PG shared memory&lt;/li&gt;
&lt;li&gt;Without huge pages, PG shared memory (mmap or sysv) is all counted under memory.stat mapped_file; with huge pages, it&amp;rsquo;s in none of the stat metrics, including rss_huge&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Where Exactly Is mapped_file?
 &lt;div id="where-exactly-is-mapped_file" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#where-exactly-is-mapped_file" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2cf787e36cc8.png" alt="RHEL Memory Usage Patterns" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;mapped_file is in cache, and also in inactive_anon+active_anon&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mapped_file can also be anonymous; both mmap and sysv are counted here&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Database with shared_buffers=2GB&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_mem_mapped : 1.69063
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_mem_anon : 1.69828
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pagecache : 5.94717
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pagecache_cache-share : 4.25654
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;file_cache : 4.24889
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;anon_cache : 3.23096
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_used_rss+map : 3.2233
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_file+rss+map : 7.47219
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_rss+cache : 7.47984
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_anon+file : 7.47984
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_memsw : 7.47984
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hard_limit : &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;soft_limit_in_bytes
 &lt;div id="soft_limit_in_bytes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#soft_limit_in_bytes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Soft limit (&lt;code&gt;memory.soft_limit_in_bytes&lt;/code&gt;) is a non-enforced constraint in cgroup memory management. When a cgroup&amp;rsquo;s memory usage exceeds the soft limit, the system does not immediately force memory reclamation. Instead, it will &lt;strong&gt;preferentially reclaim the excess memory&lt;/strong&gt; of that cgroup &lt;strong&gt;when global memory pressure is high&lt;/strong&gt; (e.g., when overall system free memory is insufficient).&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Trigger condition&lt;/strong&gt;: Global memory pressure (e.g., insufficient system free memory).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Call path&lt;/strong&gt;: &lt;code&gt;kswapd&lt;/code&gt; → &lt;code&gt;balance_pgdat&lt;/code&gt; → check cgroup soft limits → trigger reclamation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reclamation target&lt;/strong&gt;: Preferentially reclaim memory pages from cgroups exceeding their soft limits.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;+-------------------+ Global memory pressure detection +-------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| kswapd thread | ------------------------------------&amp;gt; | balance_pgdat |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;+-------------------+ +-------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | Traverse memory zones and check
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; v
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; +---------------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | Check each cgroup&amp;#39;s soft |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | limit usage |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; +---------------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | Trigger reclamation for over-limit cgroups
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; v
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; +---------------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | Page reclamation (LRU list |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | scanning, etc.) |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; +---------------------------+&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
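&lt;p&gt;The soft_limit_in_bytes mechanism is loosely comparable to memory.high in cgroup v2, in that exceeding it leads to reclamation rather than an OOM kill. The difference is that memory.high takes effect as soon as usage crosses it, without waiting for global memory pressure. cgroup v2 no longer provides soft_limit_in_bytes; it offers three parameters instead: memory.min, memory.low, and memory.high.&lt;/p&gt;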
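&lt;p&gt;To see whether a cgroup would be a preferred reclamation target, its usage can be compared with its soft limit. A minimal sketch for cgroup v1, with &lt;code&gt;soft_limit_excess&lt;/code&gt; as a hypothetical helper name; the directory passed in is assumed to follow the /sys/fs/cgroup/memory/$PGNAME layout used elsewhere in this article.&lt;/p&gt;

```shell
# Print how many bytes a v1 memory cgroup is above its soft limit (0 if not above).
# $1 is the cgroup directory, for example /sys/fs/cgroup/memory/$PGNAME.
soft_limit_excess() {
  usage=$(cat "$1/memory.usage_in_bytes")
  soft=$(cat "$1/memory.soft_limit_in_bytes")
  if [ "$usage" -gt "$soft" ]; then
    echo $((usage - soft))
  else
    echo 0
  fi
}

# Usage:
#   soft_limit_excess /sys/fs/cgroup/memory/$PGNAME
```

&lt;p&gt;A cgroup whose soft limit was never set reports a very large default value, so the helper prints 0 for it.&lt;/p&gt;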

&lt;h3 class="relative group"&gt;Impact of Overselling on pagecache
 &lt;div id="impact-of-overselling-on-pagecache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#impact-of-overselling-on-pagecache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;To be discussed later&lt;/p&gt;

&lt;h3 class="relative group"&gt;cg oom
 &lt;div id="cg-oom" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cg-oom" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Normally, if shared_buffers is 1/4 of the cgroup memory limit, then ignoring private memory, the page cache can grow to as much as 3/4 of the limit. Private memory usage of a normal workload is usually not very high. When the cgroup is full, memory can be reclaimed from the cgroup&amp;rsquo;s page cache (this is direct reclaim; Alibaba Cloud Linux has implemented asynchronous background reclaim: &lt;a href="https://help.aliyun.com/zh/alinux/user-guide/memcg-backend-asynchronous-reclaim?spm=a2c4g.11186623.0.0.562f42bammLZmK" target="_blank" rel="noreferrer"&gt;Memcg Background Async Reclamation&lt;/a&gt;). So the most reliable way to trigger a cgroup OOM in a test is to run sessions that consume a lot of private memory, rather than a generic stress test.&lt;/p&gt;
&lt;p&gt;Test case:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Observe score&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-r--r--r-- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; May &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt; 16:39 /proc/63766/oom_score
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rss &lt;span style="color:#75715e"&gt;# whichever command you like&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## A SQL that can consume lots of private memory, many union alls create many plan nodes&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;psql -d lzldb -tX -c &lt;span style="color:#e6db74"&gt;&amp;#34;create table lzl1(col1 varchar(1));&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;psql -tX -c &lt;span style="color:#e6db74"&gt;&amp;#34;\o sqltext.sql&amp;#34;&lt;/span&gt; -c &lt;span style="color:#e6db74"&gt;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;SELECT &amp;#39;select col1 from lzl1&amp;#39; || &amp;#39; union all&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;FROM generate_series(1, 100000)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;UNION ALL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;SELECT &amp;#39;select col1 from lzl1;&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;FROM generate_series(1, 1);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Adjust stack parameter otherwise SQL will be aborted&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;psql -d lzldb -c &lt;span style="color:#e6db74"&gt;&amp;#34;set max_stack_depth=1024000&amp;#34;&lt;/span&gt; -f sqltext.sql&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;cg oom off:&lt;/p&gt;
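&lt;p&gt;With the cgroup OOM killer disabled, wchan shows &lt;code&gt;mem_cgroup_oom&lt;/code&gt; and the process still has an oom_score, but the OOM killer does not kill it; the affected backends sit in state D instead:&lt;/p&gt;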
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## vm oom enabled; 0: don&amp;#39;t trigger panic, start OOM Killer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/sys/vm/panic_on_oom 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## cg oom disabled; 1: disable oom&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /sys/fs/cgroup/memory/$PGNAME/memory.oom_control
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oom_kill_disable &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;under_oom &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ ps -eo user,ppid,pid,state,%cpu,%mem,stime,wchan:14,args,rss,vsz,sig_block |grep &lt;span style="color:#e6db74"&gt;`&lt;/span&gt;head -1 $PGDATA/postmaster.pid&lt;span style="color:#e6db74"&gt;`&lt;/span&gt; |grep -v grep 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;19005&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;870&lt;/span&gt; D 0.0 0.0 10:54 mem_cgroup_oom postgres: pg3ymhp2: lzluser &lt;span style="color:#ae81ff"&gt;7216&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2807460&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000000000000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;19005&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3417&lt;/span&gt; S 0.0 0.0 10:55 pipe_wait postgres: pg3ymhp2: lzluser &lt;span style="color:#ae81ff"&gt;22944&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2808540&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000000000000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;19005&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13069&lt;/span&gt; D 0.0 0.0 11:10 mem_cgroup_oom postgres: pg3ymhp2: lzluser &lt;span style="color:#ae81ff"&gt;11944&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2808348&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000000000000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;19005&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13104&lt;/span&gt; D 0.0 0.0 11:10 mem_cgroup_oom postgres: pg3ymhp2: lzluser &lt;span style="color:#ae81ff"&gt;12224&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2808348&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000000000000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;19005&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14352&lt;/span&gt; D 0.0 0.0 11:10 mem_cgroup_oom postgres: pg3ymhp2: lzluser &lt;span style="color:#ae81ff"&gt;11680&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2808348&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000000000000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /sys/fs/cgroup/memory/$PGNAME/memory.oom_control
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oom_kill_disable &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;under_oom &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/97994/oom_score
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_mem_mapped : 2.00019
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_mem_anon : 2.0023
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pagecache : 2.0023
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pagecache_cache-share : 0.00211334
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;file_cache : &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;anon_cache : &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_used_rss+map : 7.99789
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_file+rss+map : 7.99789
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_rss+cache : &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_anon+file : &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_memsw : &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hard_limit : &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In testing so far, PG processes may still crash when they cannot allocate memory. For example, if walwriter crashes, the postmaster terminates all other server processes as well.&lt;/p&gt;
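&lt;p&gt;While the OOM killer is disabled, the blocked backends can be picked out of the ps output by their wait channel. A minimal sketch, with &lt;code&gt;oom_blocked_pids&lt;/code&gt; as a hypothetical helper name; the wchan string &lt;code&gt;mem_cgroup_oom&lt;/code&gt; is taken from the output above and may differ on other kernels.&lt;/p&gt;

```shell
# Print the PIDs of processes waiting in the memcg OOM path.
# Expects "pid state wchan args..." lines on stdin.
oom_blocked_pids() {
  awk '$3 == "mem_cgroup_oom" { print $1 }'
}

# Usage:
#   ps -eo pid,state,wchan:14,args | oom_blocked_pids
```

&lt;p&gt;An empty result together with under_oom 0 in memory.oom_control suggests the cgroup is no longer stuck at its limit.&lt;/p&gt;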
&lt;p&gt;cg oom on:&lt;/p&gt;
&lt;p&gt;With the cgroup OOM killer enabled, the user process with the highest OOM score is killed with signal 9 (SIGKILL). Most of the other PG processes then go down as well; the postmaster calls &lt;code&gt;reset_shared()&lt;/code&gt; and automatically restarts them. Both the system log (messages) and dmesg show the out-of-memory information:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#cg oom enabled&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oom_kill_disable &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg log:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2025-05-29 19:10:45.945 CST,,,198877,,6838374d.308dd,4,,2025-05-29 18:30:37 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;server process (PID 236413) was terminated by signal 9: Killed&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;Failed process was running: select col1 from lzl1 union all
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;message:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;May 29 19:10:45 lzlhost kernel: Memory cgroup stats for /t1lzldb: cache:8392988KB rss:8384228KB rss_huge:0KB mapped_file:7458316KB swap:0KB inactive_anon:1310184KB active_anon:15467032KB inactive_file:0KB active_file:0KB unevictable:0KB
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;May 29 19:10:45 lzlhost kernel: Memory cgroup out of memory: Kill process 236413 (postgres) score 497 or sacrifice child
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;dmesg:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;[Thu May 29 18:26:27 2025] Memory cgroup stats for /t1lzldb: cache:8392988KB rss:8384228KB rss_huge:0KB mapped_file:7458316KB swap:0KB inactive_anon:1310184KB active_anon:15467032KB inactive_file:0KB active_file:0KB unevictable:0KB
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;[Thu May 29 18:26:27 2025] Memory cgroup out of memory: Kill process 236413 (postgres) score 497 or sacrifice child
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;[Thu May 29 18:26:27 2025] Killed process 236413 (postgres) total-vm:18828736kB, anon-rss:8328252kB, file-rss:2328kB, shmem-rss:1832kB&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Management differences between cg oom on and off for PG databases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;on: cg oom killer will kill processes with high OOM score, typically user processes&lt;/li&gt;
&lt;li&gt;off: cg oom killer won&amp;rsquo;t start. PG processes will hang — they may recover on their own, but PG&amp;rsquo;s critical processes (like walwriter) might crash due to insufficient memory, and the instance may still go down.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note: this is cg oom, not vm oom. System-level vm oom is determined by the system-level vm overcommit mechanism.&lt;/p&gt;
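&lt;p&gt;To tell cg oom kills apart from other crashes, the kernel lines quoted above can be extracted mechanically. The following is a minimal sketch, assuming the older kernel wording shown in the logs (&amp;ldquo;Memory cgroup out of memory: Kill process ...&amp;rdquo;); newer kernels phrase this line differently, and the function name &lt;code&gt;find_cg_oom_kills&lt;/code&gt; is made up for illustration.&lt;/p&gt;

```python
import re

# Pattern for the kernel line quoted in the logs above, e.g.
#   "Memory cgroup out of memory: Kill process 236413 (postgres) score 497 or sacrifice child"
# Assumption: the wording matches the older kernel used in this article.
OOM_KILL = re.compile(
    r"Memory cgroup out of memory: Kill process (\d+) \(([^)]+)\) score (\d+)"
)


def find_cg_oom_kills(log_text):
    """Return (pid, command, score) for every cgroup OOM kill found in log_text."""
    kills = []
    for line in log_text.splitlines():
        match = OOM_KILL.search(line)
        if match:
            pid, command, score = match.groups()
            kills.append((int(pid), command, int(score)))
    return kills


# One line copied from the message log above plus one unrelated line.
sample = (
    "May 29 19:10:45 lzlhost kernel: Memory cgroup out of memory: "
    "Kill process 236413 (postgres) score 497 or sacrifice child\n"
    "May 29 19:10:45 lzlhost kernel: some unrelated kernel line\n"
)
print(find_cg_oom_kills(sample))
```

&lt;p&gt;The same function can be fed the output of dmesg or the contents of the message log. Matching the reported PID against the &amp;ldquo;terminated by signal 9&amp;rdquo; line in the PG log confirms which backend was the victim.&lt;/p&gt;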

&lt;h3 class="relative group"&gt;cg v1 Problems
 &lt;div id="cg-v1-problems" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cg-v1-problems" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;No cg pagetable statistics&lt;/li&gt;
&lt;li&gt;No cg slab statistics&lt;/li&gt;
&lt;li&gt;No cg hugepage statistics (hugepages are not charged, not just not counted)&lt;/li&gt;
&lt;li&gt;No cg async/sync page reclamation statistics&lt;/li&gt;
&lt;li&gt;cg RSS and process RSS have different statistical scopes&lt;/li&gt;
&lt;li&gt;shmem statistics are messy&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;What&amp;rsquo;s New in V2
 &lt;div id="whats-new-in-v2" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#whats-new-in-v2" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;cgroup v2 was officially released in Linux 4.5 (March 2016)&lt;sup id="fnref:14"&gt;&lt;a href="#fn:14" class="footnote-ref" role="doc-noteref"&gt;14&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;cgroup v2 memory management improvements and changes:&lt;sup id="fnref:15"&gt;&lt;a href="#fn:15" class="footnote-ref" role="doc-noteref"&gt;15&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;cg mem interface file&lt;/th&gt;
 &lt;th&gt;vs v1&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;th&gt;&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.current&lt;/td&gt;
 &lt;td&gt;Reworked&lt;/td&gt;
 &lt;td&gt;Current memory usage of the cgroup and its descendants. Replaces v1&amp;rsquo;s memory.usage_in_bytes&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.min&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Different from VM&amp;rsquo;s min/low/high&lt;/strong&gt;. VM watermarks are about remaining OS memory; cg v2 watermarks are about cg memory used. memory.min is a hard memory protection value, default 0. Even when the system has no reclaimable memory, memory at or below this boundary won&amp;rsquo;t be reclaimed&lt;sup id="fnref:16"&gt;&lt;a href="#fn:16" class="footnote-ref" role="doc-noteref"&gt;16&lt;/a&gt;&lt;/sup&gt;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.low&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;td&gt;Best-effort memory protection value, default 0. System preferentially reclaims memory from unprotected cgroups. If still insufficient, reclaims memory between memory.min and memory.low.&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.high&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;td&gt;Memory usage throttle limit, default max. When the cgroup&amp;rsquo;s usage goes over high, its processes are throttled and put under heavy synchronous reclaim pressure for this cgroup and its children, trying to bring usage back below high. Going over high never invokes the cg oom killer&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.max&lt;/td&gt;
 &lt;td&gt;Reworked&lt;/td&gt;
 &lt;td&gt;Equivalent to memory.limit_in_bytes&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;memory.reclaim&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Reworked&lt;/td&gt;
 &lt;td&gt;Active reclamation interface file. v1 only had memory.force_empty for forced clearing&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.peak&lt;/td&gt;
 &lt;td&gt;Reworked&lt;/td&gt;
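 &lt;td&gt;Equivalent to v1 memory.max_usage_in_bytes: the highest memory usage recorded for the cgroup and its descendants. It is a statistic only; the cg oom killer is triggered by reaching memory.max, not by exceeding peak&lt;/td&gt;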
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;memory.oom.group&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;td&gt;Controls whether cg OOM killer terminates the entire cgroup (1) or just a single process (0). Default 0. If oom_score_adj=-1000, process won&amp;rsquo;t be killed&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.events&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;td&gt;Reports memory-related events&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;memory.stat&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Reworked&lt;/td&gt;
 &lt;td&gt;Many changes, analyzed separately&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.zswap.current, memory.zswap.max, memory.zswap.writeback&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;td&gt;zswap is a compressed swap cache in the Linux kernel. It compresses pages that are about to be swapped out and keeps them in memory, writing them to the physical swap device (a swap partition or file) only when necessary. This reduces disk I/O and improves performance&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;soft_limit_in_bytes&lt;/td&gt;
 &lt;td&gt;Removed&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;memory.oom_control&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Removed&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;This means v2 cannot directly disable cg oom killer&lt;/strong&gt;; however, fine-grained memory management can be achieved through min/low/high settings and event memory notifications&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
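&lt;p&gt;The four boundaries in the table partition a cgroup&amp;rsquo;s usage into zones with different reclaim behavior. The following is a simplified sketch of that reading, not the kernel&amp;rsquo;s algorithm: it ignores the cgroup hierarchy (effective protection), assumes min, low, high and max are configured in ascending order, and the names &lt;code&gt;parse_limit&lt;/code&gt; and &lt;code&gt;classify_usage&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
import bisect
import math

GIB = 1024 ** 3


def parse_limit(raw):
    """Convert an interface-file value to a number; the literal "max" means unlimited."""
    raw = raw.strip()
    return math.inf if raw == "max" else int(raw)


def classify_usage(current, mem_min, mem_low, mem_high, mem_max):
    """Name the zone that `current` falls into (simplified, single cgroup)."""
    boundaries = [mem_min, mem_low, mem_high, mem_max]
    zones = [
        "within memory.min: never reclaimed",
        "within memory.low: reclaimed only as a last resort",
        "normal: reclaimable",
        "above memory.high: throttled and under reclaim pressure",
        "above memory.max: not expected, the cg oom killer acts at the limit",
    ]
    # bisect_left counts how many boundaries are strictly below `current`.
    return zones[bisect.bisect_left(boundaries, current)]


# Hypothetical settings for an 8 GiB database cgroup currently using 7 GiB.
print(classify_usage(7 * GIB, 1 * GIB, 2 * GIB, 6 * GIB, 8 * GIB))

# Defaults: min=0, low=0, high=max, max=max, so any usage is simply "normal".
print(classify_usage(100, 0, 0, parse_limit("max"), parse_limit("max")))
```

&lt;p&gt;On a real system the arguments would come from reading memory.current, memory.min, memory.low, memory.high and memory.max in the cgroup&amp;rsquo;s directory and passing each value through &lt;code&gt;parse_limit&lt;/code&gt;.&lt;/p&gt;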
&lt;p&gt;v2 cg mem management advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Compared to v1, v2 has simpler and clearer hierarchical management&lt;/li&gt;
&lt;li&gt;v1 only had OOM kill or freeze; v2 has more means to control memory size (such as memory.min/low/high)&lt;/li&gt;
&lt;li&gt;v2 makes it easier to control burst loads&lt;sup id="fnref:17"&gt;&lt;a href="#fn:17" class="footnote-ref" role="doc-noteref"&gt;17&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;Removes the interface file for directly disabling cg oom killer&lt;/li&gt;
&lt;li&gt;Adds the memory_hugetlb_accounting mount option, which charges hugetlb pages to the memory controller&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;memory.stat (entries tagged npn are non-per-node, meaning they do not appear in memory.numa_stat):&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;&lt;strong&gt;Parameter&lt;/strong&gt;&lt;/th&gt;
 &lt;th&gt;&lt;strong&gt;Meaning&lt;/strong&gt;&lt;/th&gt;
 &lt;th&gt;&lt;strong&gt;v1 Counterpart&lt;/strong&gt;&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;anon&lt;/td&gt;
 &lt;td&gt;Anonymous pages&lt;/td&gt;
 &lt;td&gt;active_anon+inactive_anon&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;file&lt;/td&gt;
 &lt;td&gt;File pages, including tmpfs&lt;/td&gt;
 &lt;td&gt;active_file+inactive_file&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;kernel (npn)&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Total kernel memory, including kernel_stack, &lt;strong&gt;pagetables&lt;/strong&gt;, percpu, vmalloc, &lt;strong&gt;slab&lt;/strong&gt;, and other kernel memory usage.&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;kernel_stack&lt;/td&gt;
 &lt;td&gt;Memory occupied by kernel stacks.&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;pagetables&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;page tables&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sec_pagetables&lt;/td&gt;
 &lt;td&gt;Secondary page tables, suitable for VMs, GPU devices, network acceleration cards, and other hardware resource isolation scenarios&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;percpu (npn)&lt;/td&gt;
 &lt;td&gt;Memory size used for per-cpu kernel data structures&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sock (npn)&lt;/td&gt;
 &lt;td&gt;network transmission buffers&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;vmalloc (npn)&lt;/td&gt;
 &lt;td&gt;vmalloc&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;shmem&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Including tmpfs, shm, shared anonymous mmap&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;zswap&lt;/td&gt;
 &lt;td&gt;Memory consumed by zswap compression itself&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;zswapped&lt;/td&gt;
 &lt;td&gt;Amount of user memory zswapped&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;file_mapped&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Amount of cached filesystem data mapped with mmap()&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Somewhat similar to v1 mapped_file, though mapped_file includes tmpfs, shm&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;file_dirty&lt;/td&gt;
 &lt;td&gt;Cached filesystem data that was modified but not yet written back to disk&lt;/td&gt;
 &lt;td&gt;Same as v1 dirty&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;file_writeback&lt;/td&gt;
 &lt;td&gt;Cached filesystem data that was modified and is currently being written back to disk&lt;/td&gt;
 &lt;td&gt;Same as v1 writeback&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;swapcached&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;Same as v1 swapcached&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;anon_thp&lt;/td&gt;
 &lt;td&gt;Anonymous pages in transparent huge pages&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;file_thp&lt;/td&gt;
 &lt;td&gt;File pages in transparent huge pages&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;shmem_thp&lt;/td&gt;
 &lt;td&gt;Transparent huge pages for shm, tmpfs, anonymous mmap&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;inactive_anon, active_anon, inactive_file, active_file, unevictable&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;Same as v1&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;slab_reclaimable&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;As the name suggests&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;slab_unreclaimable&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;As the name suggests&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;slab (npn)&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;As the name suggests&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;workingset_refault_anon, workingset_refault_file, workingset_activate_anon, workingset_activate_file, workingset_restore_anon, workingset_restore_file, workingset_nodereclaim&lt;/td&gt;
 &lt;td&gt;Refaulted page statistics&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pswpin (npn)&lt;/td&gt;
 &lt;td&gt;Number of pages swapped into memory&lt;/td&gt;
 &lt;td&gt;No direct v1 counterpart; v1 pgpgin counts page charging events, not swap-ins&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pswpout (npn)&lt;/td&gt;
 &lt;td&gt;Number of pages swapped out of memory&lt;/td&gt;
 &lt;td&gt;No direct v1 counterpart; v1 pgpgout counts page uncharging events, not swap-outs&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgscan (npn)&lt;/td&gt;
 &lt;td&gt;scanned pages (in an inactive LRU list)&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgsteal (npn)&lt;/td&gt;
 &lt;td&gt;Reclaimed memory&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;pgscan_kswapd (npn)&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Pages scanned by kswapd, i.e. background reclaim (in an inactive LRU list)&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;pgscan_direct (npn)&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Pages scanned directly by the allocating process, i.e. direct reclaim (in an inactive LRU list)&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgscan_khugepaged (npn)&lt;/td&gt;
 &lt;td&gt;Pages scanned by the transparent huge page daemon&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;pgscan_proactive (npn)&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Pages scanned by proactive reclaim, i.e. reclaim requested through memory.reclaim&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgsteal_kswapd (npn), pgsteal_direct (npn), pgsteal_khugepaged (npn), pgsteal_proactive (npn)&lt;/td&gt;
 &lt;td&gt;Pages actually reclaimed by each path; every pgsteal_* counter corresponds to the pgscan_* counter with the same suffix&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgfault (npn)&lt;/td&gt;
 &lt;td&gt;As the name suggests&lt;/td&gt;
 &lt;td&gt;Same as v1 pgfault&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgmajfault (npn)&lt;/td&gt;
 &lt;td&gt;As the name suggests&lt;/td&gt;
 &lt;td&gt;Same as v1 pgmajfault&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgrefill (npn)&lt;/td&gt;
 &lt;td&gt;Pages scanned in active LRU&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;pgactivate (npn)&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Pages moved to active LRU&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgdeactivate (npn)&lt;/td&gt;
 &lt;td&gt;Pages moved to inactive LRU&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pglazyfree (npn)&lt;/td&gt;
 &lt;td&gt;Pages whose release is deferred when under memory pressure&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pglazyfreed (npn)&lt;/td&gt;
 &lt;td&gt;Reclaimed lazyfree pages&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;swpin_zero,swpout_zero&lt;/td&gt;
 &lt;td&gt;Zero-filled pages. At swap-out the kernel detects that a page contains only zeros and records that fact instead of writing the page (swpout_zero); at swap-in such a page is simply refilled with zeros (swpin_zero). Both cases skip the disk I/O&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;zswpin,zswpout,zswpwb&lt;/td&gt;
 &lt;td&gt;zswap-related pages&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;thp_fault_alloc (npn), thp_collapse_alloc (npn), thp_swpout (npn), thp_swpout_fallback (npn)&lt;/td&gt;
 &lt;td&gt;Transparent huge page-related pages&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;numa_pages_migrated (npn), numa_pte_updates (npn), numa_hint_faults (npn)&lt;/td&gt;
 &lt;td&gt;NUMA-related pages; also memory.numa_stat exists&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgdemote_kswapd, pgdemote_direct, pgdemote_khugepaged, pgdemote_proactive&lt;/td&gt;
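 &lt;td&gt;Pages demoted by each path. Demotion belongs to memory tiering: instead of being reclaimed, a cold page is migrated from a fast memory node (DRAM) to a slower one (for example persistent memory or CXL memory)&lt;/td&gt;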
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;hugetlb&lt;/strong&gt;&lt;/td&gt;
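 &lt;td&gt;Memory used by hugetlb pages. Only shown when hugetlb usage is charged to the memory controller (mount option memory_hugetlb_accounting)&lt;/td&gt;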
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;v2 cg mem observation advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Adds slab, pagetable, pgscan_kswapd/pgscan_direct/pgsteal, and huge page info, none of which v1 had&lt;/li&gt;
&lt;li&gt;More observation metrics related to specific features, such as sock, vmalloc, transparent huge pages, zswap compression interactions, swap_zero zero-fill interactions, etc.&lt;/li&gt;
&lt;li&gt;Shared memory shmem and file_mapped metrics are separated&lt;/li&gt;
&lt;/ul&gt;
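&lt;p&gt;As an illustration of what the new counters allow, the sketch below parses the flat &amp;ldquo;key value&amp;rdquo; format of memory.stat and derives a steal/scan ratio for background (kswapd) and direct reclaim. The function names and the sample numbers are made up for illustration; only the counter names come from the table above.&lt;/p&gt;

```python
def parse_flat_keyed(text):
    """Parse "key value" lines, the format used by memory.stat and memory.events."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def reclaim_summary(stats):
    """Return the pgsteal/pgscan ratio for kswapd and for direct reclaim."""
    summary = {}
    for source in ("kswapd", "direct"):
        scanned = stats.get("pgscan_" + source, 0)
        stolen = stats.get("pgsteal_" + source, 0)
        # None means this reclaim path has not scanned anything yet.
        summary[source] = stolen / scanned if scanned else None
    return summary


# Hypothetical excerpt of a memory.stat file; the numbers are invented.
sample_stat = """anon 8384228
file 0
pgscan_kswapd 2000
pgscan_direct 0
pgsteal_kswapd 1500
pgsteal_direct 0
"""

stats = parse_flat_keyed(sample_stat)
print(reclaim_summary(stats))
```

&lt;p&gt;A non-zero pgscan_direct means processes in the cgroup had to reclaim memory themselves, which is usually the more interesting signal for a database. For a live cgroup, pass the contents of its memory.stat file to &lt;code&gt;parse_flat_keyed&lt;/code&gt;.&lt;/p&gt;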

&lt;h2 class="relative group"&gt;wchan
 &lt;div id="wchan" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#wchan" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;wchan stands for waiting channel: the name of the kernel function in which the process is sleeping.&lt;/p&gt;
&lt;p&gt;Generally, you should check the wchan of processes in D state to see what kernel function the process is waiting on.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;-&lt;/code&gt;: Running tasks will display a dash (&amp;rsquo;-&amp;rsquo;) in this column&lt;/p&gt;
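&lt;p&gt;&lt;code&gt;poll_schedule_timeout&lt;/code&gt;: Common for the postmaster (PM), which sleeps here while waiting for events and otherwise shows up in running state, as in the two samples below&lt;/p&gt;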
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zz ***Fri May &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; 04:50:10 CST &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;141378&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.5 0.4 &lt;span style="color:#ae81ff"&gt;70585180&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2322876&lt;/span&gt; poll_schedule_timeout S 21:06:18 00:02:40 /paic/postgres/base/11.3/bin/postgres -D /paic/pg6888/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zzz ***Fri May &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; 04:50:43 CST &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;141378&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.5 0.4 &lt;span style="color:#ae81ff"&gt;70585180&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2322876&lt;/span&gt; - R 21:06:18 00:02:42 /paic/postgres/base/11.3/bin/postgres -D /paic/pg6888/data&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;futex_wait_queue_me&lt;/code&gt;: Common for SLEEP processes. Occasionally D state&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;455358&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;141378&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 4.7 1.0 &lt;span style="color:#ae81ff"&gt;70590684&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5349576&lt;/span&gt; futex_wait_queue_me S 03:01:12 00:02:47 postgres: t1lzldb: lzl test3 30.181.32.3&lt;span style="color:#f92672"&gt;(&lt;/span&gt;39801&lt;span style="color:#f92672"&gt;)&lt;/span&gt; COMMIT&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;hugetlb_fault&lt;/code&gt;: Only seen when huge pages are touched for the first time, as the load starts up&lt;/p&gt;
&lt;p&gt;&lt;code&gt;do_last&lt;/code&gt;: Function in the VFS (Virtual File System) path resolution logic, responsible for handling the last component of a file path (such as filename or symbolic link) and triggering actual file operations&lt;/p&gt;
&lt;p&gt;&lt;code&gt;lock_page_killable&lt;/code&gt;: Locks a physical memory page in a killable manner. &amp;ldquo;Killable&amp;rdquo; means that while waiting for the page lock the process ignores ordinary signals but can still respond to fatal signals such as &lt;code&gt;SIGKILL&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;rpc_wait_bit_killable&lt;/code&gt;: This function relates to the Remote Procedure Call (RPC) mechanism, used in the kernel to wait for changes to certain bit flags&lt;/p&gt;
&lt;p&gt;&lt;code&gt;wait_on_page_bit&lt;/code&gt;: Wait for changes to page flag states (e.g., PG_locked, PG_writeback)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;blkdev_issue_flush&lt;/code&gt;: Block device layer cache flush function. Possible call chain: user calls &lt;code&gt;fsync()&lt;/code&gt; → file system (e.g., ext4) submits relevant dirty pages to the block device layer → calls &lt;code&gt;blkdev_issue_flush()&lt;/code&gt; to ensure device cache is flushed&lt;/p&gt;
&lt;p&gt;&lt;code&gt;on_proc_exit&lt;/code&gt;: Register cleanup functions for process exit&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ima_file_check&lt;/code&gt;: Belongs to the IMA (Integrity Measurement Architecture) subsystem, used to verify file integrity during file access; typically involved with &lt;code&gt;open()&lt;/code&gt; calls&lt;/p&gt;
&lt;p&gt;&lt;code&gt;flush_work&lt;/code&gt;: Waits for a queued workqueue work item to finish executing&lt;/p&gt;
&lt;p&gt;&lt;code&gt;call_rwsem_down_write_failed&lt;/code&gt;: When attempting to acquire a write lock (&lt;code&gt;down_write()&lt;/code&gt;) fails, this function handles write lock contention and waiting logic. It uses spin or sleep mechanisms to make the current process wait for lock release (rwsem: read-write semaphore)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;get_request&lt;/code&gt;: &lt;strong&gt;Appears when iowait is high&lt;/strong&gt;. Gets a free request structure (&lt;code&gt;struct request&lt;/code&gt;) from the block device request queue. If the queue is full (device processing speed insufficient), the thread waits until a request is available&lt;/p&gt;
&lt;p&gt;&lt;code&gt;lookup_slow&lt;/code&gt;: Slow path for VFS (Virtual File System) path resolution&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/**
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * lookup_fast - do fast lockless (but racy) lookup of a dentry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * @nd: current nameidata
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Do a fast, but racy lookup in the dcache for the given dentry, and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * revalidate it. Returns a valid dentry pointer or NULL if one wasn&amp;#39;t
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * found. On error, an ERR_PTR will be returned.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dentry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;lookup_fast&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; nameidata &lt;span style="color:#f92672"&gt;*&lt;/span&gt;nd)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Fast lookup failed, do it the slow way */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dentry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;__lookup_slow&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; qstr &lt;span style="color:#f92672"&gt;*&lt;/span&gt;name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dentry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;dir,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#66d9ef"&gt;unsigned&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; flags)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dentry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;lookup_slow&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; qstr &lt;span style="color:#f92672"&gt;*&lt;/span&gt;name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dentry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;dir,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#66d9ef"&gt;unsigned&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; flags)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; inode &lt;span style="color:#f92672"&gt;*&lt;/span&gt;inode &lt;span style="color:#f92672"&gt;=&lt;/span&gt; dir&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_inode;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dentry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;res;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;inode_lock_shared&lt;/span&gt;(inode);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	res &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;__lookup_slow&lt;/span&gt;(name, dir, flags);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;inode_unlock_shared&lt;/span&gt;(inode);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; res;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;lookup_fast and lookup_slow both search for dentries and return them. lookup_fast searches in the dentry cache; if it fails, lookup_slow is used.&lt;/p&gt;
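&lt;p&gt;The fast path / slow path split can be pictured with a toy model. This is only an analogy in Python, not kernel code: a dictionary stands in for the dentry cache, a second dictionary for the file system, and an exclusive lock for the directory inode lock (the kernel takes that lock in shared mode). All names are hypothetical.&lt;/p&gt;

```python
import threading


class ToyDirectory:
    """Toy model of a cached fast path with a locked slow path (not kernel code)."""

    def __init__(self, on_disk_entries):
        self._on_disk = dict(on_disk_entries)  # stands in for the file system
        self._cache = {}                       # stands in for the dentry cache
        self._lock = threading.Lock()          # stands in for the directory inode lock
        self.slow_lookups = 0

    def lookup(self, name):
        entry = self._cache.get(name)          # fast path: no lock taken
        if entry is not None:
            return entry
        return self._lookup_slow(name)         # cache miss: fall back

    def _lookup_slow(self, name):
        with self._lock:                       # slow path: a caller may wait here
            self.slow_lookups += 1
            entry = self._on_disk.get(name)
            if entry is not None:
                self._cache[name] = entry      # later lookups take the fast path
            return entry


directory = ToyDirectory({"base": 101})
print(directory.lookup("base"), directory.lookup("base"), directory.slow_lookups)
```

&lt;p&gt;In the model, only the first lookup of a name pays for the lock and the backing store. In the same way, a process whose wchan is lookup_slow is waiting on the part of path resolution that could not be served from the cache.&lt;/p&gt;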
&lt;p&gt;During stress testing with huge pages enabled and no direct memory reclamation, the following wait channels were observed:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;lock_page&lt;/code&gt;: &lt;strong&gt;Appears when iowait is high&lt;/strong&gt;. When the kernel attempts to lock a memory page, if the page is already locked by another thread/process, the current thread enters a waiting state.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;vx_svar_sleep_unlock&lt;/code&gt;, &lt;code&gt;vx_ilock&lt;/code&gt;, &lt;code&gt;vx_bc_biowait&lt;/code&gt;, &lt;code&gt;vx_dio_physio&lt;/code&gt;, &lt;code&gt;vx_rwsleep_lock&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;The vx_ prefix belongs to VxFS, a journaling &lt;strong&gt;file system&lt;/strong&gt; developed by Veritas (acquired by Symantec and later spun off again as Veritas Technologies). It is designed for high-performance, high-availability large-scale data storage, &lt;strong&gt;primarily targeting enterprise application scenarios&lt;/strong&gt;. Like xfs and ext4, it is a type of file system.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pipe_wait&lt;/code&gt;: When a process attempts to read from or write to a pipe, if the pipe buffer is full (write operation) or empty (read operation), the current thread enters sleep state, waiting for buffer state changes&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pipe_write&lt;/code&gt;: Entry function for pipe write operations. When the buffer is full, the thread sleeps in this function, waiting for writable space&lt;/p&gt;
&lt;p&gt;&lt;code&gt;congestion_wait&lt;/code&gt;: When the block device I/O queue is congested (e.g., request queue full or device processing delayed), the kernel uses this function to briefly sleep the thread&lt;/p&gt;
&lt;p&gt;&lt;code&gt;wait_iff_congested&lt;/code&gt;: Checks whether the block device queue is congested and enters brief sleep if so. Similar to &lt;code&gt;congestion_wait&lt;/code&gt; but more lightweight, typically used in memory reclamation or dirty page writeback paths&lt;/p&gt;
&lt;p&gt;&lt;code&gt;mem_cgroup_oom_synchronize&lt;/code&gt;: When &lt;code&gt;usage_in_bytes&lt;/code&gt; reaches &lt;code&gt;limit_in_bytes&lt;/code&gt;, the cgroup is marked oom_control.under_oom=1. Whether the cg oom killer then runs depends on oom_control.oom_kill_disable; when it is disabled, the task sleeps here until memory becomes available&lt;/p&gt;
&lt;p&gt;&lt;code&gt;mem_cgroup_oom&lt;/code&gt;: Same as &lt;code&gt;mem_cgroup_oom_synchronize&lt;/code&gt;&lt;/p&gt;
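&lt;p&gt;When many processes are stuck, it helps to tally the wait channels of everything in D state rather than read the ps output line by line. The following is a minimal sketch; it assumes the column layout of the ps samples in this section (wchan in the ninth column, state in the tenth), and the D-state sample lines are invented for illustration.&lt;/p&gt;

```python
from collections import Counter

# Column positions assumed from the ps samples in this section:
# user pid ppid pri %cpu %mem vsz rss wchan state start time command...
WCHAN_COL = 8
STATE_COL = 9


def wchan_histogram(ps_text, state="D"):
    """Count wait channels of processes whose state column equals `state`."""
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split()
        # Slicing keeps short lines (blank lines, "zzz" timestamps) from raising.
        if fields[STATE_COL:STATE_COL + 1] == [state]:
            counts[fields[WCHAN_COL]] += 1
    return counts


# The S-state line is taken from the sample above; the D-state lines are hypothetical.
sample_ps = "\n".join([
    "zzz ***Fri May 2 04:50:43 CST 2025",
    "postgres 455358 141378 19 4.7 1.0 70590684 5349576 futex_wait_queue_me S "
    "03:01:12 00:02:47 postgres: t1lzldb: lzl test3 30.181.32.3(39801) COMMIT",
    "postgres 455400 141378 19 0.1 1.0 70590684 5349576 get_request D "
    "03:01:12 00:00:01 postgres: hypothetical backend 1",
    "postgres 455401 141378 19 0.1 1.0 70590684 5349576 get_request D "
    "03:01:12 00:00:01 postgres: hypothetical backend 2",
    "postgres 455402 141378 19 0.1 1.0 70590684 5349576 lock_page D "
    "03:01:12 00:00:01 postgres: hypothetical backend 3",
])

print(wchan_histogram(sample_ps).most_common())
```

&lt;p&gt;A histogram dominated by &lt;code&gt;get_request&lt;/code&gt; or &lt;code&gt;lock_page&lt;/code&gt; points at I/O pressure, while &lt;code&gt;mem_cgroup_oom_synchronize&lt;/code&gt; points at the cgroup memory limit.&lt;/p&gt;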

&lt;h3 class="relative group"&gt;rmap_walk
 &lt;div id="rmap_walk" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#rmap_walk" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;One of PFRA&amp;rsquo;s goals is to reclaim shared page frames. To achieve this, the Linux 2.6 kernel can quickly locate all page table entries pointing to the same page frame; this process is called reverse mapping [Understanding the Linux Kernel].&lt;/p&gt;
&lt;p&gt;rmap_walk presumably also runs when a page frame already referenced by one process is inserted into another process&amp;rsquo;s page table entries (for example on fork).&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zcat hostlzl_ps_25.04.08.0900.dat.gz|egrep &lt;span style="color:#e6db74"&gt;&amp;#34;\-D /dirlzl/pg5998/data|zzz&amp;#34;&lt;/span&gt;|less
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zzz ***Tue Apr &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; 09:10:50 CST &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.2 0.5 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2117844&lt;/span&gt; poll_schedule_timeout S 22:17:21 00:01:56 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zzz ***Tue Apr &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; 09:11:20 CST &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.2 0.5 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2117844&lt;/span&gt; poll_schedule_timeout S 22:17:21 00:01:56 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zzz ***Tue Apr &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; 09:13:08 CST &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.2 0.5 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2117844&lt;/span&gt; - D 22:17:21 00:01:57 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;225076&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 1.6 0.0 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1720&lt;/span&gt; rmap_walk D 09:11:51 00:00:01 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;224924&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.7 0.0 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1728&lt;/span&gt; rmap_walk D 09:11:46 00:00:00 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;224817&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.5 0.0 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1720&lt;/span&gt; try_to_unmap_file D 09:11:44 00:00:00 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zzz ***Tue Apr &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; 09:19:16 CST &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.3 0.5 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2117884&lt;/span&gt; poll_schedule_timeout S 22:17:21 00:02:00 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;250875&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.0 0.0 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2208&lt;/span&gt; - R 09:19:17 00:00:00 /dirlzl/postgres/base/postgressqlbin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zzz ***Tue Apr &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; 09:19:48 CST &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.3 0.5 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2117884&lt;/span&gt; poll_schedule_timeout S 22:17:21 00:02:01 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;try_to_unmap_file
 &lt;div id="try_to_unmap_file" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#try_to_unmap_file" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The try_to_unmap_file() function calls try_to_unmap_cluster(), which scans all page table entries corresponding to linear addresses in that linear region and attempts to clear them [Understanding the Linux Kernel]. In other words, try_to_unmap_file() performs reverse mapping for file-mapped pages. Note: reverse mapping means starting from a physical page frame and finding all VMAs (and the page table entries within them) that map it, so that shared physical page frames can be reclaimed.&lt;/p&gt;

&lt;h3 class="relative group"&gt;page_referenced
 &lt;div id="page_referenced" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#page_referenced" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The referenced and active flags track how active a page is and are used in page reclamation. A refcount of 0 indicates a free page or a page about to be released [Running Linux Kernel: Beginner&amp;rsquo;s Guide 2nd Edition].&lt;/p&gt;
&lt;p&gt;The Object-Based Reverse Mapping section of the kernel.org documentation describes the page_referenced() function&lt;sup id="fnref1:3"&gt;&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref"&gt;3&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;code&gt;page_referenced()&lt;/code&gt; which checks all PTEs that map a page to see if the page has been referenced recently&lt;/p&gt;
&lt;p&gt;&lt;code&gt;page_referenced()&lt;/code&gt; calls &lt;code&gt;page_referenced_obj()&lt;/code&gt; which is the top level function for finding all PTEs within VMAs that map the page.&lt;/p&gt;
&lt;p&gt;If a page is mapped and it is referenced through the mapping, index hash table, this bit is set. It is used during page replacement for moving the page around the LRU lists&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;In short, page_referenced() starts from a page frame and finds all PTEs, within VMAs, that map that page. This is also a reverse mapping process.&lt;/p&gt;
&lt;p&gt;Linux introduced two page flags, &lt;code&gt;PG_active&lt;/code&gt; and &lt;code&gt;PG_referenced&lt;/code&gt;, to identify the activity level of pages, thereby deciding how to move pages between two lists (active LRU and inactive LRU).&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4fd31681c3a0.png" alt="pic" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;PG_active&lt;/code&gt; is used to indicate whether the page is currently active — if this bit is set, the page is active. &lt;code&gt;PG_referenced&lt;/code&gt; is used to indicate whether the page has been accessed recently — each time the page is accessed, this bit is set.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;page_referenced()&lt;/code&gt;: &lt;strong&gt;When the operating system performs page reclamation&lt;/strong&gt;, each time a page is scanned, this function is called to set the page&amp;rsquo;s &lt;code&gt;PG_referenced&lt;/code&gt; bit. If a page&amp;rsquo;s &lt;code&gt;PG_referenced&lt;/code&gt; bit is set but the page is not accessed again within a certain time, its &lt;code&gt;PG_referenced&lt;/code&gt; bit will be cleared&lt;sup id="fnref:18"&gt;&lt;a href="#fn:18" class="footnote-ref" role="doc-noteref"&gt;18&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Memory Observation Metrics
 &lt;div id="memory-observation-metrics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-observation-metrics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;View basic memory settings:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7c22bffd37cf.png" alt="image.png" /&gt;
Observe memory metrics:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4fed9a1d93d7.png" alt="image.png" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Some Questions
 &lt;div id="some-questions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#some-questions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Do kswapd and Direct Memory Reclamation Execute Together?
 &lt;div id="do-kswapd-and-direct-memory-reclamation-execute-together" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#do-kswapd-and-direct-memory-reclamation-execute-together" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Yes. With watermark-triggered memory reclamation, pgscand is often accompanied by pgscank; the reverse is not necessarily true. If both pgscank and pgscand are frequent, consider adjusting the memory reclamation watermarks and widening the gap between them, so that background reclamation has more headroom before the lower watermark is breached.&lt;/p&gt;
&lt;p&gt;There is another case, however: when fragmentation is high but free memory is still plentiful, blocking memory compaction may be triggered directly, showing pgscand with no pgscank at all. Adjusting watermarks won&amp;rsquo;t help here. Consider enabling huge pages and increasing the shared buffer hit rate to reduce the frequent pagecache allocations that fragment memory.&lt;/p&gt;
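&lt;p&gt;The two cases above can be written down as a small rule of thumb. This is only a sketch: the function name and the wording of the results are made up for illustration, and the inputs are assumed to be the pgscank/s and pgscand/s rates reported by a tool such as &lt;code&gt;sar -B&lt;/code&gt;.&lt;/p&gt;

```python
def classify_reclaim(pgscank_per_s, pgscand_per_s):
    """Hypothetical helper: map scan rates to the cases described in the text.

    pgscank_per_s: pages scanned per second by kswapd (background reclamation)
    pgscand_per_s: pages scanned per second directly by allocating processes
    """
    kswapd_active = pgscank_per_s != 0
    direct_active = pgscand_per_s != 0
    if kswapd_active and direct_active:
        # Watermark-triggered: kswapd cannot keep up, processes reclaim directly.
        return "watermark pressure: consider widening the watermark gap"
    if direct_active:
        # Direct scanning without kswapd: likely fragmentation and compaction.
        return "direct scan only: check fragmentation, consider huge pages"
    if kswapd_active:
        return "background reclamation only"
    return "no reclamation activity"


if __name__ == "__main__":
    print(classify_reclaim(1200.0, 300.0))
    print(classify_reclaim(0.0, 300.0))
```

&lt;p&gt;Any non-zero rate counts as activity here; in practice a threshold tuned to the workload would probably be more useful.&lt;/p&gt;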

&lt;h3 class="relative group"&gt;Impact of Oversized pagetable on Memory Reclamation
 &lt;div id="impact-of-oversized-pagetable-on-memory-reclamation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#impact-of-oversized-pagetable-on-memory-reclamation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;An oversized pagetable increases the cost and time of reverse mapping. During direct memory reclamation, reverse mapping is needed to find every process virtual address space (VMA) that maps a page, and then to remove that page&amp;rsquo;s mapping from each of those processes&amp;rsquo; page tables. This means: the more processes, the larger the pagetable, and the slower the memory reclamation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The more PostgreSQL processes, the larger the pagetable; the larger the shared buffers, the larger the pagetable.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Enabling huge pages can reduce pagetable size by roughly 512x (4 KB =&amp;gt; 2 MB pages), not only freeing up memory but also improving memory reclamation efficiency.&lt;/p&gt;
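&lt;p&gt;A back-of-the-envelope estimate makes the effect concrete. The sketch below assumes x86-64 with 8-byte page table entries, counts only the last-level entries, and assumes the worst case in which every process has touched every page of the shared mapping. The 64 GiB mapping and 500 processes are made-up example values.&lt;/p&gt;

```python
PTE_BYTES = 8          # assumption: x86-64, 8 bytes per page table entry
GIB = 1024 ** 3


def pagetable_bytes(mapped_bytes, page_bytes, processes):
    """Estimate last-level page table size across all processes.

    Worst case: each process has faulted in every page of the mapping,
    so each process needs one entry per page.
    """
    entries_per_process = mapped_bytes // page_bytes
    return entries_per_process * PTE_BYTES * processes


# Example values (hypothetical): 64 GiB shared mapping, 500 backends.
small_pages = pagetable_bytes(64 * GIB, 4 * 1024, 500)
huge_pages = pagetable_bytes(64 * GIB, 2 * 1024 * 1024, 500)

if __name__ == "__main__":
    print("4 KB pages:", small_pages / GIB, "GiB")
    print("2 MB pages:", huge_pages / (1024 ** 2), "MiB")
    print("ratio:", small_pages // huge_pages)
```

&lt;p&gt;Under these assumptions the page tables shrink from about 62.5 GiB to about 125 MiB, a factor of 512, which is simply the ratio between the two page sizes.&lt;/p&gt;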

&lt;h3 class="relative group"&gt;How Large Should shared buffers Be?
 &lt;div id="how-large-should-shared-buffers-be" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-large-should-shared-buffers-be" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Setting shared_buffers to 1/4 of the cgroup memory limit (cgmem) seems to have become an industry standard, but the actual situation is far more complex. In theory, shrinking shared_buffers a little leaves a little more room for pagecache and slightly increases the total cache size, while growing shared_buffers a little slightly reduces the total cache size but improves the shared buffer hit rate. Clearly, both too large and too small are bad. If shared_buffers is too small, PG&amp;rsquo;s own working memory is too small and memory management is effectively offloaded to the OS, where pagecache reclamation will also affect performance. If shared_buffers is too large, pagecache is squeezed and the impact of PG&amp;rsquo;s dirty page flushing must also be considered, especially in write-heavy scenarios where the corresponding bgwriter parameters need adjustment.&lt;/p&gt;
&lt;p&gt;From rough stress testing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Without huge pages, shared buffers = min(1/4 MEM, 20GB)&lt;/li&gt;
&lt;li&gt;With huge pages, shared buffers = min(1/4 MEM, 60GB)&lt;/li&gt;
&lt;/ul&gt;
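&lt;p&gt;The two rules of thumb above translate directly into code. This is a minimal sketch of that rule only; the function name is hypothetical and the result is a starting point from rough stress testing, not a tuned value.&lt;/p&gt;

```python
def suggested_shared_buffers_gb(mem_gb, huge_pages):
    """Apply min(1/4 MEM, cap) with a 20 GB cap, or 60 GB with huge pages.

    mem_gb: memory available to the instance (for example the cgroup limit)
    huge_pages: True if shared memory is backed by huge pages
    """
    cap_gb = 60 if huge_pages else 20
    return min(mem_gb / 4, cap_gb)


if __name__ == "__main__":
    for mem in (32, 128, 512):
        print(mem, suggested_shared_buffers_gb(mem, False),
              suggested_shared_buffers_gb(mem, True))
```

&lt;p&gt;For small instances the 1/4 rule decides the result; the caps only matter once memory exceeds 80 GB (without huge pages) or 240 GB (with huge pages).&lt;/p&gt;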

&lt;h3 class="relative group"&gt;Is the Difference Between Processes and Threads Really Not That Big?
 &lt;div id="is-the-difference-between-processes-and-threads-really-not-that-big" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#is-the-difference-between-processes-and-threads-really-not-that-big" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Linux kernel material commonly says that the difference between processes and threads is not significant. Whether creating a process or a thread, the kernel uses the same function, kernel_clone; the only difference lies in the flags passed. The fork and clone system calls are roughly the same [Understanding Linux Processes and Memory]:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/fdb4c651f034.png" alt="image.png" /&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;&lt;strong&gt;Dimension&lt;/strong&gt;&lt;/th&gt;
 &lt;th&gt;&lt;strong&gt;Process&lt;/strong&gt;&lt;/th&gt;
 &lt;th&gt;&lt;strong&gt;Thread&lt;/strong&gt;&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;childID&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Each process has an independent &lt;code&gt;pid&lt;/code&gt; (process ID)&lt;/td&gt;
 &lt;td&gt;Each thread has its own &lt;code&gt;tid&lt;/code&gt; (thread ID), but as seen from user space all threads in a process report the same &lt;code&gt;pid&lt;/code&gt; (the thread group ID).&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Address Space&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Each process has an independent address space (&lt;code&gt;mm_struct&lt;/code&gt;), including memory, stack, etc.&lt;/td&gt;
 &lt;td&gt;Threads share the address space of their process; all threads&amp;rsquo; &lt;code&gt;mm_struct&lt;/code&gt; points to the same address space.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;File System&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Each process has its own &lt;code&gt;fs_struct&lt;/code&gt; (root and current working directory) and its own &lt;code&gt;files_struct&lt;/code&gt; (open file descriptors).&lt;/td&gt;
 &lt;td&gt;Threads share their process&amp;rsquo;s &lt;code&gt;fs_struct&lt;/code&gt; and &lt;code&gt;files_struct&lt;/code&gt;; all threads see the same working directory and file descriptors as the process.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Compared to processes, threads are only slightly &amp;ldquo;lighter&amp;rdquo;. Overall, the similarities between processes and threads outweigh their differences.&lt;/p&gt;
&lt;p&gt;However, when the number of processes increases, the difference becomes significant, especially for multi-process applications like PostgreSQL:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each process has its own set of VMAs, so more address spaces need to be maintained&lt;/li&gt;
&lt;li&gt;Each process has its own pagetable, so pagetables consume more memory&lt;/li&gt;
&lt;li&gt;Switching between processes switches the address space and adds TLB flush overhead; switching between threads of the same process does not&lt;/li&gt;
&lt;li&gt;A process context switch therefore costs more than a switch between threads&lt;/li&gt;
&lt;li&gt;Inter-process communication (IPC) is less efficient, while threads share memory directly and need no IPC&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You could say: &lt;strong&gt;processes and threads don&amp;rsquo;t differ much at creation time, but multi-process management and multi-thread management differ greatly&lt;/strong&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Why Does the Standby Have PG-Level Dirty Pages?
 &lt;div id="why-does-the-standby-have-pg-level-dirty-pages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-does-the-standby-have-pg-level-dirty-pages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The standby&amp;rsquo;s WAL replay itself generates dirty pages, and the standby also flushes dirty pages. You can view standby dirty pages through pg_buffercache. They are no different in kind from the primary&amp;rsquo;s: dirty data on the standby also belongs to regular relations. What does differ, and can be observed, is how checkpoint/bgwriter/backend dirty flushing behaves on the standby compared with the primary.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Why Is File Cache Higher on Some Databases and Lower on Others?
 &lt;div id="why-is-file-cache-higher-on-some-databases-and-lower-on-others" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-is-file-cache-higher-on-some-databases-and-lower-on-others" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Generally, databases whose data access is highly scattered have more file cache. An occasional slow SQL query is unlikely to keep file cache high over the long term: a slow query that reads a lot of data may briefly raise file cache, but after a while those file pages are no longer referenced, become inactive file pages, and can be reclaimed. Persistently scattered access is different. When an index&amp;rsquo;s correlation approaches 0 (a UUID primary key, for example), individual SQL statements may still perform acceptably, but they read many pages, which can generate frequent physical I/O and load too many pages into file cache. Changes in business access patterns can likewise cause heavy swapping of pages in and out of shared buffers and significantly impact performance.&lt;/p&gt;

&lt;h3 class="relative group"&gt;PG Processes and Shared Memory Mapping
 &lt;div id="pg-processes-and-shared-memory-mapping" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg-processes-and-shared-memory-mapping" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Without huge pages: /dev/zero (deleted)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/102208/smaps |egrep &lt;span style="color:#e6db74"&gt;&amp;#34;rw\-s&amp;#34;&lt;/span&gt; -A &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2aefd8901000-2aefd8902000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;1202061313&lt;/span&gt; /SYSV00001000 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2aefd8918000-2aefd898f000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:13 &lt;span style="color:#ae81ff"&gt;4084862058&lt;/span&gt; /dev/shm/PostgreSQL.1008001451
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;476&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2aefe2605000-2b00ad129000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;4084864418&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#With huge pages: /anon_hugepage (deleted)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/29091/smaps |egrep &lt;span style="color:#e6db74"&gt;&amp;#34;rw\-s&amp;#34;&lt;/span&gt; -A &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2aaaaac00000-2ac3a2c00000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:0e &lt;span style="color:#ae81ff"&gt;215471503&lt;/span&gt; /anon_hugepage &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;104726528&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b48dfe93000-2b48dfe94000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;88604727&lt;/span&gt; /SYSV00001000 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b48dfeab000-2b48dff22000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:12 &lt;span style="color:#ae81ff"&gt;215515747&lt;/span&gt; /dev/shm/PostgreSQL.1123685558
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;476&lt;/span&gt; kB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Child process page tables are all copied from the parent process; parent and child processes therefore share the same page frames [Understanding the Linux Kernel]. So the postmaster and every process forked from it (backends included) map the same shared memory at the same virtual address: the address range and Size shown in smaps are identical across these processes.&lt;/p&gt;
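&lt;p&gt;The smaps output can also be checked programmatically. The sketch below parses a copy of the huge-page output shown above (embedded as a string so the example is self-contained) and derives each shared mapping&amp;rsquo;s size from its address range. The helper name is hypothetical; on a live system the same parsing could be pointed at the text of &lt;code&gt;/proc/PID/smaps&lt;/code&gt; for each backend to compare ranges.&lt;/p&gt;

```python
# Sample copied from the smaps output above (huge pages enabled).
SAMPLE_SMAPS = """\
2aaaaac00000-2ac3a2c00000 rw-s 00000000 00:0e 215471503 /anon_hugepage (deleted)
Size:          104726528 kB
2b48dfe93000-2b48dfe94000 rw-s 00000000 00:04 88604727 /SYSV00001000 (deleted)
Size:                  4 kB
2b48dfeab000-2b48dff22000 rw-s 00000000 00:12 215515747 /dev/shm/PostgreSQL.1123685558
Size:                476 kB
"""


def shared_mappings(smaps_text):
    """Return (name, size_kb) for every shared writable (rw-s) mapping.

    The size is computed from the hexadecimal start-end address range,
    so it can be compared against the Size line that smaps prints.
    """
    result = []
    for line in smaps_text.splitlines():
        fields = line.split()
        # Mapping header lines carry the permission string in field 2.
        if fields[1:2] == ["rw-s"]:
            start, end = fields[0].split("-")
            size_kb = (int(end, 16) - int(start, 16)) // 1024
            name = " ".join(fields[5:])
            result.append((name, size_kb))
    return result


if __name__ == "__main__":
    for name, size_kb in shared_mappings(SAMPLE_SMAPS):
        print(size_kb, "kB", name)
```

&lt;p&gt;The computed sizes match the Size lines in the sample (104726528 kB, 4 kB and 476 kB), which confirms that the largest rw-s segment is the main shared memory area.&lt;/p&gt;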

&lt;h3 class="relative group"&gt;Why Do All PG Processes Have /dev/zero as the Largest Segment in Virtual Memory?
 &lt;div id="why-do-all-pg-processes-have-devzero-as-the-largest-segment-in-virtual-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-do-all-pg-processes-have-devzero-as-the-largest-segment-in-virtual-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;There are two main ways to implement anonymous page mapping with mmap: one is to set the &lt;code&gt;MAP_ANONYMOUS&lt;/code&gt; flag with &lt;code&gt;fd=-1&lt;/code&gt;, and the other is to open the &lt;code&gt;/dev/zero&lt;/code&gt; device file and pass the resulting file descriptor to &lt;code&gt;mmap&lt;/code&gt;. The two methods are functionally equivalent.&lt;/p&gt;
&lt;p&gt;PG shared buffers live in an anonymous shared mapping of this kind. On Linux, a shared anonymous mapping is reported in maps/smaps as &lt;code&gt;/dev/zero (deleted)&lt;/code&gt; regardless of which of the two methods created it, which is why a large proportion of a PG process&amp;rsquo;s virtual address space typically shows up as &lt;code&gt;/dev/zero&lt;/code&gt;.&lt;/p&gt;
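&lt;p&gt;A minimal sketch of the first method, assuming a Unix-like system: Python&amp;rsquo;s &lt;code&gt;mmap.mmap(-1, length)&lt;/code&gt; creates an anonymous mapping that is shared by default, and a forked child sees the same page frames, in the same way backends see the postmaster&amp;rsquo;s shared memory. The function name is made up for illustration, and this is not how PostgreSQL itself is written.&lt;/p&gt;

```python
import mmap
import os


def child_write_visible_to_parent(message):
    """Write into an anonymous shared mapping from a forked child.

    Returns the bytes the parent reads back after the child exits.
    """
    # fileno=-1 requests an anonymous mapping; the default flag is MAP_SHARED.
    region = mmap.mmap(-1, mmap.PAGESIZE)
    pid = os.fork()
    if pid == 0:
        # Child: inherits the mapping at the same virtual address.
        region.seek(0)
        region.write(message)
        os._exit(0)
    # Parent: wait for the child, then read what it wrote.
    os.waitpid(pid, 0)
    region.seek(0)
    data = region.read(len(message))
    region.close()
    return data


if __name__ == "__main__":
    print(child_write_visible_to_parent(b"written by child"))
```

&lt;p&gt;Because the mapping is shared rather than private, the child&amp;rsquo;s write is not subject to copy-on-write and is visible to the parent.&lt;/p&gt;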

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;[Understanding the Linux Kernel]: Understanding the Linux Kernel: Memory Addressing, Memory Management, Address Space Management, Page Frame Reclamation&lt;/p&gt;
&lt;p&gt;[Understanding Linux Processes and Memory]: Understanding Linux Processes and Memory: CPU Hardware Principles, Process and Thread Comparison&lt;/p&gt;
&lt;p&gt;[Running Linux Kernel: Beginner&amp;rsquo;s Guide 2nd Edition]: Running Linux Kernel: Beginner&amp;rsquo;s Guide 2nd Edition: System Calls, Memory Management&lt;/p&gt;
&lt;hr&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;&lt;a href="https://www.cs.oslomet.no/~haugerud/os/Forelesning/os7.pdf" target="_blank" rel="noreferrer"&gt;https://www.cs.oslomet.no/~haugerud/os/Forelesning/os7.pdf&lt;/a&gt;&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;&lt;a href="https://www.cs.unc.edu/~porter/courses/comp630/s24/slides/pfra.pdf" target="_blank" rel="noreferrer"&gt;https://www.cs.unc.edu/~porter/courses/comp630/s24/slides/pfra.pdf&lt;/a&gt;&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:3"&gt;
&lt;p&gt;&lt;a href="https://www.kernel.org/doc/gorman/html/understand/index.html" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/gorman/html/understand/index.html&lt;/a&gt;&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:3" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:4"&gt;
&lt;p&gt;&lt;a href="https://courses.cs.washington.edu/courses/cse333/20wi/lectures/07/CSE333-L07-posix_20wi.pdf" target="_blank" rel="noreferrer"&gt;https://courses.cs.washington.edu/courses/cse333/20wi/lectures/07/CSE333-L07-posix_20wi.pdf&lt;/a&gt;&amp;#160;&lt;a href="#fnref:4" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:5"&gt;
&lt;p&gt;&lt;a href="https://www.sohu.com/a/392831824_467784" target="_blank" rel="noreferrer"&gt;https://www.sohu.com/a/392831824_467784&lt;/a&gt;&amp;#160;&lt;a href="#fnref:5" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:6"&gt;
&lt;p&gt;&lt;a href="https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/configuring-an-operating-system-to-optimize-memory-access_monitoring-and-managing-system-status-and-performance#overview-of-a-systems-memory_configuring-an-operating-system-to-optimize-memory-access" target="_blank" rel="noreferrer"&gt;redhat,Configuringanoperatingsystemtooptimizememoryaccess&lt;/a&gt;&amp;#160;&lt;a href="#fnref:6" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:7"&gt;
&lt;p&gt;&lt;a href="https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html#swappiness" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html#swappiness&lt;/a&gt;&amp;#160;&lt;a href="#fnref:7" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:8"&gt;
&lt;p&gt;&lt;a href="https://access.redhat.com/solutions/6785021" target="_blank" rel="noreferrer"&gt;https://access.redhat.com/solutions/6785021&lt;/a&gt;&amp;#160;&lt;a href="#fnref:8" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:9"&gt;
&lt;p&gt;&lt;a href="https://www.kernel.org/doc/Documentation/vm/overcommit-accounting" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/Documentation/vm/overcommit-accounting&lt;/a&gt;&amp;#160;&lt;a href="#fnref:9" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:10"&gt;
&lt;p&gt;&lt;a href="https://carlyleliu.github.io/LinuxKernel/LinuxMemoryOptimization/" target="_blank" rel="noreferrer"&gt;https://carlyleliu.github.io/LinuxKernel/LinuxMemoryOptimization/&lt;/a&gt;&amp;#160;&lt;a href="#fnref:10" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:11"&gt;
&lt;p&gt;&lt;a href="https://www.man7.org/linux/man-pages/man5/proc_pid_oom_score.5.html" target="_blank" rel="noreferrer"&gt;https://www.man7.org/linux/man-pages/man5/proc_pid_oom_score.5.html&lt;/a&gt;&amp;#160;&lt;a href="#fnref:11" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:12"&gt;
&lt;p&gt;&lt;a href="https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/memory.html" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/memory.html&lt;/a&gt;&amp;#160;&lt;a href="#fnref:12" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:12" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref2:12" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:13"&gt;
&lt;p&gt;&lt;a href="https://wiki.goframe.org/pages/viewpage.action?pageId=157646868" target="_blank" rel="noreferrer"&gt;https://wiki.goframe.org/pages/viewpage.action?pageId=157646868&lt;/a&gt;&amp;#160;&lt;a href="#fnref:13" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:14"&gt;
&lt;p&gt;&lt;a href="https://www.man7.org/conf/lca2019/cgroups_v2-LCA2019-Kerrisk.pdf" target="_blank" rel="noreferrer"&gt;https://www.man7.org/conf/lca2019/cgroups_v2-LCA2019-Kerrisk.pdf&lt;/a&gt;&amp;#160;&lt;a href="#fnref:14" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:15"&gt;
&lt;p&gt;&lt;a href="https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html&lt;/a&gt;&amp;#160;&lt;a href="#fnref:15" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:16"&gt;
&lt;p&gt;&lt;a href="https://support.huaweicloud.com/usermanual-hce/hce_02_0072.html" target="_blank" rel="noreferrer"&gt;https://support.huaweicloud.com/usermanual-hce/hce_02_0072.html&lt;/a&gt;&amp;#160;&lt;a href="#fnref:16" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:17"&gt;
&lt;p&gt;&lt;a href="https://chrisdown.name/talks/cgroupv2/cgroupv2-fosdem.pdf" target="_blank" rel="noreferrer"&gt;https://chrisdown.name/talks/cgroupv2/cgroupv2-fosdem.pdf&lt;/a&gt;&amp;#160;&lt;a href="#fnref:17" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:18"&gt;
&lt;p&gt;&lt;a href="https://www.cnblogs.com/muahao/p/10109712.html" target="_blank" rel="noreferrer"&gt;https://www.cnblogs.com/muahao/p/10109712.html&lt;/a&gt;&amp;#160;&lt;a href="#fnref:18" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title>PostgreSQL CLOG Files and Standby Synchronization Analysis</title><link>https://lastdba.com/en/2024/09/03/postgresql-clog-files-and-standby-synchronization-analysis/</link><pubDate>Tue, 03 Sep 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/09/03/postgresql-clog-files-and-standby-synchronization-analysis/</guid><description>&lt;p&gt;Among all relational databases, PostgreSQL&amp;rsquo;s CLOG is a very special type of log. CLOG&amp;rsquo;s existence is inseparable from PostgreSQL&amp;rsquo;s MVCC mechanism. Some basic knowledge about transaction IDs and CLOG won&amp;rsquo;t be covered in this article. If interested, please refer to &lt;a href="https://blog.csdn.net/qq_40687433/article/details/130782857?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522172343394916800211586382%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&amp;amp;request_id=172343394916800211586382&amp;amp;biz_id=0&amp;amp;utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~rank_v31_ecpm-1-130782857-null-null.nonecase&amp;amp;utm_term=clog&amp;amp;spm=1018.2226.3001.4450" target="_blank" rel="noreferrer"&gt;CLOG and Hint Bits&lt;/a&gt;. This article focuses on the structure of CLOG files, manually locating transaction states, and the CLOG WAL log synchronization mechanism, to further understand PostgreSQL&amp;rsquo;s CLOG.&lt;/p&gt;

&lt;h2 class="relative group"&gt;CLOG Segment
 &lt;div id="clog-segment" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-segment" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;CLOG Directory
 &lt;div id="clog-directory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-directory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;To distinguish from regular logs, PostgreSQL 10 renamed the CLOG and WAL directories &lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;</description><content:encoded>&lt;p&gt;Among all relational databases, PostgreSQL&amp;rsquo;s CLOG is a very special type of log. CLOG&amp;rsquo;s existence is inseparable from PostgreSQL&amp;rsquo;s MVCC mechanism. Some basic knowledge about transaction IDs and CLOG won&amp;rsquo;t be covered in this article. If interested, please refer to &lt;a href="https://blog.csdn.net/qq_40687433/article/details/130782857?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522172343394916800211586382%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&amp;amp;request_id=172343394916800211586382&amp;amp;biz_id=0&amp;amp;utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~rank_v31_ecpm-1-130782857-null-null.nonecase&amp;amp;utm_term=clog&amp;amp;spm=1018.2226.3001.4450" target="_blank" rel="noreferrer"&gt;CLOG and Hint Bits&lt;/a&gt;. This article focuses on the structure of CLOG files, manually locating transaction states, and the CLOG WAL log synchronization mechanism, to further understand PostgreSQL&amp;rsquo;s CLOG.&lt;/p&gt;

&lt;h2 class="relative group"&gt;CLOG Segment
 &lt;div id="clog-segment" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-segment" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;CLOG Directory
 &lt;div id="clog-directory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-directory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;To distinguish from regular logs, PostgreSQL 10 renamed the CLOG and WAL directories &lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;pg9.6&lt;/th&gt;
 &lt;th&gt;pg10&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;pg_clog&lt;/td&gt;
 &lt;td&gt;pg_xact&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pg_xlog&lt;/td&gt;
 &lt;td&gt;pg_wal&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Don&amp;rsquo;t mix these up: I confused pg_xlog and pg_xact myself for a while&amp;hellip;&lt;/p&gt;

&lt;h3 class="relative group"&gt;CLOG Segment Name
 &lt;div id="clog-segment-name" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-segment-name" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;CLOG is managed by SLRU, so the naming of CLOG segment files is defined in &lt;code&gt;slru.c&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define SlruFileName(ctl, path, seg) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	snprintf(path, MAXPGPATH, &amp;#34;%s/%04X&amp;#34;, (ctl)-&amp;gt;Dir, seg)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;%04X&lt;/code&gt; means hexadecimal (&lt;code&gt;X&lt;/code&gt;), width of 4, zero-padded on the left (&lt;code&gt;04&lt;/code&gt;).
Example CLOG filenames:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg_xact&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;262144&lt;/span&gt; Aug &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; 16:29 03C0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;262144&lt;/span&gt; Aug &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 23:04 03C1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;TransactionID and CLOG Location Conversion
 &lt;div id="transactionid-and-clog-location-conversion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transactionid-and-clog-location-conversion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;CLOG only stores transaction ID status, not the transaction ID itself. Through the TransactionID itself, you can directly locate the CLOG file and the position within the file. Before that, we need to understand some fundamentals.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Transaction States Stored in CLOG
 &lt;div id="transaction-states-stored-in-clog" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-states-stored-in-clog" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;There are only 4 transaction states:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; XidStatus;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_IN_PROGRESS		0x00
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_COMMITTED		0x01
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_ABORTED		0x02
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_SUB_COMMITTED	0x03&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A transaction can only be in one of these states: in progress, committed, aborted, or subtransaction committed. Note that there is no &amp;ldquo;not started&amp;rdquo; state: once a transaction ID has been allocated in the database, that transaction has already started.
Conversely, transaction IDs that have not been allocated yet but already have space in CLOG (only a few at any time; see the extend CLOG section below) show up as &lt;code&gt;in_progress&lt;/code&gt;.
Four states need only 2 bits of storage, so 1 byte (8 bits) stores 4 transaction states, and 1 page (8 kB, i.e. 8192 bytes) holds 8192*4=32768 transaction states. These values are all defined in the source code:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; Defines &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; CLOG page sizes. A page is the same BLCKSZ as is used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; everywhere &lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; in Postgres.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// CLOG page size = BLCKSZ = 8k (default)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_BITS_PER_XACT	2 							 &lt;/span&gt;&lt;span style="color:#75715e"&gt;// One transaction state occupies 2 bits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_XACTS_PER_BYTE 4 							 &lt;/span&gt;&lt;span style="color:#75715e"&gt;// 1 byte can hold 4 transaction states
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE) &lt;/span&gt;&lt;span style="color:#75715e"&gt;// 1 page can hold 32768 transaction states, 8KB*4=32768
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_XACT_BITMASK ((1 &amp;lt;&amp;lt; CLOG_BITS_PER_XACT) - 1) &lt;/span&gt;&lt;span style="color:#75715e"&gt;// Transaction status bitmask = ((1&amp;lt;&amp;lt;2)-1) = 3, expressed in binary as 11
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define SLRU_PAGES_PER_SEGMENT	32 &lt;/span&gt;&lt;span style="color:#75715e"&gt;// 1 segment has 32 pages
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1 CLOG segment has 32 pages&lt;/li&gt;
&lt;li&gt;1 CLOG page is 8k (typically)&lt;/li&gt;
&lt;li&gt;1 byte has 4 transaction states&lt;/li&gt;
&lt;li&gt;1 transaction state occupies 2 bits&lt;/li&gt;
&lt;/ul&gt;
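&lt;p&gt;The summary above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not PostgreSQL code: the constant names mirror the macros quoted above, and &lt;code&gt;BLCKSZ = 8192&lt;/code&gt; assumes a default build. The variables &lt;code&gt;segment_bytes&lt;/code&gt;, &lt;code&gt;xacts_per_segment&lt;/code&gt; and &lt;code&gt;segments_per_xid_cycle&lt;/code&gt; are names made up for this illustration.&lt;/p&gt;

```python
# Constants mirror the macros quoted above; BLCKSZ assumes the default 8 kB build.
BLCKSZ = 8192
CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 8 // CLOG_BITS_PER_XACT            # 4 states per byte
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE       # states per page
SLRU_PAGES_PER_SEGMENT = 32

# Illustrative derived values (names are not from the PostgreSQL source).
segment_bytes = SLRU_PAGES_PER_SEGMENT * BLCKSZ
xacts_per_segment = SLRU_PAGES_PER_SEGMENT * CLOG_XACTS_PER_PAGE
segments_per_xid_cycle = 2 ** 32 // xacts_per_segment    # 32-bit xid space

print(CLOG_XACTS_PER_PAGE)     # 32768
print(segment_bytes)           # 262144
print(xacts_per_segment)       # 1048576
print(segments_per_xid_cycle)  # 4096
```

&lt;p&gt;The 262144 bytes per segment match the file size shown in the &lt;code&gt;pg_xact&lt;/code&gt; listing earlier.&lt;/p&gt;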

&lt;h3 class="relative group"&gt;CLOG Segment/Page/Byte Conversion
 &lt;div id="clog-segmentpagebyte-conversion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-segmentpagebyte-conversion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The rule for which CLOG segment a transaction ID maps to is not obvious; it is tucked away in a source comment:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; Note: because TransactionIds are &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; bits and wrap around at &lt;span style="color:#ae81ff"&gt;0xFFFFFFFF&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; CLOG page numbering also wraps around at &lt;span style="color:#ae81ff"&gt;0xFFFFFFFF&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;CLOG_XACTS_PER_PAGE,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; and CLOG segment numbering at
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0xFFFFFFFF&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;CLOG_XACTS_PER_PAGE&lt;span style="color:#f92672"&gt;/&lt;/span&gt;SLRU_PAGES_PER_SEGMENT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// segment number = xid/CLOG_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT = xid/32768/32 (integer division); convert the result to hex to get the segment file name
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The mapping from transaction ID to page, byte, and bit index is spelled out more clearly, as macros &lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdToPage(xid)	((xid) / (TransactionId) CLOG_XACTS_PER_PAGE) &lt;/span&gt;&lt;span style="color:#75715e"&gt;// Which CLOG page the transaction ID corresponds to, xid/32768
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdToPgIndex(xid) ((xid) % (TransactionId) CLOG_XACTS_PER_PAGE) &lt;/span&gt;&lt;span style="color:#75715e"&gt;// The offset within the above page, xid%32768
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdToByte(xid)	(TransactionIdToPgIndex(xid) / CLOG_XACTS_PER_BYTE) &lt;/span&gt;&lt;span style="color:#75715e"&gt;// Which byte in the page the transaction ID corresponds to, (xid%32768)/4
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdToBIndex(xid)	((xid) % (TransactionId) CLOG_XACTS_PER_BYTE)		&lt;/span&gt;&lt;span style="color:#75715e"&gt;// Which bit index in the above byte (note: bit index, not the bit itself), xid%4
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
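&lt;p&gt;To see the macros working together, here is a small Python sketch that ports them and adds the segment rule from the comment above. It assumes the default 8 kB &lt;code&gt;BLCKSZ&lt;/code&gt;; &lt;code&gt;ClogLocation&lt;/code&gt; and &lt;code&gt;locate_xid&lt;/code&gt; are hypothetical names invented for this example, and the sample xid is arbitrary.&lt;/p&gt;

```python
from typing import NamedTuple

BLCKSZ = 8192                      # assumed default block size
CLOG_XACTS_PER_BYTE = 4
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE
SLRU_PAGES_PER_SEGMENT = 32


class ClogLocation(NamedTuple):
    """Where a transaction ID's status lives (illustrative type)."""
    segment_name: str   # file name under pg_xact, formatted like %04X
    page: int           # TransactionIdToPage
    byte_in_page: int   # TransactionIdToByte
    bindex: int         # TransactionIdToBIndex


def locate_xid(xid):
    """Port of the mapping macros; xid is reduced to 32 bits first."""
    xid %= 2 ** 32
    page = xid // CLOG_XACTS_PER_PAGE
    pgindex = xid % CLOG_XACTS_PER_PAGE
    return ClogLocation(
        segment_name=format(page // SLRU_PAGES_PER_SEGMENT, "04X"),
        page=page,
        byte_in_page=pgindex // CLOG_XACTS_PER_BYTE,
        bindex=xid % CLOG_XACTS_PER_BYTE,
    )


loc = locate_xid(1006632965)  # arbitrary sample xid
print(loc)
# ClogLocation(segment_name='03C0', page=30720, byte_in_page=1, bindex=1)
```

&lt;p&gt;Note that &lt;code&gt;page&lt;/code&gt; here is the global page number; the page within the segment is &lt;code&gt;page % 32&lt;/code&gt;.&lt;/p&gt;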
&lt;p&gt;Generally (with 8k BLCKSZ), 1 CLOG segment has 32 pages; 1 CLOG segment has 32*8k bytes, &lt;strong&gt;i.e., CLOG file size is fixed at 256K&lt;/strong&gt;; 1 CLOG segment can hold 4*32*8k transaction states.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg_xact&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll &lt;span style="color:#75715e"&gt;# 256k CLOG segment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;262144&lt;/span&gt; Aug &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; 16:29 03C0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;262144&lt;/span&gt; Aug &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 23:04 03C1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;CLOG Bit Conversion
 &lt;div id="clog-bit-conversion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-bit-conversion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The functions for setting CLOG bits and getting CLOG bits (corresponding to &lt;code&gt;TransactionIdSetStatusBit&lt;/code&gt; and &lt;code&gt;TransactionIdGetStatus&lt;/code&gt;) both have the following code to obtain which two bits in the CLOG the transaction ID corresponds to:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			bshift &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TransactionIdToBIndex&lt;/span&gt;(xid) &lt;span style="color:#f92672"&gt;*&lt;/span&gt; CLOG_BITS_PER_XACT;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;byteptr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	byteptr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; XactCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;shared&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;page_buffer[slotno] &lt;span style="color:#f92672"&gt;+&lt;/span&gt; byteno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	curval &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (&lt;span style="color:#f92672"&gt;*&lt;/span&gt;byteptr &lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&lt;/span&gt; bshift) &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; CLOG_XACT_BITMASK;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;bshift&lt;/code&gt; represents the right-shift position, where &lt;code&gt;TransactionIdToBIndex=xid%4&lt;/code&gt;, &lt;code&gt;CLOG_BITS_PER_XACT=2&lt;/code&gt;, &lt;code&gt;CLOG_XACT_BITMASK=3 (binary: 11)&lt;/code&gt;.
The key code for getting CLOG bits &lt;code&gt;curval = (*byteptr &amp;gt;&amp;gt; bshift) &amp;amp; CLOG_XACT_BITMASK&lt;/code&gt; can be understood in two parts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;*byteptr &amp;gt;&amp;gt; bshift&lt;/code&gt; right-shifts the byte that &lt;code&gt;byteptr&lt;/code&gt; points to (not the pointer itself) by 0, 2, 4, or 6 bits&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&amp;amp; CLOG_XACT_BITMASK&lt;/code&gt; keeps only the lowest two bits of the shifted value (00&amp;amp;11=00, 01&amp;amp;11=01, 10&amp;amp;11=10, 11&amp;amp;11=11)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, numbering the bits of a byte 1 to 8 starting from the most significant bit, the position of a transaction ID&amp;rsquo;s state within a byte is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;xid%4=0: takes bits 7 and 8&lt;/li&gt;
&lt;li&gt;xid%4=1: takes bits 5 and 6&lt;/li&gt;
&lt;li&gt;xid%4=2: takes bits 3 and 4&lt;/li&gt;
&lt;li&gt;xid%4=3: takes bits 1 and 2&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note: within a byte, states are stored starting from the least significant bits, so when the byte is written out left to right they appear in reverse order. Bytes within a page and pages within a segment are filled in ascending order.&lt;/p&gt;
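&lt;p&gt;The shift-and-mask step can be tried out in isolation. The sketch below is Python, not the PostgreSQL implementation: &lt;code&gt;status_in_byte&lt;/code&gt;, &lt;code&gt;STATUS_NAMES&lt;/code&gt; and the sample byte are made up for illustration, and &lt;code&gt;operator.and_&lt;/code&gt; is simply the function form of bitwise AND.&lt;/p&gt;

```python
import operator

CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 4
CLOG_XACT_BITMASK = 2 ** CLOG_BITS_PER_XACT - 1   # 3, binary 11

# Status values as defined in the source excerpt earlier in the article.
STATUS_NAMES = {0: "IN_PROGRESS", 1: "COMMITTED", 2: "ABORTED", 3: "SUB_COMMITTED"}


def status_in_byte(byte_value, xid):
    """Extract the 2-bit status of xid from the byte that holds it."""
    bshift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT
    # Shift the wanted slot down to the lowest two bits, then mask with 11.
    return operator.and_(byte_value >> bshift, CLOG_XACT_BITMASK)


# Hypothetical byte: reading left to right, the slots for xid%4 = 3, 2, 1, 0
# hold 10, 01, 00, 01.
sample = 0b10_01_00_01

for xid in range(4):
    print(xid, STATUS_NAMES[status_in_byte(sample, xid)])
```

&lt;p&gt;For this sample byte the loop prints COMMITTED, IN_PROGRESS, COMMITTED, ABORTED for xid%4 = 0 to 3, which shows the reverse ordering inside the byte.&lt;/p&gt;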

&lt;h3 class="relative group"&gt;Manually Calculating Transaction ID Position in CLOG File
 &lt;div id="manually-calculating-transaction-id-position-in-clog-file" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#manually-calculating-transaction-id-position-in-clog-file" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;If we want to manually locate a transaction in CLOG using &lt;code&gt;hexdump&lt;/code&gt;, we need to calculate three elements: &lt;strong&gt;&amp;lt;CLOG segment number, offset within segment in bytes, offset on byte in bit index&amp;gt;&lt;/strong&gt;. (This references the approach in &amp;ldquo;PostgreSQL Database Kernel Analysis&amp;rdquo; but with some differences &lt;sup id="fnref:3"&gt;&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref"&gt;3&lt;/a&gt;&lt;/sup&gt;)&lt;/p&gt;
&lt;p&gt;Before calculating, you also need to understand:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CLOG segment file numbers are in hexadecimal&lt;/li&gt;
&lt;li&gt;hexdump is in hexadecimal, each line holds 16 bytes, i.e., each line holds &lt;code&gt;16*CLOG_XACTS_PER_BYTE=16*4=64&lt;/code&gt; transaction states&lt;/li&gt;
&lt;li&gt;&lt;code&gt;hexdump -s xxx&lt;/code&gt; is in byte units&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following SQL can calculate the position of a transaction ID in CLOG:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- CLOG segment number
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- %4294967296 represents transaction ID wraparound, /(8192*4*32) represents the maximum number of transactions a segment file can contain, to_hex converts to hex for filename, lpad left-pads to 4 digits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lpad(&lt;span style="color:#66d9ef"&gt;upper&lt;/span&gt;(to_hex(txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;))),&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; clog_segmentno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Offset within segment in bytes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- %4294967296 represents transaction ID wraparound, %(8192*32*4) takes the remaining transaction IDs, /4 converts to byte units
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;%&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; in_clog_offset_bytes;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Offset on byte in bit index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- %4294967296 represents transaction ID wraparound, %4 takes the bit index within the byte
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; in_byte_offset_bitindex;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Or a single SQL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lpad(&lt;span style="color:#66d9ef"&gt;upper&lt;/span&gt;(to_hex(txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;))),&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; clog_segmentno,txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;%&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; in_clog_offset_bytes,txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; in_byte_offset_bitindex;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A practical walk-through, locating a transaction ID&amp;rsquo;s state in CLOG and reading it back:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lpad(&lt;span style="color:#66d9ef"&gt;upper&lt;/span&gt;(to_hex(txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;))),&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; clog_segmentno,txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;%&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; in_clog_offset_bytes,txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; in_byte_offset_bitindex;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; clog_segmentno &lt;span style="color:#f92672"&gt;|&lt;/span&gt; in_clog_offset_bytes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; in_byte_offset_bitindex 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------+----------------------+-------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0002&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;63196&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The transaction is rolled back on purpose: most transactions are committed, so an aborted one is easier to spot in the dump.
The checkpoint ensures the CLOG page is flushed to disk. Without it, the page might still sit in the CLOG buffer and not yet be written to the CLOG segment file.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd pg_xact&lt;span style="color:#f92672"&gt;/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; hexdump &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0002&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;s &lt;span style="color:#ae81ff"&gt;63196&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;n &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;v
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt;f6dc &lt;span style="color:#ae81ff"&gt;95&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;.&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt;f6dd
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Convert hex to binary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;x95&amp;#39;&lt;/span&gt;::bit(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bit 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;10010101&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When xid%4=3, the status sits in the two most significant (leftmost) bits of the byte. For this rolled-back transaction those bits are 10, and 10 (0x02) represents &lt;code&gt;TRANSACTION_STATUS_ABORTED&lt;/code&gt;.&lt;/p&gt;
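&lt;p&gt;The same arithmetic can be sketched in shell, outside the database. This is a minimal sketch, assuming the default 8k block size and 32 pages per segment. The function names &lt;code&gt;clog_locate&lt;/code&gt; and &lt;code&gt;clog_status&lt;/code&gt; are hypothetical, and XID 2349939 is simply a value that maps to the segment, offset, and slot shown above:&lt;/p&gt;

```shell
#!/bin/sh
# Assumptions: 8k pages, 4 transactions per byte, 32 pages per segment,
# so one segment file covers 8192 * 4 * 32 = 1048576 transactions.
XACTS_PER_SEGMENT=$(( 8192 * 4 * 32 ))

# Print "segment byte_offset slot" for a 32-bit XID passed as $1.
clog_locate() {
    segment=$(( $1 / XACTS_PER_SEGMENT ))
    offset=$(( $1 % XACTS_PER_SEGMENT / 4 ))
    slot=$(( $1 % 4 ))
    printf '%04X %d %d\n' "$segment" "$offset" "$slot"
}

# Extract the 2-bit status of one slot ($2) from a CLOG byte ($1).
# Slot 0 is the two lowest bits, slot 3 the two highest.
clog_status() {
    echo $(( ($1 >> ($2 * 2)) % 4 ))
}

clog_locate 2349939    # prints: 0002 63196 3
clog_status 0x95 3     # prints: 2 (aborted)
```

&lt;p&gt;Taking the value modulo 4 after the shift keeps only the lowest two bits, which is the same as masking with 0x03.&lt;/p&gt;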

&lt;h3 class="relative group"&gt;Why Does CLOG Usually Contain Many 55s and U&amp;rsquo;s?
 &lt;div id="why-clog-usually-contains-many-55s-and-us" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-clog-usually-contains-many-55s-and-us" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A direct hexdump of a CLOG file from a typical transactional database looks like this:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hexdump &lt;span style="color:#f92672"&gt;-&lt;/span&gt;C &lt;span style="color:#ae81ff"&gt;0001&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;v&lt;span style="color:#f92672"&gt;|&lt;/span&gt;head &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000010&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000020&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000030&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000040&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000050&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000060&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000070&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;000000&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;000000&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;90&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The reason is that a committed transaction is stored as 01 (&lt;code&gt;TRANSACTION_STATUS_COMMITTED&lt;/code&gt;). When all 4 consecutive transactions in a byte are committed, the byte becomes 01010101.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Binary: 01010101, hex: 55&lt;/li&gt;
&lt;li&gt;Hex 55 is the ASCII code for &amp;lsquo;U&amp;rsquo;, so the text column of a CLOG hexdump is usually full of U&amp;rsquo;s&lt;/li&gt;
&lt;li&gt;Some bytes are not 55 (U) because, in production, a few transactions have not completed yet or use subtransactions. The committed state of a subtransaction is recorded in CLOG as 0x03 (&lt;code&gt;TRANSACTION_STATUS_SUB_COMMITTED&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
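&lt;p&gt;To make the bit packing concrete, here is a small shell sketch that expands one CLOG byte into its four 2-bit statuses. The function name &lt;code&gt;clog_decode_byte&lt;/code&gt; is hypothetical. The labels follow the &lt;code&gt;TRANSACTION_STATUS_*&lt;/code&gt; values (0 in progress, 1 committed, 2 aborted, 3 sub-committed):&lt;/p&gt;

```shell
#!/bin/sh
# Expand one CLOG byte ($1, e.g. 0x55) into four status labels.
# Labels are printed from slot 0 (lowest XID, lowest bits) to slot 3.
clog_decode_byte() {
    byte=$(( $1 ))
    out=""
    for slot in 0 1 2 3; do
        case $(( (byte >> (slot * 2)) % 4 )) in
            0) name=in_progress ;;
            1) name=committed ;;
            2) name=aborted ;;
            3) name=sub_committed ;;
        esac
        out="$out$name "
    done
    # Drop the trailing space before printing.
    echo "${out% }"
}

clog_decode_byte 0x55   # committed committed committed committed
clog_decode_byte 0x95   # committed committed committed aborted
```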

&lt;h2 class="relative group"&gt;Shared CLOG Buffer
 &lt;div id="shared-clog-buffer" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shared-clog-buffer" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The number of CLOG shared buffers is easy to understand:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Number of shared CLOG buffers.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * On larger multi-processor systems, it is possible to have many CLOG page
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * requests in flight at one time which could lead to disk access for CLOG
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * page if the required page is not found in memory. Testing revealed that we
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * can get the best performance by having 128 CLOG buffers, more than that it
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * doesn&amp;#39;t improve performance.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * a good idea, because it would increase the minimum amount of shared memory
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * required to start, which could be a problem for people running very small
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * configurations. The following formula seems to represent a reasonable
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * compromise: people with very low values for shared_buffers will get fewer
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * CLOG buffers as well, and everyone else will get 128.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;CLOGShmemBuffers&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;Min&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;128&lt;/span&gt;, &lt;span style="color:#a6e22e"&gt;Max&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;, NBuffers &lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;512&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In short, testing showed that 128 CLOG buffers give the best performance, and more than that brings no further gain. Always allocating 128 would raise the minimum shared memory needed to start, which hurts very small configurations, so the count is derived from &lt;code&gt;shared_buffers&lt;/code&gt; instead.
Number of CLOG buffers = &lt;code&gt;NBuffers / 512&lt;/code&gt;, with a minimum of 4 and a maximum of 128, where &lt;code&gt;NBuffers&lt;/code&gt; is &lt;code&gt;shared_buffers&lt;/code&gt; counted in pages. Note: these are all buffer counts, not sizes!&lt;/p&gt;
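&lt;p&gt;A few sample values make the clamping visible. The sketch below mirrors the &lt;code&gt;Min(128, Max(4, NBuffers / 512))&lt;/code&gt; expression quoted above. The function name &lt;code&gt;clog_buffers&lt;/code&gt; is hypothetical, and the &lt;code&gt;shared_buffers&lt;/code&gt; sizes in the comments assume 8k pages:&lt;/p&gt;

```shell
#!/bin/sh
# $1 is NBuffers: shared_buffers expressed in pages.
clog_buffers() {
    n=$(( $1 / 512 ))
    # Clamp to the range 4..128, as the C expression does.
    if [ "$n" -lt 4 ]; then n=4; fi
    if [ "$n" -gt 128 ]; then n=128; fi
    echo "$n"
}

clog_buffers 128       # shared_buffers = 1MB   -> 4
clog_buffers 16384     # shared_buffers = 128MB -> 32
clog_buffers 65536     # shared_buffers = 512MB -> 128
clog_buffers 1048576   # shared_buffers = 8GB   -> 128
```

&lt;p&gt;Under these assumptions, any &lt;code&gt;shared_buffers&lt;/code&gt; of 512MB or more already reaches the cap of 128.&lt;/p&gt;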
&lt;p&gt;How large is a single buffer?
The CLOG buffer is managed by SLRU, and each SLRU page is one &lt;code&gt;BLCKSZ&lt;/code&gt;, 8k by default:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;A page is the same BLCKSZ as is used everywhere&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;The size of the shared CLOG buffer can be read off the function that sizes the CLOG SLRU shared memory:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Initialization of shared memory for CLOG
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;CLOGShmemSize&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;SimpleLruShmemSize&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;CLOGShmemBuffers&lt;/span&gt;(), CLOG_LSNS_PER_PAGE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;CLOGShmemBuffers()&lt;/code&gt; argument is 4~128, and &lt;code&gt;CLOG_LSNS_PER_PAGE&lt;/code&gt; = 1024 with 8k pages. That is a count of LSN groups per page (32768 transactions per page / 32 transactions per group), not a size in bytes.
&lt;code&gt;SimpleLruShmemSize&lt;/code&gt; computes how much shared memory the SLRU needs:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SimpleLruShmemSize&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; nslots, &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; nlsns)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Size		sz;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* we assume nslots isn&amp;#39;t so large as to risk overflow */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sz &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(SlruSharedData));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sz &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(nslots &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;));	&lt;span style="color:#75715e"&gt;/* page_buffer[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sz &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(nslots &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(SlruPageStatus));	&lt;span style="color:#75715e"&gt;/* page_status[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sz &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(nslots &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;));	&lt;span style="color:#75715e"&gt;/* page_dirty[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sz &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(nslots &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;));	&lt;span style="color:#75715e"&gt;/* page_number[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sz &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(nslots &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;));	&lt;span style="color:#75715e"&gt;/* page_lru_count[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sz &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(nslots &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(LWLockPadded));	&lt;span style="color:#75715e"&gt;/* buffer_locks[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (nlsns &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		sz &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(nslots &lt;span style="color:#f92672"&gt;*&lt;/span&gt; nlsns &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(XLogRecPtr));	&lt;span style="color:#75715e"&gt;/* group_lsn[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;BUFFERALIGN&lt;/span&gt;(sz) &lt;span style="color:#f92672"&gt;+&lt;/span&gt; BLCKSZ &lt;span style="color:#f92672"&gt;*&lt;/span&gt; nslots;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
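&lt;p&gt;SLRU keeps its metadata and control information in several arrays. Most of them are roughly &lt;code&gt;data type * buffer count&lt;/code&gt; and stay small. The exception is &lt;code&gt;group_lsn[]&lt;/code&gt;, which stores &lt;code&gt;nlsns&lt;/code&gt; 8-byte LSNs per slot, i.e., &lt;code&gt;1024 * 8 = 8k&lt;/code&gt; per slot, as much as the page itself. The page buffers take &lt;code&gt;BLCKSZ * nslots&lt;/code&gt;, i.e., &lt;code&gt;8k * (4~128) = (32k~1M)&lt;/code&gt;. So we can &lt;em&gt;roughly&lt;/em&gt; estimate that the shared CLOG buffer is at most around 2M: about 1M of pages plus about 1M of group LSNs.&lt;/p&gt;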
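&lt;p&gt;As a rough cross-check of &lt;code&gt;SimpleLruShmemSize&lt;/code&gt;, the sketch below adds up only its two largest terms: the page buffers (&lt;code&gt;BLCKSZ * nslots&lt;/code&gt;) and the &lt;code&gt;group_lsn[]&lt;/code&gt; array (&lt;code&gt;nslots * nlsns * sizeof(XLogRecPtr)&lt;/code&gt;). It assumes 8k pages, 1024 LSN groups per page, and 8-byte LSNs. The function name &lt;code&gt;clog_shmem_estimate&lt;/code&gt; is hypothetical:&lt;/p&gt;

```shell
#!/bin/sh
# Assumptions: BLCKSZ = 8192, CLOG_LSNS_PER_PAGE = 1024, sizeof(XLogRecPtr) = 8.
# Only the two dominant terms are counted; the small per-slot arrays,
# the SlruSharedData header, and alignment padding are ignored.
clog_shmem_estimate() {
    nslots=$1
    pages=$(( nslots * 8192 ))
    group_lsn=$(( nslots * 1024 * 8 ))
    echo $(( pages + group_lsn ))
}

clog_shmem_estimate 4     # 65536   (64k)
clog_shmem_estimate 128   # 2097152 (2M)
```

&lt;p&gt;With these inputs the &lt;code&gt;group_lsn[]&lt;/code&gt; array is as large as the page buffers themselves, so it cannot be treated as negligible.&lt;/p&gt;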

&lt;h2 class="relative group"&gt;CLOG WAL: Types, Writing, and Redo
 &lt;div id="clog-wal-types-writing-and-redo" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-wal-types-writing-and-redo" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;When CLOG is written, is a WAL record written for it as well? If so, could a lost CLOG be rebuilt by replaying WAL to recover transaction states? Let&amp;rsquo;s explore the CLOG WAL writing and redo source code with these questions in mind.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Extend CLOG
 &lt;div id="extend-clog" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#extend-clog" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;ZeroCLOGPage&lt;/code&gt; writes a WAL record when its second argument is true, and &lt;code&gt;ZeroCLOGPage(pageno, true)&lt;/code&gt; is actually &lt;em&gt;only&lt;/em&gt; called by &lt;code&gt;ExtendCLOG&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Make sure that CLOG has room for a newly-allocated XID.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * NB: this is called while holding XidGenLock. We want it to be very fast
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * most of the time; even when it&amp;#39;s not so fast, no actual I/O need happen
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * unless we&amp;#39;re forced to write out a dirty clog or xlog page to make room
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * in shared memory.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ExtendCLOG&lt;/span&gt;(TransactionId newestXact)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			pageno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * No work except at first XID of a page. But beware: just after
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * wraparound, the first XID of page zero is FirstNormalTransactionId.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdToPgIndex&lt;/span&gt;(newestXact) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdEquals&lt;/span&gt;(newestXact, FirstNormalTransactionId))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	pageno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TransactionIdToPage&lt;/span&gt;(newestXact); &lt;span style="color:#75715e"&gt;// CLOG page number converted from TransactionId
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockAcquire&lt;/span&gt;(XactSLRULock, LW_EXCLUSIVE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Zero the page and make an XLOG entry about it */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ZeroCLOGPage&lt;/span&gt;(pageno, true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(XactSLRULock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
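&lt;p&gt;To make the early-return test in &lt;code&gt;ExtendCLOG&lt;/code&gt; concrete, here is a minimal Python sketch of the XID-to-page arithmetic. It assumes the default 8 kB block size, so a page holds 32768 transactions at 2 status bits each. The function names are illustrative stand-ins for the C macros, and the XIDs in the loop are sample values.&lt;/p&gt;

```python
# Sketch of the arithmetic behind TransactionIdToPage / TransactionIdToPgIndex.
# Assumption: default BLCKSZ of 8192 bytes.
BLCKSZ = 8192
CLOG_XACTS_PER_BYTE = 4                      # 2 status bits per transaction
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE
FIRST_NORMAL_XID = 3                         # FirstNormalTransactionId


def xid_to_page(xid):
    """CLOG page number that holds the status of this XID."""
    return xid // CLOG_XACTS_PER_PAGE


def xid_to_page_index(xid):
    """Position of this XID inside its CLOG page."""
    return xid % CLOG_XACTS_PER_PAGE


def needs_new_page(xid):
    """True when ExtendCLOG would zero a page for this XID."""
    return xid_to_page_index(xid) == 0 or xid == FIRST_NORMAL_XID


for xid in (3, 32767, 32768, 1817254):
    print(xid, xid_to_page(xid), xid_to_page_index(xid), needs_new_page(xid))
```

&lt;p&gt;Only the first XID of each page (plus the special case right after wraparound) triggers an extension, which is why &lt;code&gt;ExtendCLOG&lt;/code&gt; is cheap for almost every call.&lt;/p&gt;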
&lt;p&gt;Besides zeroing the page in an SLRU buffer, &lt;code&gt;ZeroCLOGPage&lt;/code&gt; calls &lt;code&gt;WriteZeroPageXlogRec&lt;/code&gt; when it is asked to write WAL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Write a ZEROPAGE xlog record
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WriteZeroPageXlogRec&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; pageno)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;XLogBeginInsert&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;XLogRegisterData&lt;/span&gt;((&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) (&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;pageno), &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;) &lt;span style="color:#a6e22e"&gt;XLogInsert&lt;/span&gt;(RM_CLOG_ID, CLOG_ZEROPAGE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;WriteZeroPageXlogRec&lt;/code&gt; writes a WAL record with resource manager &lt;code&gt;RM_CLOG_ID&lt;/code&gt; and info code &lt;code&gt;CLOG_ZEROPAGE&lt;/code&gt;; its only payload is the page number.
With pg_waldump you can see these records as CLOG/ZEROPAGE. They usually make up a very small share of the WAL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_waldump &lt;span style="color:#f92672"&gt;-&lt;/span&gt;z &lt;span style="color:#ae81ff"&gt;000000010000056&lt;/span&gt;B00000018 &lt;span style="color:#75715e"&gt;--stat=record
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; N (&lt;span style="color:#f92672"&gt;%&lt;/span&gt;) Record &lt;span style="color:#66d9ef"&gt;size&lt;/span&gt; (&lt;span style="color:#f92672"&gt;%&lt;/span&gt;) FPI &lt;span style="color:#66d9ef"&gt;size&lt;/span&gt; (&lt;span style="color:#f92672"&gt;%&lt;/span&gt;) Combined &lt;span style="color:#66d9ef"&gt;size&lt;/span&gt; (&lt;span style="color:#f92672"&gt;%&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---- - --- ----------- --- -------- --- ------------- ---
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CLOG&lt;span style="color:#f92672"&gt;/&lt;/span&gt;ZEROPAGE &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ( &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;) &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; ( &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;) &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; ( &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;) &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; ( &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;CLOG is always extended one whole page at a time, so the unused tail of the newest page stays zeroed. You can easily see the 00 bytes at the end of a CLOG segment file:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hexdump 03C2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000 5555 5555 5555 5555 5555 5555 5555 5555
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;*
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;001bb30 5555 5555 0055 0000 0000 0000 0000 0000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;001bb40 0000 0000 0000 0000 0000 0000 0000 0000 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;* ## The end of the CLOG file is all zeros
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;001c000&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
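&lt;p&gt;The bytes in the dump can be decoded by hand. The sketch below assumes the usual CLOG layout (2 bits per transaction, lowest XID in the lowest bits) and the status codes 0 = in progress, 1 = committed, 2 = aborted, 3 = sub-committed. The helper name is made up for illustration, and the modulo stands in for a two-bit mask.&lt;/p&gt;

```python
# Decode one CLOG byte into the four transaction statuses it packs.
# Assumption: default BLCKSZ of 8192 bytes for the page-count check.
BLCKSZ = 8192
STATUS_NAMES = {0: "in progress", 1: "committed", 2: "aborted", 3: "sub-committed"}


def decode_clog_byte(byte):
    """Return four status names, lowest XID first."""
    # Shift right by 2*i bits, then keep the low two bits (modulo 4).
    return [STATUS_NAMES[(byte >> (2 * i)) % 4] for i in range(4)]


print(decode_clog_byte(0x55))   # the repeated byte in the dump
print(decode_clog_byte(0x00))   # the zeroed tail

# The dump ends at offset 0x1c000, a whole number of pages.
file_size = 0x1C000
pages_in_file, leftover = divmod(file_size, BLCKSZ)
print(pages_in_file, leftover)
```

&lt;p&gt;A byte of 0x55 is four committed transactions in a row, and the file length is an exact multiple of the page size, which matches extension happening in page units.&lt;/p&gt;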

&lt;h3 class="relative group"&gt;Truncate CLOG
 &lt;div id="truncate-clog" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#truncate-clog" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;CLOG can also be truncated. &lt;code&gt;TruncateCLOG&lt;/code&gt; is called during vacuum. When there is a segment to remove, it writes a CLOG truncate WAL record and flushes that record to disk before deleting any files:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Remove all CLOG segments before the one holding the passed transaction ID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Before removing any CLOG data, we must flush XLOG to disk, to ensure
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * that any recently-emitted FREEZE_PAGE records have reached disk; otherwise
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * a crash and restart might leave us with some unfrozen tuples referencing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * removed CLOG data. We choose to emit a special TRUNCATE XLOG record too.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Replaying the deletion from XLOG is not critical, since the files could
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * just as well be removed later, but doing so prevents a long-running hot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * standby server from acquiring an unreasonably bloated CLOG directory.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Since CLOG segments hold a large number of transactions, the opportunity to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * actually remove a segment is fairly rare, and so it seems best not to do
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * the XLOG flush unless we have confirmed that there is a removable segment.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;TruncateCLOG&lt;/span&gt;(TransactionId oldestXact, Oid oldestxid_datoid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			cutoffPage;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * The cutoff point is the start of the segment containing oldestXact. We
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * pass the *page* containing oldestXact to SimpleLruTruncate.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// What&amp;#39;s written to WAL is the CLOG position, which is the CLOG page number converted from oldestXact
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	cutoffPage &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TransactionIdToPage&lt;/span&gt;(oldestXact); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;.....
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Write XLOG record and flush XLOG to disk. We record the oldest xid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * we&amp;#39;re keeping information about here so we can ensure that it&amp;#39;s always
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * ahead of clog truncation in case we crash, and so a standby finds out
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the new valid xid before the next checkpoint.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// WriteTruncateXlogRec writes the corresponding WAL record and flushes it to disk
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;WriteTruncateXlogRec&lt;/span&gt;(cutoffPage, oldestXact, oldestxid_datoid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// After WAL is written, actually execute the CLOG segment truncation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Now we can remove the old CLOG segment(s) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;SimpleLruTruncate&lt;/span&gt;(XactCtl, cutoffPage);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;WriteTruncateXlogRec&lt;/code&gt; writes a WAL record with &lt;code&gt;RMGR&lt;/code&gt; as &lt;code&gt;RM_CLOG_ID&lt;/code&gt; and &lt;code&gt;info&lt;/code&gt; as &lt;code&gt;CLOG_TRUNCATE&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Write a TRUNCATE xlog record
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * We must flush the xlog record to disk before returning --- see notes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * in TruncateCLOG().
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WriteTruncateXlogRec&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; pageno, TransactionId oldestXact, Oid oldestXactDb)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	XLogRecPtr	recptr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	xl_clog_truncate xlrec;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	xlrec.pageno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; pageno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	xlrec.oldestXact &lt;span style="color:#f92672"&gt;=&lt;/span&gt; oldestXact;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	xlrec.oldestXactDb &lt;span style="color:#f92672"&gt;=&lt;/span&gt; oldestXactDb;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;XLogBeginInsert&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;XLogRegisterData&lt;/span&gt;((&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) (&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;xlrec), &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(xl_clog_truncate));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	recptr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;XLogInsert&lt;/span&gt;(RM_CLOG_ID, CLOG_TRUNCATE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;XLogFlush&lt;/span&gt;(recptr);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Once CLOG WAL records exist, a redo routine is needed to replay them during recovery:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * CLOG resource manager&amp;#39;s routines
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;clog_redo&lt;/span&gt;(XLogReaderState &lt;span style="color:#f92672"&gt;*&lt;/span&gt;record)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// When redo info type is CLOG_ZEROPAGE, place the read redo information in memory, then write to the CLOG page file
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (info &lt;span style="color:#f92672"&gt;==&lt;/span&gt; CLOG_ZEROPAGE)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			pageno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			slotno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;memcpy&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;pageno, &lt;span style="color:#a6e22e"&gt;XLogRecGetData&lt;/span&gt;(record), &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;LWLockAcquire&lt;/span&gt;(XactSLRULock, LW_EXCLUSIVE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		slotno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ZeroCLOGPage&lt;/span&gt;(pageno, false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SimpleLruWritePage&lt;/span&gt;(XactCtl, slotno); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#f92672"&gt;!&lt;/span&gt;XactCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;shared&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;page_dirty[slotno]);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(XactSLRULock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// When redo info type is CLOG_TRUNCATE, place the read redo information in memory, confirm the page is deletable (write page if not), then truncate the segment
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (info &lt;span style="color:#f92672"&gt;==&lt;/span&gt; CLOG_TRUNCATE)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		xl_clog_truncate xlrec;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;memcpy&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;xlrec, &lt;span style="color:#a6e22e"&gt;XLogRecGetData&lt;/span&gt;(record), &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(xl_clog_truncate));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * During XLOG replay, latest_page_number isn&amp;#39;t set up yet; insert a
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * suitable value to bypass the sanity test in SimpleLruTruncate.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		XactCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;shared&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;latest_page_number &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xlrec.pageno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;AdvanceOldestClogXid&lt;/span&gt;(xlrec.oldestXact);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SimpleLruTruncate&lt;/span&gt;(XactCtl, xlrec.pageno);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;elog&lt;/span&gt;(PANIC, &lt;span style="color:#e6db74"&gt;&amp;#34;clog_redo: unknown op code %u&amp;#34;&lt;/span&gt;, info);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;What the CLOG redo routine does:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When redo info type is &lt;code&gt;CLOG_ZEROPAGE&lt;/code&gt;: reads the CLOG page number from the record, zeroes that page in an SLRU buffer slot (evicting another page if necessary) without writing WAL again, then writes the page out to the CLOG file&lt;/li&gt;
&lt;li&gt;When redo info type is &lt;code&gt;CLOG_TRUNCATE&lt;/code&gt;: reads the truncate record (cutoff page number, oldest XID, and its database), sets &lt;code&gt;latest_page_number&lt;/code&gt; so the sanity test in &lt;code&gt;SimpleLruTruncate&lt;/code&gt; passes, advances the oldest CLOG XID, then removes the CLOG segments before the cutoff page&lt;/li&gt;
&lt;/ul&gt;
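&lt;p&gt;The two redo branches can be modeled with a toy in-memory CLOG. This is a simplified sketch, not the PostgreSQL implementation: pages live in a Python dict instead of SLRU slots and files, XID wraparound is ignored, and &lt;code&gt;toy_clog_redo&lt;/code&gt; is a made-up name. It assumes 32 pages per segment and the default 8 kB block size.&lt;/p&gt;

```python
# Toy model of the two CLOG redo actions.
# Assumptions: 8 kB pages, 32 pages per SLRU segment, no XID wraparound.
CLOG_ZEROPAGE = 0x00
CLOG_TRUNCATE = 0x10
BLCKSZ = 8192
PAGES_PER_SEGMENT = 32


def toy_clog_redo(pages, info, pageno):
    """Apply one CLOG WAL record to a dict of {page number: bytearray}."""
    if info == CLOG_ZEROPAGE:
        # Replay of an extension: the page reappears, filled with zeros.
        pages[pageno] = bytearray(BLCKSZ)
    elif info == CLOG_TRUNCATE:
        # Replay of a truncation: drop every page that lives in a segment
        # entirely before the segment holding the cutoff page.
        cutoff_segment = pageno // PAGES_PER_SEGMENT
        for p in list(pages):
            if cutoff_segment > p // PAGES_PER_SEGMENT:
                del pages[p]
    else:
        raise ValueError("unknown op code %#x" % info)
    return pages


pages = {}
for p in (0, 1, 31, 32, 33):
    toy_clog_redo(pages, CLOG_ZEROPAGE, p)
toy_clog_redo(pages, CLOG_TRUNCATE, 33)
print(sorted(pages))
```

&lt;p&gt;Neither branch ever sees a transaction status: replay can recreate empty pages and drop old segments, but it cannot fill in who committed.&lt;/p&gt;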

&lt;h3 class="relative group"&gt;CLOG Synchronization Summary
 &lt;div id="clog-synchronization-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-synchronization-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;CLOG has only two kinds of WAL record, and neither contains transaction status information. They are written only when a CLOG page is extended or CLOG segments are truncated, and the payload is essentially a CLOG page number (the truncate record also carries the oldest XID and its database OID).
CLOG has a single WAL resource manager, &lt;code&gt;RM_CLOG_ID&lt;/code&gt;, with only two info codes: &lt;code&gt;CLOG_ZEROPAGE&lt;/code&gt; and &lt;code&gt;CLOG_TRUNCATE&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* XLOG stuff */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_ZEROPAGE 0x00
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_TRUNCATE 0x10&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;CLOG WAL synchronization summary:
&lt;strong&gt;Through CLOG WAL records, the standby is not really synchronizing CLOG contents; it only replays CLOG file extension and deletion.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;However, the standby&amp;rsquo;s CLOG file clearly does have status information, and the standby obviously needs this information for visibility checking. How is the transaction status in CLOG synchronized?&lt;/p&gt;

&lt;h2 class="relative group"&gt;Transaction ID WAL: Types, Writing, and Redo
 &lt;div id="transaction-id-wal-types-writing-and-redo" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-id-wal-types-writing-and-redo" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;WAL records with rmgr=CLOG carry no transaction status. Does that mean the standby never receives CLOG transaction information? No: the WAL does contain transaction status information, and CLOG is updated accordingly:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Roll back a transaction, commit a transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; txid_current();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; txid_current 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1817254&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; txid_current();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; txid_current 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1817258&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CHECKPOINT&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- pg_waldump to view transaction ID status in logs
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[datalzl&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg_wal]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pg_waldump ..&lt;span style="color:#f92672"&gt;/&lt;/span&gt;..&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg_wal&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;000000010000007300000008&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt;grep &lt;span style="color:#f92672"&gt;-&lt;/span&gt;E &lt;span style="color:#e6db74"&gt;&amp;#34;1817254|1817258&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: &lt;span style="color:#66d9ef"&gt;Transaction&lt;/span&gt; len (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot): &lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;1817254&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;400&lt;/span&gt;ED210, prev &lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;400&lt;/span&gt;ED1E0, &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;ABORT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;26&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;017612&lt;/span&gt; CST
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: &lt;span style="color:#66d9ef"&gt;Transaction&lt;/span&gt; len (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot): &lt;span style="color:#ae81ff"&gt;46&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;46&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;1817258&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;400&lt;/span&gt;EEB08, prev &lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;400&lt;/span&gt;EEAD8, &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;042545&lt;/span&gt; CST
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_waldump: fatal: error &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; WAL record &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;400&lt;/span&gt;F7C78: invalid record &lt;span style="color:#66d9ef"&gt;length&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;400&lt;/span&gt;F7F88: wanted &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;, got &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The WAL records the final status of both transaction IDs: 1817254 as &lt;code&gt;ABORT&lt;/code&gt; and 1817258 as &lt;code&gt;COMMIT&lt;/code&gt;, both with rmgr &lt;code&gt;Transaction&lt;/code&gt;.
So transaction status is in the WAL, but does PostgreSQL apply it to the standby&amp;rsquo;s CLOG?
To answer that, we need to find the redo code. We already saw that &lt;code&gt;clog_redo&lt;/code&gt; is the redo routine for rmgr=CLOG, so searching the source for &lt;code&gt;_redo&lt;/code&gt; should turn up the one for rmgr=Transaction. It is &lt;code&gt;xact_redo&lt;/code&gt; in &lt;code&gt;xact.c&lt;/code&gt;, which mainly calls &lt;code&gt;xact_redo_commit&lt;/code&gt; and &lt;code&gt;xact_redo_abort&lt;/code&gt; to apply commit and abort records respectively.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;xact_redo&lt;/span&gt;(XLogReaderState &lt;span style="color:#f92672"&gt;*&lt;/span&gt;record)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	uint8		info &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;XLogRecGetInfo&lt;/span&gt;(record) &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; XLOG_XACT_OPMASK;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Backup blocks are not used in xact records */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;XLogRecHasAnyBlockRefs&lt;/span&gt;(record));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (info &lt;span style="color:#f92672"&gt;==&lt;/span&gt; XLOG_XACT_COMMIT)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;xact_redo_commit&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;parsed, &lt;span style="color:#a6e22e"&gt;XLogRecGetXid&lt;/span&gt;(record),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 record&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;EndRecPtr, &lt;span style="color:#a6e22e"&gt;XLogRecGetOrigin&lt;/span&gt;(record));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (info &lt;span style="color:#f92672"&gt;==&lt;/span&gt; XLOG_XACT_ABORT)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;xact_redo_abort&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;parsed, &lt;span style="color:#a6e22e"&gt;XLogRecGetXid&lt;/span&gt;(record));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;elog&lt;/span&gt;(PANIC, &lt;span style="color:#e6db74"&gt;&amp;#34;xact_redo: unknown op code %u&amp;#34;&lt;/span&gt;, info);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
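&lt;p&gt;The dispatch in &lt;code&gt;xact_redo&lt;/code&gt; hinges on masking the record&amp;rsquo;s info byte. The following is a minimal standalone sketch, not PostgreSQL code: the constant values are what I believe &lt;code&gt;xact.h&lt;/code&gt; defines in recent versions, so verify them against your source tree, and &lt;code&gt;xact_op_name&lt;/code&gt; is a hypothetical helper introduced only for illustration.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;#include &amp;lt;stdint.h&amp;gt;

/* Values assumed from src/include/access/xact.h; verify locally. */
#define XLOG_XACT_COMMIT   0x00
#define XLOG_XACT_ABORT    0x20
#define XLOG_XACT_OPMASK   0x70	/* keeps only the opcode bits */
#define XLOG_XACT_HAS_INFO 0x80	/* flag bit, not part of the opcode */

/* Hypothetical helper: name the operation encoded in an info byte. */
static const char *
xact_op_name(uint8_t xl_info)
{
	/* Strip flag bits so only the opcode is compared. */
	uint8_t		info = xl_info &amp;amp; XLOG_XACT_OPMASK;

	if (info == XLOG_XACT_COMMIT)
		return &amp;#34;COMMIT&amp;#34;;
	else if (info == XLOG_XACT_ABORT)
		return &amp;#34;ABORT&amp;#34;;
	return &amp;#34;OTHER&amp;#34;;	/* prepare, assignment, and so on */
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Under these assumptions, a commit record whose info byte also carries the &lt;code&gt;XLOG_XACT_HAS_INFO&lt;/code&gt; flag still reaches the commit branch, because the mask removes the flag before the comparison.&lt;/p&gt;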
&lt;p&gt;Taking commit as an example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;xact_redo_commit&lt;/span&gt;(xl_xact_parsed_commit &lt;span style="color:#f92672"&gt;*&lt;/span&gt;parsed,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 TransactionId xid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 XLogRecPtr lsn,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 RepOriginId origin_id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (standbyState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; STANDBY_DISABLED)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Mark the transaction committed in pg_xact.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;TransactionIdCommitTree&lt;/span&gt;(xid, parsed&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;nsubxacts, parsed&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxacts);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#75715e"&gt;// standby logic
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Mark the transaction committed in pg_xact. We use async commit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * protocol during recovery to provide information on database
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * consistency for when users try to set hint bits. It is important
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * that we do not set hint bits until the minRecoveryPoint is past
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * this commit record. This ensures that if we crash we don&amp;#39;t see hint
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * bits set on changes made by transactions that haven&amp;#39;t yet
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * recovered. It&amp;#39;s unlikely but it&amp;#39;s good to be safe.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Mark transaction committed in pg_xact
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;TransactionIdAsyncCommitTree&lt;/span&gt;(xid, parsed&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;nsubxacts, parsed&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxacts, lsn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It looks like &lt;code&gt;TransactionIdAsyncCommitTree&lt;/code&gt; is the function we&amp;rsquo;re looking for that writes to CLOG.&lt;/p&gt;
&lt;p&gt;To verify this redo path, set three breakpoints in the standby&amp;rsquo;s startup process (on &lt;code&gt;xact_redo_commit&lt;/code&gt;, &lt;code&gt;TransactionIdCommitTree&lt;/code&gt;, and &lt;code&gt;TransactionIdAsyncCommitTree&lt;/code&gt;), then run &lt;code&gt;begin;select txid_current();commit;&lt;/code&gt; on the primary and check which functions the startup process hits during redo:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(gdb) bt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 TransactionIdAsyncCommitTree (xid=xid@entry=1818665, nxids=0, xids=0x0, lsn=lsn@entry=495398394064) at transam.c:274
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x000000000050c139 in xact_redo_commit (parsed=parsed@entry=0x7ffda52c0fc0, xid=1818665, lsn=495398394064, origin_id=&amp;lt;optimized out&amp;gt;) at xact.c:5805
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x000000000050ffa3 in xact_redo (record=0x2b5ff2434038) at xact.c:5962
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x0000000000519ea5 in StartupXLOG () at xlog.c:7411
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x000000000072f301 in StartupProcessMain () at startup.c:204
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#5 0x0000000000528701 in AuxiliaryProcessMain (argc=argc@entry=2, argv=argv@entry=0x7ffda52c6ef0) at bootstrap.c:450
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#6 0x000000000072c459 in StartChildProcess (type=StartupProcess) at postmaster.c:5494
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#7 0x000000000072ec44 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x2b5ff242d1c0) at postmaster.c:1407
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#8 0x000000000048931f in main (argc=3, argv=0x2b5ff242d1c0) at main.c:210
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(gdb) info b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Num Type Disp Enb Address What
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; breakpoint keep y &lt;span style="color:#ae81ff"&gt;0x000000000050c060&lt;/span&gt; in xact_redo_commit at xact.c:&lt;span style="color:#ae81ff"&gt;5753&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; breakpoint already hit &lt;span style="color:#ae81ff"&gt;43&lt;/span&gt; times
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; breakpoint keep y &lt;span style="color:#ae81ff"&gt;0x0000000000508190&lt;/span&gt; in TransactionIdCommitTree at transam.c:&lt;span style="color:#ae81ff"&gt;262&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; breakpoint keep y &lt;span style="color:#ae81ff"&gt;0x00000000005081a0&lt;/span&gt; in TransactionIdAsyncCommitTree at transam.c:&lt;span style="color:#ae81ff"&gt;274&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; breakpoint already hit &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; time&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The breakpoint on &lt;code&gt;TransactionIdAsyncCommitTree&lt;/code&gt; is hit with &lt;code&gt;xid=1818665&lt;/code&gt;, the transaction ID just committed on the primary, while the breakpoint on &lt;code&gt;TransactionIdCommitTree&lt;/code&gt; is never hit. This confirms the code path traced above.
So, &lt;strong&gt;the standby database&amp;rsquo;s CLOG transaction ID status is synchronized by WAL with rmgr=Transaction.&lt;/strong&gt;&lt;/p&gt;
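&lt;p&gt;Marking a transaction committed in &lt;code&gt;pg_xact&lt;/code&gt; ultimately means setting two bits in a CLOG page. The following is a minimal sketch of that arithmetic, assuming the default 8 kB block size and the layout constants as I understand them from &lt;code&gt;clog.c&lt;/code&gt; and &lt;code&gt;slru.h&lt;/code&gt;. All function names are hypothetical helpers, not PostgreSQL functions.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;#include &amp;lt;stdint.h&amp;gt;

/* Layout constants assumed from clog.c and slru.h (default build). */
#define BLCKSZ                 8192
#define CLOG_BITS_PER_XACT     2
#define CLOG_XACTS_PER_BYTE    4
#define CLOG_XACTS_PER_PAGE    (BLCKSZ * CLOG_XACTS_PER_BYTE)
#define CLOG_XACT_BITMASK      ((1 &amp;lt;&amp;lt; CLOG_BITS_PER_XACT) - 1)
#define SLRU_PAGES_PER_SEGMENT 32
#define XID_STATUS_COMMITTED   0x01
#define XID_STATUS_ABORTED     0x02

/* Segment file number that holds the status of xid. */
static uint32_t
clog_segment(uint32_t xid)
{
	return xid / CLOG_XACTS_PER_PAGE / SLRU_PAGES_PER_SEGMENT;
}

/* Byte offset of the status byte within that segment file. */
static uint32_t
clog_file_offset(uint32_t xid)
{
	uint32_t	page = (xid / CLOG_XACTS_PER_PAGE) % SLRU_PAGES_PER_SEGMENT;
	uint32_t	byte = (xid % CLOG_XACTS_PER_PAGE) / CLOG_XACTS_PER_BYTE;

	return page * BLCKSZ + byte;
}

/* Position of the two status bits inside the status byte. */
static int
clog_bit_shift(uint32_t xid)
{
	return (int) (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;
}

/* Return the byte with the status bits of xid replaced by status. */
static uint8_t
clog_set_status(uint8_t byte, uint32_t xid, uint8_t status)
{
	int			shift = clog_bit_shift(xid);

	byte = (uint8_t) (byte &amp;amp; ~(CLOG_XACT_BITMASK &amp;lt;&amp;lt; shift));
	return (uint8_t) (byte | (status &amp;lt;&amp;lt; shift));
}

/* Read the two status bits of xid back out of the byte. */
static uint8_t
clog_get_status(uint8_t byte, uint32_t xid)
{
	return (uint8_t) ((byte &amp;gt;&amp;gt; clog_bit_shift(xid)) &amp;amp; CLOG_XACT_BITMASK);
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For xid 1818665 from the backtrace above, this arithmetic gives segment 1 (file &lt;code&gt;0001&lt;/code&gt; under &lt;code&gt;pg_xact&lt;/code&gt;), byte offset 192522 in that file, and a bit shift of 2 within the byte.&lt;/p&gt;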

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CLOG only stores transaction ID status, not the transaction ID itself&lt;/li&gt;
&lt;li&gt;Transaction status in CLOG files can be manually located via the transaction ID&lt;/li&gt;
&lt;li&gt;WAL for rmgr=CLOG only extends and truncates CLOG files; it does not update transaction status&lt;/li&gt;
&lt;li&gt;WAL for rmgr=Transaction updates CLOG transaction status&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;&amp;ldquo;Quickly Mastering PostgreSQL Version New Features&amp;rdquo;, p24&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;Yan Shuli, PostgreSQL CLOG Analysis &lt;a href="https://www.modb.pro/db/606433" target="_blank" rel="noreferrer"&gt;https://www.modb.pro/db/606433&lt;/a&gt;&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:3"&gt;
&lt;p&gt;&amp;ldquo;PostgreSQL Database Kernel Analysis&amp;rdquo;, Chapter 7, p380-390&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title>PostgreSQL Logical Replication</title><link>https://lastdba.com/en/2024/08/13/postgresql-logical-replication/</link><pubDate>Tue, 13 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/13/postgresql-logical-replication/</guid><description>&lt;h3 class="relative group"&gt;What is Logical Replication
 &lt;div id="what-is-logical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-logical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL logical replication is based on logical decoding, which parses WAL log streams into a specified format for output. The subscriber node receives the parsed data and applies it.&lt;/p&gt;
&lt;p&gt;Logical replication differs from streaming replication (physical replication) which is based on instance-level primary-standby where the physical structures are identical. Logical replication can selectively replicate at the table level. Logical Replication in official documentation specifically refers to the &amp;ldquo;publish-subscribe&amp;rdquo; model. In fact, many tools can use logical decoding for heterogeneous database data synchronization.&lt;/p&gt;</description><content:encoded>
&lt;h3 class="relative group"&gt;What is Logical Replication
 &lt;div id="what-is-logical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-logical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL logical replication is based on logical decoding, which parses WAL log streams into a specified format for output. The subscriber node receives the parsed data and applies it.&lt;/p&gt;
&lt;p&gt;Logical replication differs from streaming replication (physical replication), which works at the instance level and keeps the primary and standby physically identical. Logical replication can selectively replicate at the table level. In the official documentation, Logical Replication refers specifically to the &amp;ldquo;publish-subscribe&amp;rdquo; model, but many tools also use logical decoding to synchronize data between heterogeneous databases.&lt;/p&gt;
&lt;p&gt;From pg9.4, the pglogical extension provides logical replication (&lt;a href="https://github.com/2ndQuadrant/pglogical" target="_blank" rel="noreferrer"&gt;https://github.com/2ndQuadrant/pglogical&lt;/a&gt;), and pg10 onwards supports logical replication natively.&lt;/p&gt;
&lt;p&gt;Logical replication can be used for database upgrades, heterogeneous data migration, table-level data synchronization links, subscribing to data streams, etc.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Logical Decoding
 &lt;div id="logical-decoding" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-decoding" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Logical decoding can parse table data changes in WAL logs into row data streams or SQL text. These row data streams or SQL text can be consumed by other types of databases or software. The specific parsing format is determined by the output plugin.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Replication Slots
 &lt;div id="replication-slots" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#replication-slots" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;In logical replication, a replication slot represents a data change stream. Like physical replication slots, logical replication slots also ensure that after an abnormal replication interruption, the related WAL logs are not deleted, so that WAL log parsing can continue after replication reconnects. A database can have multiple replication slots. Each replication slot has only one output plugin, and each replication slot represents one replication link. Replication slots are essentially used to manage replication links. Unlike streaming replication which can function without replication slots, logical replication must have replication slots.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Output Plugin
 &lt;div id="output-plugin" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#output-plugin" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The output plugin converts WAL log information into the format required by the replication slot. PostgreSQL has some built-in output plugins and additional ones can be added through plugins. Each logical replication slot has an output plugin for WAL-related parsing work.&lt;/p&gt;
&lt;p&gt;Output plugins are implemented as a set of callback functions. For example, the startup callback sets the output type to OUTPUT_PLUGIN_BINARY_OUTPUT or OUTPUT_PLUGIN_TEXTUAL_OUTPUT to choose binary or text output, and other callbacks notify the plugin when a transaction begins, changes data, and commits. You don&amp;rsquo;t need to write these callbacks yourself; the built-in output plugins already implement them.&lt;/p&gt;
&lt;p&gt;Each output plugin has some different parsing behaviors and output formats.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Several Common Output Plugins
 &lt;div id="several-common-output-plugins" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#several-common-output-plugins" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;test_decoding: A sample output plugin, essentially the most basic form an output plugin can take. The official documentation describes it as a template, but it does real decoding. It ships with PostgreSQL in contrib and must be built and installed from there.&lt;/p&gt;
&lt;p&gt;pgoutput: The default output plugin for the publish-subscribe model. In publish-subscribe, the walsender process uses this output plugin to logically decode WAL logs.&lt;/p&gt;
&lt;p&gt;decoder_raw: Parses into SQL text format. This is not included with PostgreSQL; compile it yourself: &lt;a href="https://github.com/michaelpq/pg_plugins/tree/main/decoder_raw" target="_blank" rel="noreferrer"&gt;https://github.com/michaelpq/pg_plugins/tree/main/decoder_raw&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;wal2json: This output plugin converts WAL log information into JSON format.&lt;/p&gt;
&lt;p&gt;Other output plugins can be referenced at: &lt;a href="https://wiki.postgresql.org/wiki/Logical_Decoding_Plugins" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Logical_Decoding_Plugins&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Some Chinese database vendors have also built their own output plugins.&lt;/p&gt;
&lt;p&gt;Relationship between several output plugins and logical replication plugins:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8681978ee447.png" alt="5bc6c1dacf2c4f4888f2e299d3d75bc6.png" /&gt;


&lt;/p&gt;
&lt;p&gt;pgoutput, test_decoding, and wal2json have been introduced above.&lt;/p&gt;
&lt;p&gt;pglogical, available from pg9.4, was the predecessor of native logical replication.&lt;/p&gt;
&lt;p&gt;BDR was developed by 2ndQuadrant, supporting bidirectional replication and DDL synchronization with more powerful features. BDR 3.0 onwards became closed-source.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Functions and Tools for Manually Receiving Parsed Data
 &lt;div id="functions-and-tools-for-manually-receiving-parsed-data" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#functions-and-tools-for-manually-receiving-parsed-data" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;pg_logical_slot_get_changes(): Displays parsed data and consumes it.&lt;/p&gt;
&lt;p&gt;pg_logical_slot_peek_changes(): Displays parsed data without consuming it.&lt;/p&gt;
&lt;p&gt;pg_recvlogical: A tool included with PostgreSQL that can consume data within a replication slot, equivalent to the downstream of logical replication. The corresponding physical WAL receiving tool is pg_receivewal.&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Logical Decoding Test 1&lt;/strong&gt;: Observing data parsing with 2 different output plugins
 &lt;div id="logical-decoding-test-1-observing-data-parsing-with-2-different-output-plugins" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-decoding-test-1-observing-data-parsing-with-2-different-output-plugins" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create two logical replication slots, logical_test and logical_raw, using the test_decoding and decoder_raw plugins respectively
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_create_logical_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_test&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;test_decoding&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_create_logical_replication_slot 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (logical_test,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1756&lt;/span&gt;F50)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_create_logical_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_raw&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;decoder_raw&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_create_logical_replication_slot 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (logical_raw,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1756&lt;/span&gt;F88)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Only the upstream side exists so far, so both slots show active = f
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_replication_slots;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; slot_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plugin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; slot_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active_pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; catalog_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; restart_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; confirmed_flush_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wal_status &lt;span style="color:#f92672"&gt;|&lt;/span&gt; safe_wal_size 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; logical_test &lt;span style="color:#f92672"&gt;|&lt;/span&gt; test_decoding &lt;span style="color:#f92672"&gt;|&lt;/span&gt; logical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16385&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;558&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1766878&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17668&lt;/span&gt;B0 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reserved &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; logical_raw &lt;span style="color:#f92672"&gt;|&lt;/span&gt; decoder_raw &lt;span style="color:#f92672"&gt;|&lt;/span&gt; logical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16385&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;557&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1756&lt;/span&gt;F50 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1756&lt;/span&gt;F88 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reserved &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create a table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tdecoder222(a int,b varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Attempt to get this DDL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_logical_slot_get_changes(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_raw&amp;#39;&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;include-xids&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#66d9ef"&gt;option&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;include-xids&amp;#34;&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;0&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;unknown&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CONTEXT: slot &lt;span style="color:#e6db74"&gt;&amp;#34;logical_raw&amp;#34;&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;output&lt;/span&gt; plugin &lt;span style="color:#e6db74"&gt;&amp;#34;decoder_raw&amp;#34;&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; the startup callback
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_logical_slot_get_changes(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_test&amp;#39;&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;include-xids&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-----+--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17669&lt;/span&gt;C8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;558&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1776778&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;558&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- The decoder_raw call failed because that plugin does not accept the include-xids option, so it returned nothing; test_decoding returned only the BEGIN and COMMIT of the DDL transaction, not the DDL statement itself. The DDL is effectively not decoded
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Insert a row
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tdecoder222 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_logical_slot_peek_changes(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_test&amp;#39;&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-----+---------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1776890&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;560&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;560&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1776890&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;560&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tdecoder222: &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt;: a[integer]:&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; b[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1776900&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;560&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;560&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_logical_slot_peek_changes(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_raw&amp;#39;&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-----+----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1776890&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;560&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tdecoder222 (a, b) &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- test_decoding parsed the transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- decoder_raw parsed the transaction into SQL statements&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This test allows us to conclude:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A replication slot with active = f can still be decoded on demand; its changes are retained until a downstream consumes them&lt;/li&gt;
&lt;li&gt;Output plugins differ in decoding behavior, accepted options, and output format&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 class="relative group"&gt;Logical Decoding Test 2: Using pg_recvlogical to receive logically decoded data, simulating a logical replication link
 &lt;div id="logical-decoding-test-2-using-pg_recvlogical-to-receive-logically-decoded-data-simulating-a-logical-replication-link" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-decoding-test-2-using-pg_recvlogical-to-receive-logically-decoded-data-simulating-a-logical-replication-link" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Configure passwordless login
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ vi .pgpass
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cat .pgpass
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzl:5410:lzldb:pg:pg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ chmod &lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; .pgpass
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;# Start pg_recvlogical
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ pg_recvlogical -h lzl -p &lt;span style="color:#ae81ff"&gt;5410&lt;/span&gt; -d lzldb -U pg --slot&lt;span style="color:#f92672"&gt;=&lt;/span&gt;logical_raw --start -f recv.sql &amp;amp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ps -ef|grep recv|grep -v grep
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg &lt;span style="color:#ae81ff"&gt;7747&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7355&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 21:40 pts/3 00:00:00 pg_recvlogical -h lzl -p &lt;span style="color:#ae81ff"&gt;5410&lt;/span&gt; -d lzldb -U pg --slot&lt;span style="color:#f92672"&gt;=&lt;/span&gt;logical_raw --start -f recv.sql&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
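&lt;p&gt;The passwordless-login step above can also be scripted instead of edited in vi. The following is a minimal sketch, assuming the same host, port, database, user, and password as the .pgpass line shown above. The temporary file location is an assumption made for illustration; PGPASSFILE is the standard libpq environment variable for pointing at a password file other than ~/.pgpass.&lt;/p&gt;

```shell
# Hypothetical location for this sketch; libpq reads ~/.pgpass by default,
# or the file named by the PGPASSFILE environment variable.
PGPASSFILE="$(mktemp)"
export PGPASSFILE

# Restrict permissions before writing the password: on Unix, libpq ignores
# a password file that group or others can access.
chmod 0600 "$PGPASSFILE"

# Field order: hostname:port:database:username:password
printf '%s\n' 'lzl:5410:lzldb:pg:pg' > "$PGPASSFILE"
```

&lt;p&gt;Any libpq client started from the same shell, including pg_recvlogical, would then look up its password in that file.&lt;/p&gt;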
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tdecoder222 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;qwe&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tdecoder222 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;asd&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pg&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzl &lt;span style="color:#f92672"&gt;~&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; tail &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;f recv.&lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tdecoder222 (a, b) &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;qwe&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- The UPDATE does not appear in the output: decoder_raw did not decode it
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Add a primary key to the table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tdecoder222 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tdecoder222 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tdecoder222 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;lzl2&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tdecoder222 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;lzlupdate&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pg&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzl &lt;span style="color:#f92672"&gt;~&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; tail &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;f recv.&lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tdecoder222 (a, b) &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tdecoder222 (a, b) &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;lzl2&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tdecoder222 &lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt; a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;, b &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;lzlupdate&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&amp;ndash; After a primary key was added, decoder_raw decoded the UPDATE correctly.
&amp;ndash; Without a primary key, the UPDATE cannot be decoded correctly. This is related to replica identity, which is covered later.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Prerequisites for Logical Replication
 &lt;div id="prerequisites-for-logical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#prerequisites-for-logical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;1. Parameters
 &lt;div id="1-parameters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#1-parameters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;1.1 Basic Required Parameters&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;wal_level. Takes effect after restart; the default is replica. For logical replication it must be set to logical. Setting logical does not turn the WAL itself into a logical format; it means that, on top of what physical replication (replica) needs, the information required for logical decoding is also written. Since pg9.6 the only values are minimal, replica, and logical, each writing more information than the previous one.&lt;/li&gt;
&lt;li&gt;max_replication_slots. Takes effect after restart; the default is 0 in pg9.6 and below and 10 in pg10 and above. 10 is generally sufficient. Like physical replication, logical replication generally uses replication slots. Backups and physical replication can also consume slots from this limit.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;1.2 Source-side Required Parameters&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;max_wal_senders. Takes effect after restart, default 10. Limits the number of WAL sender processes. The publisher&amp;rsquo;s sender transmits the decoded changes. Generally, one logical replication slot corresponds to one sender and one apply worker, just as in physical replication one physical replication slot corresponds to one sender and one receiver.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;1.3 Target-side Required Parameters&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;max_worker_processes. Takes effect after restart, default 8. Limits the total number of background worker processes. Parallel workers (parallel queries, parallel statistics collection, etc., limited by max_parallel_workers), logical replication workers (max_logical_replication_workers), and other programs that need to fork workers all draw from this pool. It should be set to at least max_parallel_workers + logical replication workers + other background workers.&lt;/li&gt;
&lt;li&gt;max_logical_replication_workers. Takes effect after restart, default 4. Limits the number of logical replication workers, which includes both apply workers and table sync workers.&lt;/li&gt;
&lt;li&gt;max_sync_workers_per_subscription. Takes effect after reload, default 2. Limits the number of sync workers a subscription uses when new tables are added to logical replication. Currently, each table is synchronized by only one worker.&lt;/li&gt;
&lt;li&gt;The three parameters above are nested pools: sync workers are taken from max_logical_replication_workers, which in turn are taken from max_worker_processes, so max_sync_workers_per_subscription &amp;lt; max_logical_replication_workers &amp;lt; max_worker_processes. In short, there must be workers available.&lt;/li&gt;
&lt;/ul&gt;
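&lt;p&gt;As a quick check, the parameters above can be read from pg_settings. This is a minimal sketch rather than part of the original walkthrough: the context column shows postmaster for parameters that need a restart and sighup for those that only need a reload, and pending_restart shows whether a changed value is still waiting for a restart.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Run on both the publisher and the subscriber
select name, setting, context, pending_restart
from pg_settings
where name in (&amp;#39;wal_level&amp;#39;,
               &amp;#39;max_replication_slots&amp;#39;,
               &amp;#39;max_wal_senders&amp;#39;,
               &amp;#39;max_worker_processes&amp;#39;,
               &amp;#39;max_logical_replication_workers&amp;#39;,
               &amp;#39;max_sync_workers_per_subscription&amp;#39;)
order by name;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;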

&lt;h4 class="relative group"&gt;2. Permissions
 &lt;div id="2-permissions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#2-permissions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Replication user permissions. Logical replication users need replication privileges.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;ALTER ROLE &amp;lt;username&amp;gt; WITH REPLICATION;&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;HBA access restrictions, allowing downstream to access the database using the replication user.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;host lzldb user1 172.17.100.150/32 md5&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For the publish-subscribe model, CREATE permission on the database or superuser permission is needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Creating a &lt;code&gt;for table&lt;/code&gt; publication requires at least CREATE permission on the database and ownership of the published tables. Every other kind of publication requires superuser.&lt;/p&gt;
&lt;p&gt;When creating a subscription, superuser is required.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;grant create on database lzl1db to owner1;&lt;/code&gt; or&lt;/p&gt;
&lt;p&gt;&lt;code&gt;alter user replicate1 superuser;&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Additionally, read or write permissions on tables during replication are also necessary.&lt;/li&gt;
&lt;/ul&gt;
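&lt;p&gt;The permissions above can be verified with the built-in privilege functions. This is a sketch that assumes the role replicate1, the database lzldb, and the table public.trep1 from the example later in this article; run it on the publisher while connected to lzldb.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Role attributes: REPLICATION and SUPERUSER
select rolname, rolreplication, rolsuper
from pg_roles
where rolname = &amp;#39;replicate1&amp;#39;;

-- CREATE on the database (needed to create a publication)
select has_database_privilege(&amp;#39;replicate1&amp;#39;, &amp;#39;lzldb&amp;#39;, &amp;#39;CREATE&amp;#39;);

-- SELECT on the published table (needed for the initial data copy)
select has_table_privilege(&amp;#39;replicate1&amp;#39;, &amp;#39;public.trep1&amp;#39;, &amp;#39;SELECT&amp;#39;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;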

&lt;h3 class="relative group"&gt;Logical Synchronization Between PostgreSQL Instances — Publish and Subscribe
 &lt;div id="logical-synchronization-between-postgresql-instances--publish-and-subscribe" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-synchronization-between-postgresql-instances--publish-and-subscribe" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL&amp;rsquo;s built-in logical replication is based on the publish-subscribe model. Unlike the decoder_raw approach shown earlier, the publish-subscribe model does not decode changes into SQL statements in order to apply them.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Publication
 &lt;div id="publication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#publication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;A publisher can have multiple publications, and each publication can have multiple tables.&lt;/li&gt;
&lt;li&gt;When publishing, you can specify:&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;for table&lt;/code&gt;: publishes the listed tables. New tables must be added explicitly with ALTER PUBLICATION ADD TABLE. Creating this kind of publication requires at least ownership of the tables.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;for all tables&lt;/code&gt;: publishes all tables in the database. New tables are published automatically. Superuser is required to create this publication.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;for tables in schema&lt;/code&gt;: publishes all tables in the schema. New tables are published automatically. Superuser is required to create this publication. Supported starting from pg15.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;By default a publication includes INSERT, UPDATE, DELETE, and TRUNCATE. You can also choose to replicate only some of these operations. DDL is not synchronized. (This follows the official documentation, which implies that TRUNCATE is not treated as DDL in PostgreSQL; this is left as a topic for later research. TRUNCATE is DDL in MySQL and Oracle.)&lt;/li&gt;
&lt;li&gt;Only base tables can be published; temporary tables, foreign tables, views, sequences, etc. cannot be published. How partitioned tables are published depends on the PostgreSQL version and on partition attributes. pg15 defaults to publishing all partitions of a partitioned table.&lt;/li&gt;
&lt;li&gt;publish_via_partition_root. Supported from pg13. This publication parameter determines whether changes to a partitioned table are published using the identity and schema of the individual partition that changed (false, the default) or of the root partitioned table (true). When set to true, logical replication between differently structured tables is supported, such as from a partitioned table to a regular table. When true, TRUNCATE operations performed directly on partitions are not replicated.&lt;/li&gt;
&lt;/ul&gt;
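&lt;p&gt;The publication options above map to statements like the following. This is an illustrative sketch: pub_ins, pub_all, pub_root, and the tables trep2 and tpart are hypothetical names, while pub_lzl1 and trep1 come from the example later in this article.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Publish only selected operations of one table
create publication pub_ins for table trep1 with (publish = &amp;#39;insert, update&amp;#39;);

-- Add a new table to an existing FOR TABLE publication
alter publication pub_lzl1 add table trep2;

-- Publish every table in the database (superuser required)
create publication pub_all for all tables;

-- pg13+: publish a partitioned table under the identity of its root table
create publication pub_root for table tpart with (publish_via_partition_root = true);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;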

&lt;h4 class="relative group"&gt;Subscription
 &lt;div id="subscription" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#subscription" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;A subscription has only one publisher but can subscribe to multiple publications on the publisher.&lt;/li&gt;
&lt;li&gt;A subscriber can have multiple subscriptions, each receiving data from one replication slot.&lt;/li&gt;
&lt;li&gt;One subscription corresponds to one replication slot, which is on the publisher side.&lt;/li&gt;
&lt;li&gt;When creating or deleting a subscription, the replication slot is automatically created or deleted on the publisher by default.&lt;/li&gt;
&lt;li&gt;Creating a subscription requires superuser.&lt;/li&gt;
&lt;li&gt;DDL is not synchronized; tables must already be created.&lt;/li&gt;
&lt;li&gt;Existing data is synchronized by default: a snapshot of each table is copied to the subscriber with COPY.&lt;/li&gt;
&lt;li&gt;Synchronization can be paused and resumed with ALTER SUBSCRIPTION sub1 {ENABLE|DISABLE}.&lt;/li&gt;
&lt;li&gt;When tables are added to a publication, the subscriber must refresh the subscription: ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION.&lt;/li&gt;
&lt;li&gt;Schema names, table names, and column names must be consistent between publication and subscription. Column types can differ (as long as implicit conversion succeeds). Column order can be different.&lt;/li&gt;
&lt;li&gt;Subscriptions also have some attributes, such as binary transfer, streaming, synchronous commit, two-phase commit, etc.&lt;/li&gt;
&lt;/ul&gt;
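&lt;p&gt;Several of the points above correspond directly to statements run on the subscriber. This is a sketch that assumes a subscription named sub1, as in the list above; sub2 and pub1 are hypothetical names, and the connection string reuses the values from the example later in this article.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Pause and resume applying changes
alter subscription sub1 disable;
alter subscription sub1 enable;

-- Pick up tables that were added to the publication
alter subscription sub1 refresh publication;

-- Create a subscription without the initial COPY of existing data
create subscription sub2
  connection &amp;#39;host=172.17.100.150 port=5410 dbname=lzldb user=replicate1 password=replicate1&amp;#39;
  publication pub1
  with (copy_data = false);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;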
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/71469ba164fc.png" alt="d48af56aa7fc4df89b429605b2e049a9.png" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The logical replication launcher starts the subscriber-side worker processes for every enabled subscription. The launcher itself is started as a background worker when the instance starts.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*-------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * IDENTIFICATION
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * src/backend/replication/logical/launcher.c
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * NOTES
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * This module contains the logical replication worker launcher which
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * uses the background worker infrastructure to start the logical
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * replication workers for every enabled subscription.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *-------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Publish-Subscribe Related Views
 &lt;div id="publish-subscribe-related-views" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#publish-subscribe-related-views" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;pg_publication &amp;ndash; View publications. Publications themselves are stateless; replication slots are stateful, so there&amp;rsquo;s no pg_stat_publication.&lt;/p&gt;
&lt;p&gt;pg_publication_tables &amp;ndash; View published tables by publication, schema, and table name; simple and clear.&lt;/p&gt;
&lt;p&gt;pg_publication_rel &amp;ndash; View published tables, shown only as OIDs.&lt;/p&gt;
&lt;p&gt;pg_stat_subscription &amp;ndash; View subscription status, pid is the worker process pid.&lt;/p&gt;
&lt;p&gt;pg_subscription &amp;ndash; View subscriptions.&lt;/p&gt;
&lt;p&gt;pg_subscription_rel &amp;ndash; View subscription tables. There&amp;rsquo;s no pg_subscription_tables. Additionally, this view can show the sync status of individual tables under a subscription, which the replication slot view cannot do.&lt;/p&gt;
&lt;p&gt;\dRp list replication publications&lt;/p&gt;
&lt;p&gt;\dRs list replication subscriptions&lt;/p&gt;
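&lt;p&gt;A few example queries against these catalogs and views, as a sketch. The column names are those of the standard catalogs; which rows appear depends on your own publications and subscriptions.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Publisher: which tables does each publication contain?
select pubname, schemaname, tablename
from pg_publication_tables;

-- Subscriber: per-table sync state
-- (i = initialize, d = data copy, s = synchronized, r = ready)
select srsubid, srrelid::regclass as table_name, srsubstate, srsublsn
from pg_subscription_rel;

-- Subscriber: worker pid and replication progress per subscription
select subname, pid, received_lsn, latest_end_lsn, latest_end_time
from pg_stat_subscription;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;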

&lt;h2 class="relative group"&gt;Creating a Publication and Subscription
 &lt;div id="creating-a-publication-and-subscription" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#creating-a-publication-and-subscription" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Using a dedicated replication user replicate1, create a publication and subscription in the database lzldb to implement logical replication of table trep1.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;&lt;code&gt;Role&lt;/code&gt;&lt;/th&gt;
 &lt;th&gt;&lt;code&gt;Host IP&lt;/code&gt;&lt;/th&gt;
 &lt;th&gt;&lt;code&gt;Port&lt;/code&gt;&lt;/th&gt;
 &lt;th&gt;&lt;code&gt;Database&lt;/code&gt;&lt;/th&gt;
 &lt;th&gt;&lt;code&gt;Schema&lt;/code&gt;&lt;/th&gt;
 &lt;th&gt;&lt;code&gt;Table&lt;/code&gt;&lt;/th&gt;
 &lt;th&gt;&lt;code&gt;Replication User&lt;/code&gt;&lt;/th&gt;
 &lt;th&gt;&lt;code&gt;Version&lt;/code&gt;&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;Publisher&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;172.17.100.150&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;5410&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;lzldb&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;public&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;trep1&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;replicate1&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;pg13&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;Subscriber&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;172.17.100.150&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;5412&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;lzlbd&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;public&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;trep1&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;replicate1&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;pg13&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 class="relative group"&gt;Creating the Publication
 &lt;div id="creating-the-publication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#creating-the-publication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Modify&lt;/span&gt; postgresql.conf, wal_level &lt;span style="color:#66d9ef"&gt;parameter&lt;/span&gt; takes effect &lt;span style="color:#66d9ef"&gt;after&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;restart&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wal_level&lt;span style="color:#f92672"&gt;=&lt;/span&gt;logical 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Modify&lt;/span&gt; pg_hba.conf file, takes effect &lt;span style="color:#66d9ef"&gt;after&lt;/span&gt; reload
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;host&lt;/span&gt; lzldb replicate1 &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; md5
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create replication user and grant privileges
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; replicate1 &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; password &lt;span style="color:#e6db74"&gt;&amp;#39;replicate1&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; replicate1 &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; replication;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; lzldb &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; replicate1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create the table to be replicated and grant privileges to the replication user
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; lzldb replicate1 &lt;span style="color:#75715e"&gt;-- If the replication user is not the table owner, should grant select on trep1 to replicate1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; trep1(a int &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,b char(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; trep1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create publication, superuser can also be used
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; lzldb replicate1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; publication pub_lzl1 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; trep1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- View publication. \dRp or pg_publication
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_publication;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pubname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pubowner &lt;span style="color:#f92672"&gt;|&lt;/span&gt; puballtables &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pubinsert &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pubupdate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pubdelete &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pubtruncate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pubviaroot 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------+----------+--------------+-----------+-----------+-----------+-------------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16400&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pub_lzl1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16392&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Creating the Subscription
 &lt;div id="creating-the-subscription" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#creating-the-subscription" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create table definition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; trep1(a int &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,b char(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Use superuser to create subscription
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; SUBSCRIPTION sub_test
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CONNECTION&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;host=172.17.100.150 port=5410 dbname=lzldb user=replicate1 password=replicate1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PUBLICATION pub_lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_subscription; &lt;span style="color:#75715e"&gt;-- View subscription. \dRs or pg_subscription
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subdbid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subowner &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subenabled &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subconninfo &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subslotname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subsynccommit &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subpublications 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+---------+----------+----------+------------+--------------------------------------------------------------------------------+-------------+---------------+-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16394&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; sub_test &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;host&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt; port&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5410&lt;/span&gt; dbname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;lzldb &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;replicate1 password&lt;span style="color:#f92672"&gt;=&lt;/span&gt;replicate1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; sub_test &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;pub_lzl1&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; trep1; &lt;span style="color:#75715e"&gt;-- Verify existing data has been synchronized
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---+------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
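&lt;p&gt;After the subscription is created, the publisher side can be checked as well. This is a sketch that relies on two defaults: the replication slot is named after the subscription (sub_test, as the subslotname column above shows), and built-in logical replication uses the pgoutput plugin.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Publisher: the slot created by the subscription
select slot_name, plugin, slot_type, active, confirmed_flush_lsn
from pg_replication_slots
where slot_name = &amp;#39;sub_test&amp;#39;;

-- Publisher: the WAL sender serving the subscription
select pid, application_name, state, sent_lsn, replay_lsn
from pg_stat_replication
where application_name = &amp;#39;sub_test&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;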

&lt;h3 class="relative group"&gt;Publish-Subscribe Model Test 1: Truncate Synchronization
 &lt;div id="publish-subscribe-model-test-1-truncate-synchronization" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#publish-subscribe-model-test-1-truncate-synchronization" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; trep1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; trep1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---+---
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; trep1; &lt;span style="color:#75715e"&gt;-- In publish-subscribe mode, truncate is synchronized
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---+---
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
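&lt;p&gt;Whether truncate reaches the subscriber depends on which operations the publication publishes. As a minimal sketch, assuming the publication is named pub_lzl1 as in the tests in this article and the server is PostgreSQL 11 or later (where truncate publishing is available), the setting can be checked and, if desired, narrowed on the publisher:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Which operations does each publication publish?
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select pubname, pubinsert, pubupdate, pubdelete, pubtruncate from pg_publication;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Optional (assumed publication name): stop publishing truncate, keep the other operations
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;alter publication pub_lzl1 set (publish = &amp;#39;insert, update, delete&amp;#39;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;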

&lt;h3 class="relative group"&gt;Publish-Subscribe Model Test 2: Adding New Table Synchronization
 &lt;div id="publish-subscribe-model-test-2-adding-new-table-synchronization" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#publish-subscribe-model-test-2-adding-new-table-synchronization" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Add a new table to an existing publication/subscription pair. lzldb is the publisher, lzlbd is the subscriber
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_pk(a int,b varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_pk &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; publication pub_lzl1 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_pk;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; PUBLICATION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After adding a table on the publisher, refresh must be executed on the subscriber. Refresh defaults to synchronizing existing data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; subscription sub_test refresh publication; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; SUBSCRIPTION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_subscription_rel ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; srsubid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; srrelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; srsubstate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; srsublsn 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+---------+------------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16394&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16389&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;F2898
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16394&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16400&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; d &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Subscription state codes: i = initializing, d = copying data, s = synchronized, r = ready (normal replication)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- At this point, table tab_pk data has not been synchronized because the subscriber&amp;#39;s replication user lacks query permission on the table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab_full &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; replicate1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;GRANT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_subscription_rel ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; srsubid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; srrelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; srsubstate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; srsublsn 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+---------+------------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16394&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16389&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;F2898
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16394&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16400&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;D830
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Subscription is in ready state, new table synchronization complete&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
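&lt;p&gt;pg_subscription_rel shows only OIDs, which makes the output above hard to read when several tables are involved. As a rough sketch to run on the subscriber, the following resolves the subscription and table names; the second statement shows the copy_data option for cases where existing rows should not be copied. The subscription name sub_test is taken from the test above.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Show per-table sync state with readable names instead of raw OIDs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select s.subname, sr.srrelid::regclass as table_name, sr.srsubstate, sr.srsublsn
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_subscription_rel sr
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;join pg_subscription s on s.oid = sr.srsubid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Optional: pick up newly published tables without copying their existing rows
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;alter subscription sub_test refresh publication with (copy_data = false);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;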

&lt;h3 class="relative group"&gt;Replica Identity
 &lt;div id="replica-identity" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#replica-identity" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The replica identity determines which old-row values are written to WAL to identify a row. Publish-subscribe and third-party logical sync tools alike need it to locate the downstream row that an update or delete affects.&lt;/p&gt;
&lt;p&gt;PostgreSQL supports 4 replica identity modes.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;default(d): Default identity for non-system tables. Uses the primary key if the table has one; without a primary key it behaves like nothing.&lt;/li&gt;
&lt;li&gt;index(i): Uses a unique index whose columns are all NOT NULL as the identity. Uniqueness alone is not enough, because a unique index still allows multiple rows with null values and so cannot single out one row. The primary key index can also be specified explicitly in index mode.&lt;/li&gt;
&lt;li&gt;full(f): Uses all columns of the row as the identity. Full mode increases WAL volume.&lt;/li&gt;
&lt;li&gt;nothing(n): Default mode for system tables. No old-row information is logged, so updates and deletes cannot be applied downstream.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- View table&amp;#39;s replica identity:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,relreplident &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;tabname1&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- When a table&amp;#39;s replica identity is i, check if the index is the replica identity:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d tabname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; rel.relname,idx.indisreplident &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_index idx ,pg_class rel &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; idx.indexrelid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;rel.oid &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;idx_1&amp;#39;&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Modify table replica identity:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; tab1 REPLICA &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULT&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;USING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; index_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOTHING&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
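&lt;p&gt;The modes above imply that a published table can end up with no usable identity: mode nothing, mode default without a primary key, or mode index whose index has since been dropped. The query below is a sketch for finding such tables on the publisher before updates or deletes start failing. It assumes the publications publish update and delete, which is the default.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Published tables that currently have no usable replica identity
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select pt.pubname, c.oid::regclass as table_name, c.relreplident
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_publication_tables pt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;join pg_namespace n on n.nspname = pt.schemaname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;join pg_class c on c.relnamespace = n.oid and c.relname = pt.tablename
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;where c.relreplident = &amp;#39;n&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;   -- default mode needs a primary key
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;   or (c.relreplident = &amp;#39;d&amp;#39; and not exists (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;         select 1 from pg_index i where i.indrelid = c.oid and i.indisprimary))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;   -- index mode needs its replica identity index to still exist
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;   or (c.relreplident = &amp;#39;i&amp;#39; and not exists (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;         select 1 from pg_index i where i.indrelid = c.oid and i.indisreplident));&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;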

&lt;h4 class="relative group"&gt;Replica Identity Test 1: Setting a non-null unique index as replica identity for a table without a primary key
 &lt;div id="replica-identity-test-1-setting-a-non-null-unique-index-as-replica-identity-for-a-table-without-a-primary-key" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#replica-identity-test-1-setting-a-non-null-unique-index-as-replica-identity-for-a-table-without-a-primary-key" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_idx(a int,b varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,relreplident &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;tab_idx&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relreplident 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tab_idx &lt;span style="color:#f92672"&gt;|&lt;/span&gt; d
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;unique&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_1 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab_idx(b);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_idx &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- The index used as replica identity must be a non-null unique index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; rel.relname,idx.indisreplident &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_index idx ,pg_class rel &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; idx.indexrelid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;rel.oid &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;idx_1&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indisreplident 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_idx REPLICA &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_1; &lt;span style="color:#75715e"&gt;-- Modify table&amp;#39;s replica identity
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; rel.relname,idx.indisreplident &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_index idx ,pg_class rel &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; idx.indexrelid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;rel.oid &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;idx_1&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indisreplident 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d tab_idx &lt;span style="color:#75715e"&gt;-- Use pg_index or \d to view the replica identity index. \d only shows a replica identity index that was set explicitly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.tab_idx&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+-----------------------+-----------+----------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; b &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idx_1&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;UNIQUE&lt;/span&gt;, btree (b) REPLICA &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;Replica Identity Test 2: Full mode — can duplicate rows be synchronized normally?
 &lt;div id="replica-identity-test-2-full-mode--can-duplicate-rows-be-synchronized-normally" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#replica-identity-test-2-full-mode--can-duplicate-rows-be-synchronized-normally" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Execute the following on the publisher
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_full (a int,b varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;)); &lt;span style="color:#75715e"&gt;-- Table to be synchronized, with no primary key and no non-null unique index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tab_full &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;); &lt;span style="color:#75715e"&gt;-- Run this insert 5 times to create 5 identical rows
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab_full &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; replicate1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;GRANT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; publication pub_lzl1 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_full;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; PUBLICATION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Execute the following on the subscriber
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; subscription sub_test refresh publication; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; SUBSCRIPTION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; ctid,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab_full ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+---+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab_full &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; ctid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;(0,2)&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: cannot &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;tab_full&amp;#34;&lt;/span&gt; because it does &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; have a replica &lt;span style="color:#66d9ef"&gt;identity&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; publishes deletes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: &lt;span style="color:#66d9ef"&gt;To&lt;/span&gt; enable deleting &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; REPLICA &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tab_full &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; ctid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;(0,5)&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: cannot &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;tab_full&amp;#34;&lt;/span&gt; because it does &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; have a replica &lt;span style="color:#66d9ef"&gt;identity&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; publishes updates
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: &lt;span style="color:#66d9ef"&gt;To&lt;/span&gt; enable updating the &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; REPLICA &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- With replica identity d (default) and no primary key, the table behaves as if set to nothing, and nothing cannot replicate DELETE or UPDATE.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_full replica &lt;span style="color:#66d9ef"&gt;identity&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;full&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab_full &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; ctid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;(0,2)&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- After setting replica identity to full, delete succeeds
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DELETE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; ctid,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab_full ; &lt;span style="color:#75715e"&gt;--
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+---+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tab_full &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; ctid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;(0,5)&amp;#39;&lt;/span&gt;; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; ctid,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab_full; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+---+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; ctid,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab_full ; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+---+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This example demonstrates three points:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When replica identity is d (default), the primary key serves as the identity; without a primary key, the table behaves as if set to nothing.&lt;/li&gt;
&lt;li&gt;With nothing, a table that publishes updates and deletes rejects UPDATE and DELETE, so neither can be replicated.&lt;/li&gt;
&lt;li&gt;In full mode, duplicate rows still replicate correctly. The row touched on the subscriber may have a different ctid than the one touched on the publisher, but the replication goal is still achieved.&lt;/li&gt;
&lt;/ol&gt;
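&lt;p&gt;To check which replica identity a table currently has, one option is to read relreplident from pg_class. This is a minimal sketch, assuming the tab_full table from the example above; the letters are d (default), n (nothing), f (full), and i (index):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- relreplident: d = default, n = nothing, f = full, i = index
select relname, relreplident
from pg_class
where relname = 'tab_full';

-- Is there a primary key that d (default) could use as the identity?
select count(*) as primary_keys
from pg_index
where indrelid = 'tab_full'::regclass
  and indisprimary;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If relreplident is d and primary_keys is 0, the table is in the situation shown above, where UPDATE and DELETE are rejected once the table publishes them.&lt;/p&gt;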

&lt;h3 class="relative group"&gt;Third-Party Synchronization Software
 &lt;div id="third-party-synchronization-software" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#third-party-synchronization-software" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Third-party synchronization tools such as OGG, DTS, and KTL are mature and widely used.&lt;/p&gt;
&lt;p&gt;These tools are very flexible. They support truly heterogeneous synchronization, from PostgreSQL to other databases, Kafka, big data platforms, and so on.&lt;/p&gt;
&lt;p&gt;They can also sync from other data platforms into PostgreSQL, for example the now common Oracle to PostgreSQL scenario.&lt;/p&gt;
&lt;p&gt;Since this article is mainly about PostgreSQL itself, the case where PostgreSQL is the downstream target is less interesting: it only involves writing data, with no logical decoding or replication slots, and few problems arise. This section therefore skips PostgreSQL as a heterogeneous sync target and only looks at scenarios where PostgreSQL is the upstream syncing to heterogeneous databases. These third-party tools generally rely on PostgreSQL&amp;rsquo;s own logical decoding: they specify an output plugin and automatically create replication slots and replication links. Some tools also create subscriptions automatically, while others use only replication slots.&lt;/p&gt;
&lt;p&gt;With logical decoding, output plugins, replication slots, replica identity, and the replication prerequisites already covered, let&amp;rsquo;s simulate a PostgreSQL to Oracle sync by configuring the prerequisites and starting synchronization directly.&lt;/p&gt;
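&lt;p&gt;What such a tool does on the PostgreSQL side can be approximated by hand. The following is only a sketch, assuming wal_level is set to logical and the test_decoding contrib plugin is installed; the slot name demo_slot is hypothetical and is not what any particular tool creates:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Create a logical replication slot that decodes WAL with test_decoding
select * from pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- Look at pending changes without consuming them
select * from pg_logical_slot_peek_changes('demo_slot', null, null);

-- Drop the slot afterwards so it does not keep retaining WAL
select pg_drop_replication_slot('demo_slot');&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A real sync tool streams the changes over a replication connection instead of polling with SQL functions, but the slot it leaves behind looks the same in pg_replication_slots.&lt;/p&gt;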

&lt;h4 class="relative group"&gt;Creating OGG Sync from PostgreSQL to Oracle
 &lt;div id="creating-ogg-sync-from-postgresql-to-oracle" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#creating-ogg-sync-from-postgresql-to-oracle" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Software Installation:&lt;/p&gt;
&lt;p&gt;ogg for oracle: Oracle GoldenGate 21.3.0.0.0 for Oracle on Linux x86-64&lt;/p&gt;
&lt;p&gt;ogg for pg: Oracle GoldenGate 21.3.0.0.0 for PostgreSQL on Linux x86-64&lt;/p&gt;
&lt;p&gt;oracle: 11.2.0.4&lt;/p&gt;
&lt;p&gt;pg: 13.10&lt;/p&gt;
&lt;p&gt;Installation steps:&lt;/p&gt;
&lt;p&gt;OGG installation and deployment are not covered here. I followed the steps in this article: &lt;a href="https://liuzhilong.blog.csdn.net/article/details/129252320?spm=1001.2014.3001.5502" target="_blank" rel="noreferrer"&gt;https://liuzhilong.blog.csdn.net/article/details/129252320?spm=1001.2014.3001.5502&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sync architecture diagram:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/af57ae30ee4a.png" alt="c8be5aae99704448a8a7e2e01fbde05b.png" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_replication_slots &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; slot_name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;ext_pg_5d4b1d39f7494f79&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;-------+------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;slot_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ext_pg_5d4b1d39f7494f79
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plugin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; test_decoding &lt;span style="color:#75715e"&gt;-- OGG uses the test_decoding output plugin by default
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;slot_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; logical
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16385&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#75715e"&gt;-- As long as OGG extract is running, the replication slot is active
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;active_pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3509&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;catalog_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;591&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;restart_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;F3E38
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;confirmed_flush_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;F4020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wal_status &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reserved
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;safe_wal_size &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_replication;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3509&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usesysid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;application_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; GoldenGateCapture
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_addr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;127&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_hostname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_port &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;43665&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;350469&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; streaming
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sent_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;F4140
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;write_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;F4020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;flush_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;F4020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;replay_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;write_lag &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;flush_lag &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;replay_lag &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sync_priority &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sync_state &lt;span style="color:#f92672"&gt;|&lt;/span&gt; async
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;reply_time &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;986625&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- replay_lsn is empty
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- The write_lag, flush_lag, and replay_lag columns are empty as well&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Logical Replication Monitoring
 &lt;div id="logical-replication-monitoring" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-replication-monitoring" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The most direct way to monitor logical replication lag is to check the lag reported by the replication software itself. When that is not available, the replication slot view is the fallback. It provides a good deal of information; for example, whether a slot is active directly indicates whether the replication link is syncing.&lt;/p&gt;
&lt;p&gt;The replication slot view is therefore central to logical replication monitoring. Monitoring specific to publish-subscribe was introduced earlier; here the focus is on logical replication monitoring in general.&lt;/p&gt;

&lt;h4 class="relative group"&gt;pg_replication_slots
 &lt;div id="pg_replication_slots" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_replication_slots" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The pg_replication_slots view shows each replication slot and its status. Slots created manually, by tools, or by subscriptions all appear here.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Column&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;slot_name&lt;/td&gt;
 &lt;td&gt;Replication slot name&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;plugin&lt;/td&gt;
 &lt;td&gt;Output plugin name for logical replication slots. If empty, it&amp;rsquo;s a physical replication slot&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;slot_type&lt;/td&gt;
 &lt;td&gt;physical or logical&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;datoid&lt;/td&gt;
 &lt;td&gt;Database ID for logical replication slot&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;database&lt;/td&gt;
 &lt;td&gt;Database for logical replication slot&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;temporary&lt;/td&gt;
 &lt;td&gt;Whether it&amp;rsquo;s a temporary replication slot. Temporary slots are not written to disk and are automatically deleted when the session ends. pg_basebackup uses temporary slots by default&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;active&lt;/td&gt;
 &lt;td&gt;Whether the slot is currently in use: t or f. If f, restart the replication link or drop the slot promptly, because an inactive slot can block WAL removal and fill up the primary database disk. How much WAL the slot may hold is governed by the max_slot_wal_keep_size parameter&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;active_pid&lt;/td&gt;
 &lt;td&gt;PID of the walsender using this replication slot. Only set when active is t&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;xmin&lt;/td&gt;
 &lt;td&gt;Oldest transaction that this slot needs the database to retain. VACUUM cannot remove tuples deleted by any later transaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;catalog_xmin&lt;/td&gt;
 &lt;td&gt;Oldest transaction affecting the system catalogs that this slot needs the database to retain&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;restart_lsn&lt;/td&gt;
 &lt;td&gt;LSN of the oldest WAL the slot&amp;rsquo;s consumer may still need. WAL from this position onward is retained so that the downstream does not lose it. The max_slot_wal_keep_size parameter caps how much WAL a slot may retain; beyond that limit the WAL can be removed anyway. The default of -1 means no limit, so WAL needed by the slot is never removed. This value marks the position from which decoding for the downstream would restart, so it can help locate replication link lag&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;confirmed_flush_lsn&lt;/td&gt;
 &lt;td&gt;LSN confirmed received by the logical replication downstream. Empty for physical replication slots&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;wal_status&lt;/td&gt;
 &lt;td&gt;Status of the WAL claimed by this replication slot:&lt;ul&gt;&lt;li&gt;reserved: the slot reserves WAL, and the claimed WAL is within max_wal_size (the soft limit that triggers automatic checkpoints).&lt;/li&gt;&lt;li&gt;extended: the claimed WAL has exceeded max_wal_size but is still retained, because it is still within wal_keep_size or max_slot_wal_keep_size.&lt;/li&gt;&lt;li&gt;unreserved: the slot no longer retains the WAL it needs, and that WAL will be removed at the next checkpoint.&lt;/li&gt;&lt;li&gt;lost: WAL needed by the slot has been removed, and the slot is no longer usable.&lt;/li&gt;&lt;/ul&gt;The last two states appear only when max_slot_wal_keep_size is non-negative, since that parameter is what allows WAL held by a slot to be removed; without it, unreserved and lost cannot occur. If restart_lsn is NULL, this field is null as well: with no retained LSN there is no retention position to compare against wal_keep_size or max_slot_wal_keep_size.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;safe_wal_size&lt;/td&gt;
 &lt;td&gt;Number of bytes that can still be written to WAL before the slot is in danger of losing WAL files. If the value is zero or negative, max_slot_wal_keep_size has been exceeded and the WAL files will be removed as soon as a checkpoint occurs, after which the downstream using this slot must be rebuilt. NULL when max_slot_wal_keep_size is -1 or the slot is already lost&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
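&lt;p&gt;As a rough sketch of how these columns can be read together, the query below lists each slot with the amount of WAL it is holding back. It assumes PostgreSQL 13 or later (where wal_status and safe_wal_size exist) and must run on the primary; the aliases retained_wal and safe_wal are only illustrative names.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Per-slot WAL retention, largest first.
-- pg_current_wal_lsn() only works on a primary, not on a standby in recovery.
SELECT slot_name,
       slot_type,
       active,
       wal_status,
       restart_lsn,
       confirmed_flush_lsn,
       -- WAL between restart_lsn and the current write position
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal,
       -- NULL when max_slot_wal_keep_size is -1 or the slot is lost
       pg_size_pretty(safe_wal_size) AS safe_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC NULLS LAST;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;An inactive slot whose retained_wal keeps growing is the usual sign that an abandoned slot is pinning WAL on disk.&lt;/p&gt;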

&lt;h4 class="relative group"&gt;pg_stat_replication
 &lt;div id="pg_stat_replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_stat_replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Rather than replication status, it&amp;rsquo;s more accurate to call it walsender status. This view shows the status of each walsender, one record per walsender.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If a slot appears in pg_replication_slots but has no matching row in pg_stat_replication, its walsender is gone and logical replication through that slot is down; active in pg_replication_slots should be f.&lt;/li&gt;
&lt;li&gt;If a walsender appears in pg_stat_replication but has no matching slot in pg_replication_slots, it is physical replication running without a replication slot.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Replication statistics can therefore exist without a replication slot. Slots that do have a walsender still need this view, because it exposes more replication status than pg_replication_slots does.&lt;/p&gt;
&lt;p&gt;So as long as the replication slot has not failed, pg_stat_replication is the key view for monitoring logical replication lag.&lt;/p&gt;
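&lt;p&gt;The two cases in the list above can be checked in one pass by joining the views on the walsender PID. This is a minimal sketch; the link_status column and its labels are made up for illustration.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Pair every slot with its walsender (if any) and every walsender with its slot (if any).
SELECT s.slot_name,
       s.slot_type,
       s.active,
       r.pid,
       r.application_name,
       r.state,
       CASE
           WHEN r.pid IS NULL THEN &amp;#39;slot without walsender&amp;#39;
           WHEN s.slot_name IS NULL THEN &amp;#39;walsender without slot&amp;#39;
           ELSE &amp;#39;slot with walsender&amp;#39;
       END AS link_status
FROM pg_replication_slots AS s
FULL OUTER JOIN pg_stat_replication AS r
       ON r.pid = s.active_pid;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;One caveat: a slot being read through SQL functions such as pg_logical_slot_get_changes() is served by a regular backend, so it shows up here as a slot without a walsender even while active is t.&lt;/p&gt;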
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Column&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;pid&lt;/td&gt;
 &lt;td&gt;walsender PID, same as pg_replication_slots active_pid&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;usesysid&lt;/td&gt;
 &lt;td&gt;User OID connected to this walsender, i.e., the downstream&amp;rsquo;s replication user OID&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;usename&lt;/td&gt;
 &lt;td&gt;Username connected to this walsender&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;application_name&lt;/td&gt;
 &lt;td&gt;Downstream application name. For a subscription it is the subscription name; for pg_recvlogical it is pg_recvlogical&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;client_addr&lt;/td&gt;
 &lt;td&gt;Downstream IP. If empty, it&amp;rsquo;s a local socket connection&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;client_hostname&lt;/td&gt;
 &lt;td&gt;Downstream hostname&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;client_port&lt;/td&gt;
 &lt;td&gt;Downstream port. If -1, it&amp;rsquo;s a local socket connection&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;backend_start&lt;/td&gt;
 &lt;td&gt;Backend start time, i.e., when downstream connected to walsender&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;backend_xmin&lt;/td&gt;
 &lt;td&gt;Standby&amp;rsquo;s xmin when hot_standby_feedback is enabled. This is clearly for physical replication&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;state&lt;/td&gt;
 &lt;td&gt;State of the walsender. startup: the walsender is starting up. catchup: the downstream connected to this walsender is catching up with the primary. streaming: the downstream has caught up and changes are being streamed, which is the normal replication state. backup: the walsender is sending a backup; this state appears for a walsender used for backup. stopping: the walsender is stopping&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sent_lsn&lt;/td&gt;
 &lt;td&gt;LSN sent&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;write_lsn&lt;/td&gt;
 &lt;td&gt;LSN written to disk by downstream&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;flush_lsn&lt;/td&gt;
 &lt;td&gt;LSN flushed to disk by downstream&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;replay_lsn&lt;/td&gt;
 &lt;td&gt;LSN replayed by downstream&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;write_lag&lt;/td&gt;
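 &lt;td&gt;Time elapsed between the primary flushing WAL locally and being notified that the downstream has written it&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;flush_lag&lt;/td&gt;
 &lt;td&gt;Time elapsed between the primary flushing WAL locally and being notified that the downstream has written and flushed it&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;replay_lag&lt;/td&gt;
 &lt;td&gt;Time elapsed between the primary flushing WAL locally and being notified that the downstream has written, flushed, and replayed it&lt;/td&gt;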
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sync_priority&lt;/td&gt;
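 &lt;td&gt;Priority of this standby for being chosen as the synchronous standby in priority-based synchronous replication; it has no effect in quorum-based synchronous replication&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sync_state&lt;/td&gt;
 &lt;td&gt;Synchronization state of this standby: async, potential, sync, or quorum&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;reply_time&lt;/td&gt;
 &lt;td&gt;Send time of the last reply message received from the downstream&lt;/td&gt;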
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4 class="relative group"&gt;Relationship between sent_lsn, write_lsn, flush_lsn, replay_lsn
 &lt;div id="relationship-between-sent_lsn-write_lsn-flush_lsn-replay_lsn" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#relationship-between-sent_lsn-write_lsn-flush_lsn-replay_lsn" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/08ab66a5cd02.png" alt="f2a89e2dabf84e0794c1a5854bb2006f.png" /&gt;&lt;/p&gt;
&lt;p&gt;The figure above shows how sent_lsn, write_lsn, and flush_lsn follow one another.&lt;/p&gt;
&lt;p&gt;These metrics look as if they were designed for streaming replication, but for logical replication sent_lsn, write_lsn, and flush_lsn generally have values too.&lt;/p&gt;
&lt;p&gt;However, a logical replication downstream can be any kind of consumer, and it may have no replay step at all, so replay_lsn may be empty for logical replication.&lt;/p&gt;
&lt;p&gt;The one column that is always meaningful is sent_lsn.&lt;/p&gt;
&lt;p&gt;Having gone through both pg_replication_slots and pg_stat_replication, we find that neither exposes the delay in log parsing (decoding); at most, they show the delay in log transmission.&lt;/p&gt;
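&lt;p&gt;To put numbers on the transmission delay, the gaps between successive LSN columns can be expressed in bytes. The following is a sketch to run on the primary; the alias names are invented for readability, and any gap whose input column is NULL (for example replay_lsn on some logical consumers) comes back as NULL.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Byte gap at each stage, starting from the current WAL position on the primary.
SELECT pid,
       application_name,
       state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS not_yet_sent_bytes,
       pg_wal_lsn_diff(sent_lsn, write_lsn)            AS sent_not_written_bytes,
       pg_wal_lsn_diff(write_lsn, flush_lsn)           AS written_not_flushed_bytes,
       pg_wal_lsn_diff(flush_lsn, replay_lsn)          AS flushed_not_replayed_bytes
FROM pg_stat_replication;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;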

&lt;h4 class="relative group"&gt;pg_stat_replication_slots
 &lt;div id="pg_stat_replication_slots" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_stat_replication_slots" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;This view has been available since pg14. It is dedicated to logical replication slots and additionally reports spill activity. On pg13, the only option is to inspect the pg_replslot directory. Spill is covered later.&lt;/p&gt;
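&lt;p&gt;A minimal example of reading the spill counters from this view, assuming PostgreSQL 14 or later; spill_size and total_size are illustrative aliases.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Slots that spilled the most decoded data to disk come first.
SELECT slot_name,
       spill_txns,                               -- transactions spilled to disk
       spill_count,                              -- number of spill events
       pg_size_pretty(spill_bytes) AS spill_size,
       total_txns,
       pg_size_pretty(total_bytes) AS total_size,
       stats_reset
FROM pg_stat_replication_slots
ORDER BY spill_bytes DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;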

&lt;h3 class="relative group"&gt;Logical Replication Slot Transaction Snapshots and pg_logical Directory
 &lt;div id="logical-replication-slot-transaction-snapshots-and-pg_logical-directory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-replication-slot-transaction-snapshots-and-pg_logical-directory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The transaction snapshots needed by replication slots are persisted to disk. The source code is in snapbuild.c.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SnapBuildSerializationPoint&lt;/span&gt;(SnapBuild &lt;span style="color:#f92672"&gt;*&lt;/span&gt;builder, XLogRecPtr lsn)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (builder&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; SNAPBUILD_CONSISTENT)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SnapBuildRestore&lt;/span&gt;(builder, lsn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SnapBuildSerialize&lt;/span&gt;(builder, lsn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Snapshot persistence works in two directions. While the builder has not yet reached SNAPBUILD_CONSISTENT, restore loads a snapshot from disk into memory; once it is consistent, serialize writes the snapshot from memory to disk.&lt;/p&gt;
&lt;p&gt;Transaction snapshot persistence:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SnapBuildSerialize&lt;/span&gt;(SnapBuild &lt;span style="color:#f92672"&gt;*&lt;/span&gt;builder, XLogRecPtr lsn)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;sprintf&lt;/span&gt;(path, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_logical/snapshots/%X-%X.snap&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(uint32) (lsn &lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;), (uint32) lsn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (ret &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * somebody else has already serialized to this point, don&amp;#39;t overwrite
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * but remember location, so we don&amp;#39;t need to read old data again.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * To be sure it has been synced to disk after the rename() from the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * tempfile filename to the real filename, we just repeat the fsync.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * That ought to be cheap because in most scenarios it should already
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * be safely on disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;fsync_fname&lt;/span&gt;(path, false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;fsync_fname&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_logical/snapshots&amp;#34;&lt;/span&gt;, true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;builder&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;last_serialized_snapshot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; lsn;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;goto&lt;/span&gt; out;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Transaction snapshot loading into memory:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SnapBuildRestore&lt;/span&gt;(SnapBuild &lt;span style="color:#f92672"&gt;*&lt;/span&gt;builder, XLogRecPtr lsn)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (builder&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state &lt;span style="color:#f92672"&gt;==&lt;/span&gt; SNAPBUILD_CONSISTENT)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;sprintf&lt;/span&gt;(path, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_logical/snapshots/%X-%X.snap&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(uint32) (lsn &lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;), (uint32) lsn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;fd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;OpenTransientFile&lt;/span&gt;(path, O_RDONLY &lt;span style="color:#f92672"&gt;|&lt;/span&gt; PG_BINARY);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Logical decoding persists the snapshot builder state it needs under pg_logical/snapshots/, in files named after the LSN at which they were taken. These files describe transaction visibility at that LSN; the decoded changes themselves are handed to reorderbuffer. A snapshot file is kept while a slot may still need to restart decoding from it, and it is released once the slot has moved past it or has been dropped.&lt;/p&gt;
&lt;p&gt;My environment has a long-unused slot with restart_lsn at 0/1776858:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; slot_name,plugin,slot_type,&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;,active,restart_lsn &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_replication_slots &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; slot_name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;logical_test&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; slot_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plugin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; slot_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; restart_lsn 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+---------------+-----------+----------+--------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; logical_test &lt;span style="color:#f92672"&gt;|&lt;/span&gt; test_decoding &lt;span style="color:#f92672"&gt;|&lt;/span&gt; logical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1776858&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The oldest snapshot under pg_logical/snapshots/ matches that LSN:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl snapshots&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;300&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; pg pg &lt;span style="color:#ae81ff"&gt;144&lt;/span&gt; Feb &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; 20:41 0-1776858.snap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; pg pg &lt;span style="color:#ae81ff"&gt;144&lt;/span&gt; Feb &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; 20:44 0-1776900.snap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; pg pg &lt;span style="color:#ae81ff"&gt;144&lt;/span&gt; Feb &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; 20:45 0-1776938.snap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Drop the unwanted replication slot:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_drop_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_test&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A few minutes later the snapshot file is gone (old snapshot files are cleaned up at checkpoint time):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl snapshots&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll 0-1776858.snap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ls: cannot access 0-1776858.snap: No such file or directory&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
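&lt;p&gt;On newer releases the same directory can be inspected from SQL instead of the shell. The sketch below assumes PostgreSQL 15 or later, where pg_ls_logicalsnapdir() is available; by default it is restricted to superusers and members of pg_monitor.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- List the files in pg_logical/snapshots, oldest first.
SELECT name, size, modification
FROM pg_ls_logicalsnapdir()
ORDER BY modification;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;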

&lt;h3 class="relative group"&gt;Logical Decoding Working Memory and Spill to pg_replslot
 &lt;div id="logical-decoding-working-memory-and-spill-to-pg_replslot" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-decoding-working-memory-and-spill-to-pg_replslot" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;logical_decoding_work_mem
 &lt;div id="logical_decoding_work_mem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical_decoding_work_mem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Before pg13, logical decoding kept at most 4096 changes per transaction in memory (the hardcoded max_changes_in_memory). Once a transaction exceeded 4096 changes, its changes were written to disk.&lt;/p&gt;
&lt;p&gt;pg13 introduced the logical_decoding_work_mem parameter, which caps the memory logical decoding may use to hold decoded changes. The limit applies per decoding connection: each walsender has its own buffer of this size rather than sharing one area. When the changes held by a decoding process exceed the limit, they are written to disk. The default is 64MB.&lt;/p&gt;
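&lt;p&gt;For reference, the parameter can be inspected and changed as shown below. This is a config sketch for PostgreSQL 13 or later; the value 128MB is only an example, not a recommendation.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Current value (64MB by default).
SHOW logical_decoding_work_mem;

-- Example only: raise the limit cluster-wide, then reload the configuration.
ALTER SYSTEM SET logical_decoding_work_mem = &amp;#39;128MB&amp;#39;;
SELECT pg_reload_conf();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because every decoding connection gets its own buffer, the worst-case memory use grows with the number of concurrent walsenders doing logical decoding.&lt;/p&gt;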

&lt;h4 class="relative group"&gt;Related ReorderBuffer and Spill
 &lt;div id="related-reorderbuffer-and-spill" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#related-reorderbuffer-and-spill" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Description in reorderbuffer.c:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; This module gets handed individual pieces of transactions in the order
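&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; they are written to the WAL and is responsible to reassemble them into
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; toplevel transaction sized pieces. When a transaction is completely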
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; reassembled &lt;span style="color:#f92672"&gt;-&lt;/span&gt; signaled by reading the transaction commit record &lt;span style="color:#f92672"&gt;-&lt;/span&gt; it
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; will then call the output &lt;span style="color:#a6e22e"&gt;plugin&lt;/span&gt; (cf. &lt;span style="color:#a6e22e"&gt;ReorderBufferCommit&lt;/span&gt;()) with the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; individual changes. The output plugins rely on snapshots built by
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; snapbuild.c which hands them to us.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;reorderbuffer receives individual pieces of transactions and reassembles them per top-level transaction. Once it reads the commit record of a transaction, it passes the changes of that transaction to the output plugin. The output plugin relies on snapshots built by snapbuild.c, which hands them to reorderbuffer.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Maximum number of changes kept in memory, per transaction. After that,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * changes are spooled to disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * The current value should be sufficient to decode the entire transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * without hitting disk in OLTP workloads, while starting to spool to disk in
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * other workloads reasonably fast.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * At some point in the future it probably makes sense to have a more elaborate
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * resource management here, but it&amp;#39;s not entirely clear what that would look
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * like.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; logical_decoding_work_mem;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; Size max_changes_in_memory &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt;; &lt;span style="color:#75715e"&gt;/* XXX for restore only */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When decoded data exceeds logical_decoding_work_mem, it is written to disk. max_changes_in_memory is still hardcoded at 4096, but it is now used only when restoring spilled changes from disk. The pg12 source has no int logical_decoding_work_mem, and there the decision to serialize was also based on max_changes_in_memory.&lt;/p&gt;
&lt;p&gt;In pg13, the disk serialization code starts at line 2333.
When the decoded data held in memory exceeds logical_decoding_work_mem, the largest transaction is spilled to disk.
ReorderBufferLargestTXN(rb) finds the largest transaction, and ReorderBufferSerializeTXN(rb, txn) persists it.
The code that immediately follows is ReorderBufferSerializeTXN():&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Spill data of a large transaction (and its subtransactions) to disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferSerializeTXN&lt;/span&gt;(ReorderBuffer &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rb, ReorderBufferTXN &lt;span style="color:#f92672"&gt;*&lt;/span&gt;txn)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dlist_iter subtxn_i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dlist_mutable_iter change_i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; fd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;XLogSegNo curOpenSegNo &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size spilled &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;elog&lt;/span&gt;(DEBUG2, &lt;span style="color:#e6db74"&gt;&amp;#34;spill %u changes in XID %u to disk&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (uint32) txn&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;nentries_mem, txn&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* do the same to all child TXs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;At the DEBUG2 log level, each spill is written to the log. The path of the spill file is built by ReorderBufferSerializedPath():&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Given a replication slot, transaction ID and segment number, fill in the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * corresponding spill file into &amp;#39;path&amp;#39;, which is a caller-owned buffer of size
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * at least MAXPGPATH.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferSerializedPath&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;path, ReplicationSlot &lt;span style="color:#f92672"&gt;*&lt;/span&gt;slot, TransactionId xid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;XLogSegNo segno)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;XLogRecPtr recptr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;XLogSegNoOffsetToRecPtr&lt;/span&gt;(segno, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, wal_segment_size, recptr);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;snprintf&lt;/span&gt;(path, MAXPGPATH, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot/%s/xid-%u-lsn-%X-%X.spill&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a6e22e"&gt;NameStr&lt;/span&gt;(MyReplicationSlot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.name),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (uint32) (recptr &lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;), (uint32) recptr);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
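&lt;p&gt;Spill files are persisted to pg_replslot/replication_slot_name/xid-%u-lsn-%X-%X.spill, where the two %X fields are the high and low 32 bits of the starting LSN of the WAL segment.&lt;/p&gt;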
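&lt;p&gt;As a rough illustration of how that file name is derived, the sketch below mirrors the arithmetic of ReorderBufferSerializedPath() in Python. The function name spill_path, the sample slot name, and the 16 MiB segment size are assumptions made for the example, not part of PostgreSQL.&lt;/p&gt;

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed segment size (16 MiB)


def spill_path(slot_name, xid, segno, wal_segment_size=WAL_SEGMENT_SIZE):
    """Build the spill file path for one transaction and one WAL segment."""
    # Start LSN of the segment (offset 0 inside segment number segno).
    recptr = segno * wal_segment_size
    # Split the 64-bit LSN into its high and low 32-bit halves.
    high, low = divmod(recptr, 2**32)
    return f"pg_replslot/{slot_name}/xid-{xid}-lsn-{high:X}-{low:X}.spill"


# Hypothetical slot name and xid, segment number 3.
print(spill_path("myslot", 1234, 3))
```

&lt;p&gt;With these assumed inputs the LSN part is 0-3000000, because segment 3 starts at byte 3 * 16 MiB.&lt;/p&gt;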
&lt;p&gt;Besides serialization, there is a matching restore path:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Restore a number of changes spilled to disk back into memory.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; Size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferRestoreChanges&lt;/span&gt;(ReorderBuffer &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rb, ReorderBufferTXN &lt;span style="color:#f92672"&gt;*&lt;/span&gt;txn,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TXNEntryFile &lt;span style="color:#f92672"&gt;*&lt;/span&gt;file, XLogSegNo &lt;span style="color:#f92672"&gt;*&lt;/span&gt;segno)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size restored &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;XLogSegNo last_segno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; (restored &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; max_changes_in_memory &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;segno &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; last_segno)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; readBytes;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ReorderBufferDiskChange &lt;span style="color:#f92672"&gt;*&lt;/span&gt;ondisk;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Read the statically sized part of a change which has information
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * about the total size. If we couldn&amp;#39;t read a record, we&amp;#39;re at the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * end of this file.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferSerializeReserve&lt;/span&gt;(rb, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(ReorderBufferDiskChange));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;readBytes &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;FileRead&lt;/span&gt;(file&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;vfd, rb&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;outbuf,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(ReorderBufferDiskChange),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; file&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curOffset, WAIT_EVENT_REORDER_BUFFER_READ);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * ok, read a full change from disk, now restore it into proper
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * in-memory format
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferRestoreChange&lt;/span&gt;(rb, txn, rb&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;outbuf);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;restored&lt;span style="color:#f92672"&gt;++&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; restored;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;ReorderBufferRestoreChanges() only checks the loop condition and iterates (restored++), calling ReorderBufferRestoreChange() for each change it reads:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferRestoreChange&lt;/span&gt;(ReorderBuffer &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rb, ReorderBufferTXN &lt;span style="color:#f92672"&gt;*&lt;/span&gt;txn,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Update memory accounting for the restored change. We need to do this
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * although we don&amp;#39;t check the memory limit when restoring the changes in
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * this branch (we only do that when initially queueing the changes after
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * decoding), because we will release the changes later, and that will
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * update the accounting too (subtracting the size from the counters). And
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * we don&amp;#39;t want to underflow there.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferChangeMemoryUpdate&lt;/span&gt;(rb, change, true,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferChangeSize&lt;/span&gt;(change));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In ReorderBufferRestoreChanges(), the while loop condition is restored &amp;lt; max_changes_in_memory, and restored starts at 0, so a single call restores at most 4096 changes (the value of max_changes_in_memory). The comment in ReorderBufferRestoreChange() explains that although the restore path does not check the memory limit, it still has to update memory accounting: the changes are released later, which subtracts their size from the counters, and the counters must not underflow. In other words, a change that was just restored is not immediately spilled again.
(This feels a bit odd: limiting the restore by the memory limit seems more natural than hardcoding the number of changes per restore.)&lt;/p&gt;
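&lt;p&gt;To make the batching behaviour concrete, here is a minimal Python model of a count-limited restore. It is only a sketch: restore_in_batches is a hypothetical helper, the spilled changes are plain Python objects, and file handling and memory accounting are left out.&lt;/p&gt;

```python
from itertools import islice

MAX_CHANGES_IN_MEMORY = 4096  # the limit discussed above


def restore_in_batches(spilled_changes, limit=MAX_CHANGES_IN_MEMORY):
    """Yield lists of at most `limit` changes, one list per restore call."""
    remaining = iter(spilled_changes)
    while True:
        batch = list(islice(remaining, limit))
        if not batch:
            return  # nothing left on disk
        yield batch


# A transaction with 10000 spilled changes comes back in three batches.
print([len(batch) for batch in restore_in_batches(range(10000))])
```

&lt;p&gt;The batch size depends only on the number of changes, not on how much memory each change occupies, which is the point made in the paragraph above.&lt;/p&gt;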
&lt;p&gt;Interpreting the logical decoding process based on source code:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b6939610878b.png" alt="69b422c44d6d43e991eea0c8904e166c.png" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The transaction snap preserves the metadata needed for parsing. When the replication slot is inactive or the transaction is uncommitted, the snap is persisted to pg_logical/snapshots/%restart_lsn.snap. After the replication slot restarts or the transaction commits, the snap metadata on disk is read into memory and passed to the reorderbuffer, which parses the WAL and orders it by transaction start order. If the decoded data fills the logical_decoding_work_mem memory area, the change entries of the largest transaction are spilled to pg_replslot/slot_name/xid-%u-lsn-%X-%X.spill. The other in-memory transactions are sent to the output plugin for format conversion, and the decoded information is finally sent downstream.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In fact, we can see that long transactions and large transactions can make the entire logical replication link very slow. Large transactions are preferentially spilled to disk, then loaded back from disk to memory after the transaction completes.&lt;/p&gt;
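&lt;p&gt;The &amp;ldquo;largest transaction spills first&amp;rdquo; rule can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not the PostgreSQL implementation: spill_until_under_limit, the xids, and the byte counts are all made up for illustration.&lt;/p&gt;

```python
def spill_until_under_limit(txn_sizes, work_mem_bytes):
    """Return the xids chosen for spilling, in the order they are evicted."""
    in_memory = dict(txn_sizes)  # xid mapped to bytes of decoded changes in memory
    spilled = []
    # Keep evicting while total usage has reached the limit.
    while in_memory and sum(in_memory.values()) >= work_mem_bytes:
        largest = max(in_memory, key=in_memory.get)
        spilled.append(largest)
        del in_memory[largest]  # its changes now live in a .spill file
    return spilled


# Three in-memory transactions and a 64-byte limit (toy numbers).
print(spill_until_under_limit({101: 10, 102: 50, 103: 30}, work_mem_bytes=64))
```

&lt;p&gt;Here only transaction 102 is spilled: removing it brings the total from 90 down to 40, which is under the limit.&lt;/p&gt;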

&lt;h3 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Logical replication is managed through replication slots: one replication slot, one walsender process, one output plugin.&lt;/li&gt;
&lt;li&gt;The output plugin determines the output form of logically decoded data, specified when creating the replication slot.&lt;/li&gt;
&lt;li&gt;Replica identity priority recommendation: primary key -&amp;gt; non-null unique index -&amp;gt; full.&lt;/li&gt;
&lt;li&gt;The publish-subscribe model is PostgreSQL&amp;rsquo;s built-in logical replication, using pgoutput by default. Publications can be used independently.&lt;/li&gt;
&lt;li&gt;The publisher process is the walsender, and the subscriber process is the logical replication worker. Pay attention to their respective process parameters.&lt;/li&gt;
&lt;li&gt;There are many third-party logical replication tools; they generally use PostgreSQL&amp;rsquo;s logical decoding system.&lt;/li&gt;
&lt;li&gt;For monitoring replication links, pay attention to pg_replication_slots and pg_stat_replication.&lt;/li&gt;
&lt;li&gt;The pg_logical directory stores snaps, the transaction metadata needed for parsing. Parsing waits until the transaction commits.&lt;/li&gt;
&lt;li&gt;The pg_replslot directory stores transaction information exceeding logical_decoding_work_mem, called spill.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Book: 《PostgreSQL实战》&lt;/p&gt;
&lt;p&gt;Official Documentation:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/logicaldecoding.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 15: Chapter 49. Logical Decoding&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/logicaldecoding-example.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 15: 49.1. Logical Decoding Examples&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/app-pgrecvlogical.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 15: pg_recvlogical&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/14/view-pg-replication-slots.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 14: 52.81. pg_replication_slots&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/runtime-config-replication.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 13: 19.6. Replication&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/logicaldecoding-output-plugin.html#LOGICALDECODING-OUTPUT-PLUGIN-CALLBACKS" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 13: 48.6. Logical Decoding Output Plugins&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/logical-replication-publication.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 15: 31.1. Publication&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/logical-replication-subscription.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 15: 31.2. Subscription&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/sql-createpublication.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 15: CREATE PUBLICATION&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Highly Recommended:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.pgconf.asia/JA/2017/wp-content/uploads/sites/2/2017/12/D2-A7-EN.pdf" target="_blank" rel="noreferrer"&gt;https://www.pgconf.asia/JA/2017/wp-content/uploads/sites/2/2017/12/D2-A7-EN.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.anayrat.info/en/2018/03/10/logical-replication-internals/" target="_blank" rel="noreferrer"&gt;Logical replication internals | Select * from Adrien&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.highgo.ca/2019/08/22/an-overview-of-logical-replication-in-postgresql/" target="_blank" rel="noreferrer"&gt;An Overview of Logical Replication in PostgreSQL - Highgo Software Inc.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/4lF4LonDQeICPtbUX_HVnw" target="_blank" rel="noreferrer"&gt;Discussing Logical Decoding from Real Cases&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/yiukiiOa0snzcak1ThmP7Q" target="_blank" rel="noreferrer"&gt;Long-Troubling Logical Decoding Anomalies&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cybertec-postgresql.com/en/monitoring-replication-pg_stat_replication/" target="_blank" rel="noreferrer"&gt;Monitoring replication: pg_stat_replication - CYBERTEC&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Other References:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://zhuanlan.zhihu.com/p/311496301" target="_blank" rel="noreferrer"&gt;https://zhuanlan.zhihu.com/p/311496301&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://dzone.com/articles/postgresql-change-data-capture" target="_blank" rel="noreferrer"&gt;A Guide to PostgreSQL Change Data Capture - DZone&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/change-data-capture-in-postgres-how-to-use-logical-decoding-and/ba-p/1396421" target="_blank" rel="noreferrer"&gt;Change data capture in Postgres: How to use logical decoding and wal2json - Microsoft Community Hub&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.kancloud.cn/taobaomysql/monthly/213790" target="_blank" rel="noreferrer"&gt;PgSQL · The Secrets of PostgreSQL Logical Streaming Replication Technology · Database Kernel Monthly · KanCloud&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/dafei1288/article/details/124629875" target="_blank" rel="noreferrer"&gt;Analyzing PostgreSQL Logical Replication Principles - CSDN Blog&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://pigsty.cc/zh/blog/2021/03/03/postgres逻辑复制详解/" target="_blank" rel="noreferrer"&gt;http://pigsty.cc/zh/blog/2021/03/03/postgres逻辑复制详解/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-logical" target="_blank" rel="noreferrer"&gt;Logical replication and logical decoding - Azure Database for PostgreSQL - Flexible Server | Microsoft Learn&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>PostgreSQL Streaming Replication</title><link>https://lastdba.com/en/2024/08/13/postgresql-streaming-replication/</link><pubDate>Tue, 13 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/13/postgresql-streaming-replication/</guid><description>&lt;h4 class="relative group"&gt;What is PostgreSQL Streaming Replication?
 &lt;div id="what-is-postgresql-streaming-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-postgresql-streaming-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Streaming replication, introduced in PostgreSQL 9.0, is a method for transmitting WAL records. As soon as the primary database generates WAL, it is passed to the standby database.
Before PostgreSQL 9.0, PostgreSQL could only transfer WAL one file at a time (log shipping), so the standby database lagged behind the primary by at least one WAL file.



&lt;img src="https://lastdba.com/img/csdn/973437b5ba70.png" alt="PG Streaming Replication Principle" /&gt;&lt;/p&gt;</description><content:encoded>
&lt;h4 class="relative group"&gt;What is PostgreSQL Streaming Replication?
 &lt;div id="what-is-postgresql-streaming-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-postgresql-streaming-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Streaming replication, introduced in PostgreSQL 9.0, is a method for transmitting WAL records. As soon as the primary database generates WAL, it is passed to the standby database.
Before PostgreSQL 9.0, PostgreSQL could only transfer WAL one file at a time (log shipping), so the standby database lagged behind the primary by at least one WAL file.



&lt;img src="https://lastdba.com/img/csdn/973437b5ba70.png" alt="PG Streaming Replication Principle" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;PostgreSQL Streaming Replication Processes
 &lt;div id="postgresql-streaming-replication-processes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgresql-streaming-replication-processes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;wal sender&lt;/strong&gt;: The wal sender exists on the primary database. The wal sender process transmits the WAL between the primary&amp;rsquo;s latest LSN and the standby&amp;rsquo;s latest LSN to the standby.
&lt;strong&gt;wal receiver&lt;/strong&gt;: The wal receiver exists on the standby database. The wal receiver process transmits the standby&amp;rsquo;s latest LSN to the primary. The wal receiver receives WAL data passed by the wal sender and writes it to WAL logs.
&lt;strong&gt;startup&lt;/strong&gt;: The standby instance recovery process. It replays WAL logs on the standby database.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg &lt;span style="color:#ae81ff"&gt;16776&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14632&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 13:33 ? 00:00:00 postgres: wal sender process lzl 172.17.100.150&lt;span style="color:#f92672"&gt;(&lt;/span&gt;13338&lt;span style="color:#f92672"&gt;)&lt;/span&gt; streaming 0/3002D30
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg &lt;span style="color:#ae81ff"&gt;16775&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15329&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 13:33 ? 00:00:00 postgres: wal receiver process streaming 0/3002D30
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg &lt;span style="color:#ae81ff"&gt;15330&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15329&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 10:26 ? 00:00:00 postgres: startup process recovering &lt;span style="color:#ae81ff"&gt;000000010000000000000003&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
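&lt;p&gt;The LSNs shown in the process titles (for example 0/3002D30) are 64-bit byte positions written as two hexadecimal halves. The following Python sketch converts them to integers so that the distance between two LSNs can be computed. The helper names parse_lsn and lag_bytes and the second LSN value are assumptions made for the example.&lt;/p&gt;

```python
def parse_lsn(text):
    """Convert an LSN such as '0/3002D30' into a 64-bit byte position."""
    high, low = text.split("/")
    return int(high, 16) * 2**32 + int(low, 16)


def lag_bytes(primary_lsn, standby_lsn):
    """Number of WAL bytes between the standby position and the primary position."""
    return parse_lsn(primary_lsn) - parse_lsn(standby_lsn)


# Primary at 0/3002D30, standby assumed to be at 0/3000000.
print(lag_bytes("0/3002D30", "0/3000000"))
```

&lt;p&gt;In the listing above both processes report 0/3002D30, so the same calculation gives a lag of 0 bytes.&lt;/p&gt;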

&lt;h4 class="relative group"&gt;PostgreSQL Streaming Replication Principles
 &lt;div id="postgresql-streaming-replication-principles" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgresql-streaming-replication-principles" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;PostgreSQL streaming replication is primarily divided into two phases: the instance recovery phase and the primary-standby synchronization phase.
&lt;strong&gt;Instance Recovery Phase&lt;/strong&gt;: When a PostgreSQL database crashes abnormally, upon startup, PostgreSQL replays all WAL logs after the last checkpoint before the crash (this is the same principle as instance recovery in Oracle, MySQL, and other relational databases — the goal is to bring the database to a consistent state). When setting up a PostgreSQL standby database, the primary is generally not shut down. At this point, the backup taken from the primary is in an inconsistent state, and the startup process performs instance recovery when the standby starts.
&lt;strong&gt;Primary-Standby Synchronization Phase&lt;/strong&gt;: The wal receiver process transmits the standby&amp;rsquo;s latest LSN to the primary. The wal sender transmits the WAL between the primary&amp;rsquo;s latest LSN and the standby&amp;rsquo;s latest LSN to the wal receiver. The wal receiver receives the WAL and writes it to disk, and the startup process replays the WAL logs on the standby.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Synchronous and Asynchronous
 &lt;div id="synchronous-and-asynchronous" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#synchronous-and-asynchronous" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;PostgreSQL primary-standby replication has 5 modes, controlled by the &lt;code&gt;synchronous_commit&lt;/code&gt; parameter. In essence, &lt;code&gt;synchronous_commit&lt;/code&gt; controls when a commit on the primary is allowed to return.
&lt;strong&gt;remote_apply&lt;/strong&gt;: The primary commits only after the synchronous standbys have applied the WAL. This mode is synchronous: the primary and those standbys are consistent, so committed data that can be queried on the primary can also be queried on the standby. There is no visible primary-standby lag in this mode, but commits on the primary take longer because each commit waits for network transmission and for the standby to apply the WAL.&lt;/p&gt;
&lt;p&gt;The meaning of synchronous_commit depends on whether synchronous standbys are configured, that is, whether synchronous_standby_names is non-empty or empty:&lt;/p&gt;
&lt;p&gt;When synchronous_standby_names is non-empty:
&lt;strong&gt;remote_apply&lt;/strong&gt;: The primary commits only after the standby has applied the WAL. In this mode the primary and standby are synchronous.
&lt;strong&gt;on&lt;/strong&gt;: The default. The primary commits once the WAL has been flushed to disk on both the primary and the standby. This is similar to semi-synchronous replication, and no committed data is lost.
&lt;strong&gt;remote_write&lt;/strong&gt;: The primary commits once the standby has received the WAL and written it to the filesystem cache. At this point the standby has not flushed the WAL to disk, so the WAL can be lost on the standby if its OS crashes.
&lt;strong&gt;local&lt;/strong&gt;: The primary commits once its own WAL is flushed to disk. This mode is asynchronous: the primary does not wait for the standby before committing.
&lt;strong&gt;off&lt;/strong&gt;: The primary can commit before its own WAL is flushed to disk. There is a risk of data loss. Not recommended.&lt;/p&gt;
&lt;p&gt;When synchronous_standby_names is empty:
(When synchronous_standby_names is empty, only on and off are effective for synchronous_commit. If set to remote_apply, remote_write, or local, they are still treated as on.)
&lt;strong&gt;on&lt;/strong&gt;: default. The database WAL must be written to disk before a transaction can commit.
&lt;strong&gt;off&lt;/strong&gt;: The primary can commit without its own WAL being flushed to disk. There is a risk of data loss. Not recommended.&lt;/p&gt;
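&lt;p&gt;The two scenarios above can be condensed into a small lookup. The Python sketch below is only a summary of the rules described in this section: the function commit_waits_for and the labels &amp;ldquo;none&amp;rdquo;, &amp;ldquo;write&amp;rdquo;, &amp;ldquo;flush&amp;rdquo;, and &amp;ldquo;apply&amp;rdquo; are illustrative names, not PostgreSQL identifiers.&lt;/p&gt;

```python
# What a commit waits for: (WAL on the primary, WAL on the synchronous standby).
WAITS = {
    "off": ("none", "none"),
    "local": ("flush", "none"),
    "remote_write": ("flush", "write"),
    "on": ("flush", "flush"),
    "remote_apply": ("flush", "apply"),
}


def commit_waits_for(synchronous_commit, synchronous_standby_names=""):
    """Return the (primary, standby) wait levels for one commit."""
    primary_wait, standby_wait = WAITS[synchronous_commit]
    # Without synchronous standbys there is nothing remote to wait for,
    # so every value except "off" behaves like a local flush.
    if not synchronous_standby_names.strip():
        standby_wait = "none"
    return primary_wait, standby_wait


print(commit_waits_for("remote_apply", "standby1"))
print(commit_waits_for("remote_apply", ""))
```

&lt;p&gt;The second call shows why remote_apply, remote_write, and local collapse to the behaviour of on when synchronous_standby_names is empty. The standby name standby1 is a made-up example value.&lt;/p&gt;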
&lt;p&gt;&lt;strong&gt;Primary-Standby Synchronization Relationship&lt;/strong&gt;



&lt;img src="https://lastdba.com/img/csdn/43bd95ea31d5.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Primary-Standby Reliability&lt;/strong&gt;



&lt;img src="https://lastdba.com/img/csdn/647bc630a1ef.png" alt="Image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;Failover
 &lt;div id="failover" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#failover" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;When the primary crashes, the standby needs to initiate failover, at which point the standby becomes the new primary. PostgreSQL does not provide a method to detect failures, but it does provide a method to promote the standby. (Typically, third-party tools call the PostgreSQL promotion method, while primary-standby monitoring, primary crash detection, connection switching, etc. are not handled by PostgreSQL itself.)
PostgreSQL provides 2 methods to activate a standby as the primary: the trigger_file file and the pg_ctl promote command. (In PostgreSQL 12 and later, trigger_file becomes promote_trigger_file.)
Both trigger_file and pg_ctl promote can complete the task of activating the standby with a single command. The difference is that trigger_file requires the trigger_file configuration to be written in recovery.conf in advance.
Using trigger_file for primary-standby switchover (pg_ctl promote has the same effect and is simpler):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Configure trigger_file in the standby&amp;rsquo;s recovery.conf&lt;/li&gt;
&lt;li&gt;Shut down the primary&lt;/li&gt;
&lt;li&gt;touch trigger_file to start the old standby as the new primary&lt;/li&gt;
&lt;li&gt;Configure recovery.conf to start the old primary as the new standby&lt;/li&gt;
&lt;li&gt;Observe the new and old primary/standby databases&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Failover Example:&lt;/strong&gt;
Environment:
Primary	172.17.100.150	5432
Standby	172.17.100.150	5433&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Configure trigger_file in standby recovery.conf&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat recovery.conf|grep trigger
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;trigger_file &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;/pg/pg96data_sla/trigger.kenyon&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ ll /pg/pg96data_sla/trigger.kenyon
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ls: cannot access /pg/pg96data_sla/trigger.kenyon: No such file or directory&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Simply configure the trigger file path in recovery.conf. The trigger file won&amp;rsquo;t appear until it&amp;rsquo;s created.&lt;/p&gt;
&lt;p&gt;Add the following configuration to the standby&amp;rsquo;s postgresql.conf:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_wal_senders &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#75715e"&gt;# Maximum number of WAL sender processes. The default is 0 in PostgreSQL 9.6 (10 since PostgreSQL 10), so the standby must set this before switchover&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hot_standby&lt;span style="color:#f92672"&gt;=&lt;/span&gt;on &lt;span style="color:#75715e"&gt;# Allow read-only queries on the standby&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;2. Shut down the primary&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pg_ctl stop -D /pg/pg96data_pri -m fast
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to shut down.... &lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server stopped&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
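&lt;p&gt;(Check whether the standby has fully applied the primary&amp;rsquo;s WAL. Run the command below in the WAL directory: pg_xlog on PostgreSQL 9.6 and earlier, pg_wal on PostgreSQL 10 and later, where pg_xlogdump is named pg_waldump.)&lt;/p&gt;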
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ls -ltr|tail -n &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $NF}&amp;#39;&lt;/span&gt;|&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; read xlog;&lt;span style="color:#66d9ef"&gt;do&lt;/span&gt; pg_xlogdump $xlog;&lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
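&lt;p&gt;Look for a &amp;ldquo;shutdown&amp;rdquo; checkpoint record in the standby&amp;rsquo;s latest WAL segment. Its presence indicates that the standby received WAL up to the primary&amp;rsquo;s clean shutdown.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Create the trigger file to promote the standby (or run pg_ctl promote -D /pg/pg96data_sla)&lt;/strong&gt;&lt;/p&gt;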
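&lt;p&gt;The check above can be scripted. The following is a minimal sketch, not a definitive procedure: has_shutdown_checkpoint is a hypothetical helper name, and it assumes the dump tool prints a shutdown checkpoint record on a single line containing both the words checkpoint and shutdown.&lt;/p&gt;

```shell
# Hypothetical helper: reads WAL dump output on stdin and succeeds
# if any line looks like a shutdown checkpoint record.
has_shutdown_checkpoint() {
    grep -qi 'checkpoint.*shutdown'
}

# Usage sketch (run inside the standby WAL directory; the file choice
# mirrors the ls/tail pipeline above and is an assumption):
#   latest=$(ls -tr | tail -n 1)
#   if pg_xlogdump "$latest" | has_shutdown_checkpoint; then
#       echo "shutdown checkpoint found in $latest"
#   fi
```

&lt;p&gt;Wrapping the match in a function keeps the promotion step easy to gate: promote only when the function succeeds.&lt;/p&gt;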
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ touch /pg/pg96data_sla/trigger.kenyon&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After promotion, recovery.conf is renamed to recovery.done.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Set up the old primary as the new standby&lt;/strong&gt;
Create the new standby&amp;rsquo;s recovery.conf file. You can copy it from the old standby and change the IP address and directory.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vi $新备库/recover.conf
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;standby_mode &lt;span style="color:#f92672"&gt;=&lt;/span&gt; on
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;primary_conninfo &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;host=172.17.100.150 port=5433 user=lzl password=lzl&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recovery_target_timeline &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;latest&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In postgresql.conf, set hot_standby = on to enable queries on the standby&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vi $NEW_STANDBY/postgresql.conf
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hot_standby &lt;span style="color:#f92672"&gt;=&lt;/span&gt; on&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Start the new standby&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/pg/pg96/bin/pg_ctl -D /pg/pg96data_pri -l /pg/pg96data_pri/server.log start&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;5. Check primary and standby&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#75715e"&gt;# \x&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Expanded display is on.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#75715e"&gt;# select * from pg_stat_replication ;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-&lt;span style="color:#f92672"&gt;[&lt;/span&gt; RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;]&lt;/span&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pid | &lt;span style="color:#ae81ff"&gt;24766&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usesysid | &lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usename | lzl
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;application_name | walreceiver
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_addr | 172.17.100.150
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_hostname | 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_port | &lt;span style="color:#ae81ff"&gt;47345&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_start | 2021-07-30 07:44:05.582546+00
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xmin | 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;state | streaming
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sent_location | 0/4033790
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;write_location | 0/4033790
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;flush_location | 0/4033790
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;replay_location | 0/4033790
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sync_priority | &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sync_state | async&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;pg_basebackup
 &lt;div id="pg_basebackup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_basebackup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;pg_basebackup is PostgreSQL&amp;rsquo;s built-in tool for taking physical base backups. The backups can be used for PITR and as the starting point for building log-shipping and streaming-replication standbys.
&lt;a href="https://liuzhilong.blog.csdn.net/article/details/119533506" target="_blank" rel="noreferrer"&gt;https://liuzhilong.blog.csdn.net/article/details/119533506&lt;/a&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;pg_rewind
 &lt;div id="pg_rewind" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_rewind" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;pg_rewind is a maintenance tool for PostgreSQL primary-standby setups. When the timelines of two PostgreSQL instances have diverged, pg_rewind can resynchronize one instance with the other. (For example, if the standby was promoted during a failover while the old primary kept running, the timelines of the two instances will have diverged.)
&lt;a href="https://liuzhilong.blog.csdn.net/article/details/119250794" target="_blank" rel="noreferrer"&gt;https://liuzhilong.blog.csdn.net/article/details/119250794&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Replication Slots
 &lt;div id="replication-slots" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#replication-slots" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;What are PostgreSQL Replication Slots?&lt;/strong&gt;
In a primary-standby architecture, if the primary deletes WAL that the standby has not yet received, the standby cannot catch up on its own. Replication slots ensure that the primary does not delete WAL that has not yet been transmitted to the standby.
Without replication slots, you have to rely on wal_keep_size/wal_keep_segments and archive_command to keep WAL from being deleted, but this approach tends to retain more WAL than necessary and still cannot guarantee that needed WAL survives when the lag is large. This is exactly why replication slots were created.
However, a replication slot can cause the primary to never delete WAL (e.g., if the standby has crashed), which eventually fills the disk. In that case, max_slot_wal_keep_size is needed to set an upper limit on WAL retention.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Replication Slot Parameters:&lt;/strong&gt;
&lt;strong&gt;max_slot_wal_keep_size&lt;/strong&gt;: Available in PostgreSQL 13 and later. When replication slots are in use, this parameter caps the amount of WAL that slots may retain in the pg_wal directory. The default value is -1, meaning there is no upper limit on the WAL the primary retains for the standby.
&lt;strong&gt;wal_keep_segments&lt;/strong&gt;/&lt;strong&gt;wal_keep_size&lt;/strong&gt;: PostgreSQL 12 and below use wal_keep_segments; PostgreSQL 13 and above use wal_keep_size. Sets the minimum amount of past WAL kept under pg_wal for standbys. Without replication slots, WAL beyond this amount may be deleted, potentially leaving the standby unable to catch up. If set too large, the directory may grow excessively. The default is 0, meaning no extra WAL is retained for standbys. If WAL that the standby still needs is deleted, the following error may occur:
&lt;code&gt;ERROR: requested WAL segment xxxx has already been removed&lt;/code&gt;
At that point the standby can recover only if the missing segments are available from the WAL archive; otherwise, it must be rebuilt.
&lt;strong&gt;primary_slot_name&lt;/strong&gt;: The name of the replication slot that the standby uses on the primary. Enabling replication slots therefore requires at least the following configuration on the standby:
primary_conninfo = &amp;lsquo;host=172.17.100.150 port=5433 user=lzl password=lzl&amp;rsquo;
primary_slot_name = &amp;lsquo;pg_slot_lzl&amp;rsquo;
&lt;strong&gt;max_replication_slots&lt;/strong&gt;: The maximum number of replication slots. Takes effect upon restart. If it is set lower than the number of existing replication slots, the server will not start, so this value should be set relatively high. In PostgreSQL 9.6 and earlier, the default is 0; in PostgreSQL 10 and above, it&amp;rsquo;s 10.&lt;/p&gt;
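&lt;p&gt;The wal_keep_segments/wal_keep_size split can be captured in a small helper. This is only a sketch: wal_keep_setting is a hypothetical function name, and the conversion assumes the default 16MB WAL segment size.&lt;/p&gt;

```shell
# Hypothetical helper: print the WAL retention setting for a given
# PostgreSQL version and a number of segments to keep.
# Assumes the default 16MB WAL segment size.
wal_keep_setting() {
    major=${1%%.*}      # "9.6" becomes 9, "13" stays 13
    segments=$2
    if [ "$major" -lt 13 ]; then
        echo "wal_keep_segments = $segments"
    else
        echo "wal_keep_size = $((segments * 16))MB"
    fi
}

wal_keep_setting 9.6 64   # prints: wal_keep_segments = 64
wal_keep_setting 13 64    # prints: wal_keep_size = 1024MB
```

&lt;p&gt;The printed line can be pasted into postgresql.conf for the matching server version.&lt;/p&gt;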
&lt;p&gt;&lt;strong&gt;Creating PostgreSQL Replication Slots&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Set max_replication_slots on the primary&lt;/strong&gt;
Primary: (my PostgreSQL version is 9.6)
max_replication_slots=10
Add to postgresql.conf and restart the primary&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Create replication slot&lt;/strong&gt;
Run on the primary:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_create_physical_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;pg_slot_lzl&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; slot_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xlog_position 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+---------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_slot_lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View replication slot&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; slot_name, slot_type, active &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_replication_slots;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; slot_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; slot_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+-----------+--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_slot_lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; physical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;3. Set primary_slot_name on the standby&lt;/strong&gt;
&lt;code&gt;primary_slot_name = 'pg_slot_lzl'&lt;/code&gt;
Add to recovery.conf and restart the standby&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Check replication slot&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;,pg_xlogfile_name(restart_lsn)&lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; current_xxlog &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_replication_slots;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; slot_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plugin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; slot_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active_pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; catalog_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; restart_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; confirmed_flush_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; current_xxlog 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------+--------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_slot_lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; physical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12802&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A002340 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00000002000000000000000&lt;/span&gt;A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--pg_xlogfile_name(restart_lsn) to view current WAL log info&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;Query Conflicts
 &lt;div id="query-conflicts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#query-conflicts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;What are Query Conflicts?&lt;/strong&gt;
The standby may encounter the following error during queries:
&lt;code&gt;ERROR: canceling statement due to conflict with recovery&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Why do conflicts occur? Suppose the standby is running a query against a table (from an application or a manual session) while the primary drops that table. The DROP TABLE is written to WAL and shipped to the standby for replay. To stay consistent with the primary, the standby must replay that WAL promptly, so the replayed DROP TABLE conflicts with the running SELECT, as shown below:



&lt;img src="https://lastdba.com/img/csdn/2d333af63baa.png" alt="Query conflict during DDL" /&gt;&lt;/p&gt;
&lt;p&gt;Conflict scenarios:
The example above is only one type of query conflict. The main cases are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Primary exclusive locks (including explicit LOCK commands and various DDL operations)&lt;/li&gt;
&lt;li&gt;Primary vacuum cleaning up dead tuples — if the standby is using those tuples, a conflict will occur&lt;/li&gt;
&lt;li&gt;Primary drops the tablespace that the standby query is using&lt;/li&gt;
&lt;li&gt;Primary drops the database that the standby is using



&lt;img src="https://lastdba.com/img/csdn/2194edd3e8af.png" alt="Query conflict during vacuum" /&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Consider a primary-only scenario:
Scenario 1: A session issues a drop table and finds that a select statement is currently executing. The session can only wait for the select to complete its transaction.
Scenario 2: A session issues a vacuum or automatic background vacuum — it won&amp;rsquo;t conflict with current database queries because vacuum won&amp;rsquo;t clean up tuples that are in use.&lt;/p&gt;
&lt;p&gt;The standby is handled differently. The primary does not know the standby&amp;rsquo;s transaction status, yet the standby must stay consistent with the primary. That is why &amp;ldquo;query conflicts&amp;rdquo; occur.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Query Conflict Parameters&lt;/strong&gt;
&lt;strong&gt;hot_standby_feedback:&lt;/strong&gt;
This is the parameter mentioned most often in discussions of query conflicts. Suppose there is no standby: Session 1 queries a row, Session 2 deletes that row and commits, and then Session 2 runs vacuum. The vacuum will not remove the row, because Session 1&amp;rsquo;s transaction may still need that tuple. In a primary-standby setup, how does the primary know that the standby is still querying when it is about to vacuum? That is the purpose of this parameter. With hot_standby_feedback enabled, the standby periodically reports its minimum active transaction ID (xmin) to the primary, and vacuum on the primary will not remove dead tuples that transactions at or after that xmin might still need to see.
This parameter reduces conflicts but cannot eliminate them. It only addresses conflicts caused by the primary vacuuming dead tuples; it does nothing for conflicts caused by exclusive locks. It also cannot help during network interruptions: if the network between primary and standby is down, the standby cannot send its xmin to the primary. If the interruption lasts long enough, the primary will clean up dead tuples in the meantime, and once the network recovers, the vacuum conflict described above may occur.
Note that hot_standby_feedback does not override the limit set by old_snapshot_threshold on the primary. old_snapshot_threshold bounds how long dead tuples can accumulate: once a snapshot is older than the threshold, cleanup proceeds anyway. (old_snapshot_threshold was removed in PostgreSQL 17.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;max_standby_streaming_delay:&lt;/strong&gt;
How long the standby waits before canceling queries that conflict with WAL received through streaming replication. With this parameter set, a conflicting standby query is not canceled immediately; it is given this much time to finish before the error is raised. The default is 30 seconds, and -1 means wait indefinitely. Choose the value based on the expected runtime of long queries on the standby.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;max_standby_archive_delay:&lt;/strong&gt;
How long the standby waits before canceling queries that conflict with WAL read from the archive. Otherwise the same as the parameter above.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;vacuum_defer_cleanup_age:&lt;/strong&gt;
Specifies the number of transactions by which vacuum (including VACUUM FULL) defers the cleanup of dead tuples, so recently deleted tuples are not removed immediately. This parameter was removed in PostgreSQL 16.&lt;/p&gt;
&lt;p&gt;You can view conflict occurrences through the pg_stat_database and pg_stat_database_conflicts views.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Other Related Parameters
 &lt;div id="other-related-parameters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#other-related-parameters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Transmission Parameters&lt;/strong&gt;
&lt;strong&gt;max_wal_senders&lt;/strong&gt;: The maximum number of concurrent WAL sender processes, i.e., the maximum number of standbys plus pg_basebackup clients connected at the same time. PostgreSQL 9.6 defaults to 0; PostgreSQL 10 and later default to 10.
&lt;strong&gt;wal_send_timeout&lt;/strong&gt;: Terminates replication connections that have been inactive for longer than this time, which lets the primary detect a crashed standby or a long network outage. The default is 60 seconds. 0 disables the timeout.
&lt;strong&gt;track_commit_timestamp&lt;/strong&gt;: Records the commit timestamp of transactions. Default is off.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Primary Parameters&lt;/strong&gt;
&lt;strong&gt;synchronous_standby_names&lt;/strong&gt;:
Configured on the primary. Lists the standbys that take part in synchronous replication. There are several forms (s1, s2, s3 are the standbys&amp;rsquo; application_name values, set through primary_conninfo in recovery.conf):
synchronous_standby_names=&amp;lsquo;s1&amp;rsquo; means the primary can commit once standby s1 acknowledges.
synchronous_standby_names=&amp;lsquo;FIRST 2 (s1,s2,s3)&amp;rsquo; means the primary can commit once the two highest-priority connected standbys in the list (s1 and s2 when both are connected) acknowledge.
synchronous_standby_names=&amp;lsquo;ANY 2 (s1,s2,s3)&amp;rsquo; means the primary can commit once any two of the three standbys acknowledge.
synchronous_standby_names=&amp;lsquo;*&amp;rsquo; matches any standby name: the primary can commit once any standby acknowledges.
&lt;strong&gt;wal_level&lt;/strong&gt;:
The WAL level. This parameter determines how much information is written to WAL. The default is replica, which supports replication and WAL archiving as well as read-only queries on the standby.
minimal: Records only what is needed for crash recovery. For example, logging for CREATE TABLE AS, CREATE INDEX, CLUSTER, and COPY can be skipped. The information recorded in this mode is insufficient for WAL archiving and streaming replication.
logical: Adds the information needed for logical decoding on top of replica. This mode increases WAL volume, especially for databases with many UPDATE and DELETE operations.
Before PostgreSQL 9.6, there were also archive and hot_standby levels, which map to the current replica level.
&lt;strong&gt;synchronous_commit&lt;/strong&gt;:
As discussed earlier, it has five modes, each with trade-offs.
&lt;strong&gt;archive_mode&lt;/strong&gt;: archive_mode = on enables archiving.
&lt;strong&gt;archive_command&lt;/strong&gt;: The archiving command. PostgreSQL archives WAL by invoking an operating system command, which can be as simple as a cp to the backup location.
&lt;strong&gt;listen_addresses&lt;/strong&gt;: Listening addresses. &amp;lsquo;*&amp;rsquo; means listen on all IP addresses. The default is localhost.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Standby Parameters&lt;/strong&gt;
&lt;strong&gt;hot_standby&lt;/strong&gt;: on enables standby read-only queries.
&lt;strong&gt;primary_conninfo&lt;/strong&gt;: The connection string for the standby to connect to the primary. E.g., primary_conninfo = &amp;lsquo;host=172.17.100.150 port=5432 user=lzl password=lzl&amp;rsquo;.
&lt;strong&gt;trigger_file/promote_trigger_file&lt;/strong&gt;: The trigger file for promoting the standby. Before PostgreSQL 12 it is called trigger_file; PostgreSQL 12 through 15 use promote_trigger_file, which was removed in PostgreSQL 16.
Both the trigger file and pg_ctl promote can promote the standby with a single command, as demonstrated earlier.
&lt;strong&gt;wal_receiver_create_temp_slot&lt;/strong&gt;: Whether the WAL receiver creates a temporary replication slot on the primary when no permanent slot has been configured through primary_slot_name. Default is off.&lt;/p&gt;
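&lt;p&gt;Because the trigger-file parameter changed names across releases, a script that prepares a standby can pick the name from the major version. The following is a sketch under stated assumptions: promote_param_for is a hypothetical helper name, and the value none stands for PostgreSQL 16 and later, where promote_trigger_file no longer exists and promotion goes through pg_ctl promote or pg_promote().&lt;/p&gt;

```shell
# Hypothetical helper: print the trigger-file parameter name for a
# PostgreSQL version, or "none" when only pg_ctl promote / pg_promote() apply.
promote_param_for() {
    major=${1%%.*}      # "9.6" becomes 9, "15" stays 15
    if [ "$major" -lt 12 ]; then
        echo trigger_file
    elif [ "$major" -lt 16 ]; then
        echo promote_trigger_file
    else
        echo none
    fi
}

promote_param_for 9.6   # prints: trigger_file
promote_param_for 14    # prints: promote_trigger_file
promote_param_for 16    # prints: none
```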

&lt;h4 class="relative group"&gt;References:
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;《The Way of PostgreSQL》(修炼之道)
&lt;a href="https://www.postgresql.org/docs/current/warm-standby.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/warm-standby.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/high-availability.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/high-availability.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/runtime-config-replication.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/runtime-config-replication.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/runtime-config-wal.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/runtime-config-wal.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/app-pgbasebackup.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/app-pgbasebackup.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-CONFLICT" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-CONFLICT&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.tencent.com/developer/article/1555354" target="_blank" rel="noreferrer"&gt;https://cloud.tencent.com/developer/article/1555354&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/29737" target="_blank" rel="noreferrer"&gt;https://www.modb.pro/db/29737&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Streaming_Replication" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Streaming_Replication&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.percona.com/blog/2018/09/07/setting-up-streaming-replication-postgresql/" target="_blank" rel="noreferrer"&gt;https://www.percona.com/blog/2018/09/07/setting-up-streaming-replication-postgresql/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cybertec-postgresql.com/en/the-synchronous_commit-parameter/" target="_blank" rel="noreferrer"&gt;https://www.cybertec-postgresql.com/en/the-synchronous_commit-parameter/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/m15217321304/article/details/88850146" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/m15217321304/article/details/88850146&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.51cto.com/lishiyan/2460518?source=dra" target="_blank" rel="noreferrer"&gt;https://blog.51cto.com/lishiyan/2460518?source=dra&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>A Brief Analysis of Linux Memory</title><link>https://lastdba.com/en/2024/08/12/a-brief-analysis-of-linux-memory/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/a-brief-analysis-of-linux-memory/</guid><description>&lt;h2 class="relative group"&gt;Basic Memory Concepts
 &lt;div id="basic-memory-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#basic-memory-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Operating system memory is important and fairly complex, and many of its concepts must be understood before program issues can be analyzed in depth. As this is a first systematic pass over OS memory, the goal is to understand Linux memory concepts thoroughly at a conceptual level without diving deep into implementation principles, so this chapter avoids Linux source code as far as possible.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Basic Memory Concepts
 &lt;div id="basic-memory-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#basic-memory-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Operating system memory is important and fairly complex, and many of its concepts must be understood before program issues can be analyzed in depth. As this is a first systematic pass over OS memory, the goal is to understand Linux memory concepts thoroughly at a conceptual level without diving deep into implementation principles, so this chapter avoids Linux source code as far as possible.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Physical Memory and Virtual Memory
 &lt;div id="physical-memory-and-virtual-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#physical-memory-and-virtual-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e9d3726e966d.png" alt="Insert image description" /&gt;
(&lt;a href="https://en.wikipedia.org/wiki/Memory_address" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Memory_address&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Physical Memory&lt;/strong&gt;: Physical memory is the actual hardware memory present in a computer system, typically in the form of RAM (Random Access Memory).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Virtual Memory&lt;/strong&gt;: Virtual memory is a linear address range that is not necessarily backed by physical memory. Each program sees an address space that can be larger than the installed physical memory, and it can use that whole range without all of its data being resident in physical memory at the same time. When the kernel releases a linear region, it finds the physical pages mapped to that region and frees them all.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Memory Management Unit (MMU)&lt;/strong&gt;: A hardware component responsible for converting virtual addresses used by programs into physical addresses where data is actually stored in physical memory. The MMU&amp;rsquo;s primary task is to perform address mapping.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Page Table&lt;/strong&gt;: A page table is a data structure used to store the mapping between virtual address space and physical address space. When a program attempts to access virtual memory, the MMU determines the corresponding physical address by querying the page table.&lt;/p&gt;
&lt;p&gt;System call flow:



&lt;img src="https://lastdba.com/img/csdn/b1b0da7b7d74.png" alt="Insert image description" /&gt;
&lt;a href="https://users.cs.utah.edu/~aburtsev/cs5460/lectures/lecture19-memory-management/lecture19-memory-management.pdf" target="_blank" rel="noreferrer"&gt;https://users.cs.utah.edu/~aburtsev/cs5460/lectures/lecture19-memory-management/lecture19-memory-management.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;(The image is a bit blurry, the topmost text is &amp;ldquo;User Space|Kernel Space&amp;rdquo;)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;User programs cannot access the kernel directly; they reach it only through system calls, usually wrapped by the C library&lt;/li&gt;
&lt;li&gt;The kernel accesses physical memory through the MMU, and accesses disks and other external devices through drivers&lt;/li&gt;
&lt;li&gt;The virtual memory system (VM Subsystem in the figure above) includes the buddy and slab allocators, among others&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;User Space and Kernel Space
 &lt;div id="user-space-and-kernel-space" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#user-space-and-kernel-space" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The process virtual address space is divided into user space and kernel space.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;User Space&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The region of memory in which user processes run&lt;/li&gt;
&lt;li&gt;This region is protected: the system prevents other processes from accessing it (except through shared memory)&lt;/li&gt;
&lt;li&gt;The kernel, however, can access user-space memory directly&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Kernel Space&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Kernel space is the region of memory used by the kernel&lt;/li&gt;
&lt;li&gt;In kernel space, the operating system&amp;rsquo;s kernel code runs with higher privilege levels, allowing direct access to system hardware, process management, file system operations, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Context Switching:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When a user program needs a system service or an operation that requires higher privileges, execution switches from user space to kernel space (strictly a mode switch, though it is often loosely called a context switch).&lt;/li&gt;
&lt;li&gt;Context switching is the operating system mechanism for saving and restoring execution state, so that nothing is lost when control passes between user programs and the kernel.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The division between user space and kernel space is to provide security isolation, preventing user programs from directly affecting critical parts of the operating system. Early operating systems and DOS did not distinguish between kernel and user space, so a single program&amp;rsquo;s error or malicious behavior could affect the entire system.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4b446b757f77.png" alt="Insert image description" /&gt;
(&lt;a href="https://www.zhihu.com/tardis/zm/art/66794639?source_id=1003" target="_blank" rel="noreferrer"&gt;https://www.zhihu.com/tardis/zm/art/66794639?source_id=1003&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;32-bit systems: Total 4GB address space, 3G UserSpace | 1G KernelSpace&lt;/p&gt;
&lt;p&gt;64-bit systems: Total 256TB address space, 128T UserSpace | 128T KernelSpace&lt;/p&gt;
&lt;p&gt;&lt;em&gt;2^32=4GB, 2^64=16777216TB, why does a 64-bit system only have 256TB address space?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://en.wikipedia.org/wiki/64-bit_computing" target="_blank" rel="noreferrer"&gt;64-bit computing wiki&lt;/a&gt; has an explanation. In short, current x86-64 processors with four-level paging implement only 48 bits of virtual address, and 2^48 bytes is 256TB (256 × 1024^4 bytes). That is sufficient: neither today nor in the foreseeable future will a machine have 16EB (16 × 1024^6 bytes) of memory.&lt;/p&gt;
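&lt;p&gt;The arithmetic behind these figures can be checked with a short shell sketch. It assumes 48-bit virtual addresses split evenly between user space and kernel space, which is what the 256TB and 128T figures correspond to; &lt;code&gt;pow2&lt;/code&gt; and &lt;code&gt;to_tib&lt;/code&gt; are helper names made up for this example.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;# pow2 N: print 2^N by repeated doubling (fits in 64-bit shell arithmetic up to N=62)
pow2() (
    n=$1
    result=1
    i=0
    while [ "$i" -lt "$n" ]; do
        result=$((result * 2))
        i=$((i + 1))
    done
    echo "$result"
)

# to_tib BYTES: convert a byte count to whole TiB (1 TiB = 1024^4 bytes)
to_tib() (
    echo $(( $1 / (1024 * 1024 * 1024 * 1024) ))
)

echo "32-bit space: $(( $(pow2 32) / (1024 * 1024 * 1024) )) GiB"
echo "48-bit space: $(to_tib "$(pow2 48)") TiB"
echo "each half (47 bits): $(to_tib "$(pow2 47)") TiB"&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The three lines print 4 GiB, 256 TiB and 128 TiB, matching the 32-bit and 64-bit layouts listed above.&lt;/p&gt;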

&lt;h3 class="relative group"&gt;Process Virtual Address Space
 &lt;div id="process-virtual-address-space" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#process-virtual-address-space" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Each process typically has its own independent virtual memory space. Virtual memory is an abstract concept that provides each running process with an address space that appears continuous and private, making each process feel like it has the entire computer system&amp;rsquo;s full memory.&lt;/p&gt;
&lt;p&gt;Process virtual address space layout:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/94df008e9d4a.png" alt="img" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2bc35848088f.png" alt="img" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://www.sohu.com/a/392831824_467784" target="_blank" rel="noreferrer"&gt;https://www.sohu.com/a/392831824_467784&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The mmap mapping region grows from top to bottom, so the mmap region and the heap grow toward each other until the free area between them in the virtual address space is exhausted. This layout lets the C runtime library use both the mmap region and the heap for memory allocation.&lt;/li&gt;
&lt;li&gt;Stack: Stores local variables and function parameters during program execution, growing from high addresses to low addresses&lt;/li&gt;
&lt;li&gt;Heap: Dynamic memory allocation area, managed through malloc and free in C or new and delete in C++&lt;/li&gt;
&lt;li&gt;BSS (Uninitialized Variables): Stores uninitialized global variables and static variables&lt;/li&gt;
&lt;li&gt;Data: Stores global variables and static variables with predefined values in source code&lt;/li&gt;
&lt;li&gt;Text: Stores read-only program execution code, i.e., machine instructions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Process virtual address space distribution and mapping:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/92827b8dcc73.png" alt="Insert image description" /&gt;
(&lt;a href="https://velog.io/@mysprtlty/%EA%B0%80%EC%83%81-%EB%A9%94%EB%AA%A8%EB%A6%AC%EC%99%80-%EA%B0%80%EC%83%81-%EC%A3%BC%EC%86%8C-%EA%B3%B5%EA%B0%84" target="_blank" rel="noreferrer"&gt;https://velog.io/@mysprtlty/%EA%B0%80%EC%83%81-%EB%A9%94%EB%AA%A8%EB%A6%AC%EC%99%80-%EA%B0%80%EC%83%81-%EC%A3%BC%EC%86%8C-%EA%B3%B5%EA%B0%84&lt;/a&gt;)&lt;/p&gt;

&lt;h3 class="relative group"&gt;Shared Memory
 &lt;div id="shared-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shared-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;As mentioned earlier, the user space of one process cannot be accessed by other user processes. If several processes exchanged the same data through the kernel, every access would incur a switch into kernel space. Multi-process applications clearly need to share data, so a mechanism emerged that lets user processes access the same physical memory directly: shared memory.&lt;/p&gt;
&lt;p&gt;Shared memory is one of the mechanisms for implementing IPC (Inter Process Communication), with other methods including message queues and semaphores.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d969a23e8ba9.png" alt="Insert image description" /&gt;
(&lt;a href="https://www.geeksforgeeks.org/inter-process-communication-ipc/" target="_blank" rel="noreferrer"&gt;https://www.geeksforgeeks.org/inter-process-communication-ipc/&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Because several virtual address spaces can map to the same physical memory, sharing only requires pointing a segment in the address space of each process at the same physical memory.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/1bc1a1357c78.png" alt="Insert image description" /&gt;
(&lt;a href="https://www.softprayog.in/programming/interprocess-communication-using-system-v-shared-memory-in-linux" target="_blank" rel="noreferrer"&gt;https://www.softprayog.in/programming/interprocess-communication-using-system-v-shared-memory-in-linux&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Shared memory can be implemented in several ways. For example, PostgreSQL uses mmap for its main shared memory area by default; see the &lt;a href="https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-SHARED-MEMORY-TYPE" target="_blank" rel="noreferrer"&gt;shared_memory_type parameter&lt;/a&gt; and &lt;a href="https://www.postgresql.org/docs/current/kernel-resources.html" target="_blank" rel="noreferrer"&gt;Managing Kernel Resources&lt;/a&gt;. Other shared memory implementations are covered in this article: &lt;a href="https://cloud.tencent.com/developer/article/1551288" target="_blank" rel="noreferrer"&gt;Song Baohua: The Best Shared Memory in the World (The Most Thorough Linux Shared Memory Article)&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Page Table
 &lt;div id="page-table" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#page-table" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Every process has its own virtual address space, but there is only one physical memory. So how are virtual addresses mapped and translated to physical memory?&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/02d5376a22ed.png" alt="Insert image description" /&gt;
(&lt;a href="https://courses.engr.illinois.edu/cs241/sp2014/lecture/09-VirtualMemory_II_sol.pdf" target="_blank" rel="noreferrer"&gt;https://courses.engr.illinois.edu/cs241/sp2014/lecture/09-VirtualMemory_II_sol.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The page table is where the correspondence between virtual memory addresses and physical memory addresses is stored.&lt;/strong&gt; (The MMU and TLB are also involved here; to keep things simple, treat them together as the virtual-to-physical translation function (PAGING) and look only at the page table.) A page table consists of a set of Page Table Entries (PTEs), with each PTE storing the mapping between a virtual page and a physical page.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/3e31e4f9f0eb.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;A single flat page table can translate virtual addresses to physical addresses, but implemented directly it would consume too much memory for the page table itself.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4e25eb557be1.png" alt="Insert image description" /&gt;
(&lt;a href="https://courses.engr.illinois.edu/cs241/sp2014/lecture/09-VirtualMemory_II_sol.pdf" target="_blank" rel="noreferrer"&gt;https://courses.engr.illinois.edu/cs241/sp2014/lecture/09-VirtualMemory_II_sol.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Therefore, the single page table is split into multiple levels: two-level page tables and four-level page tables.&lt;/p&gt;
&lt;p&gt;Two-level page tables:&lt;/p&gt;
&lt;p&gt;A two-level page table is a further subdivision of a single page table. Mapping 4G of address space in 4K pages takes 1M entries of 4 bytes each, so 4M of page tables. If these 4M are divided into 1K pages (4K each), these 1K pages also need a table for management, which we call the &lt;strong&gt;page directory table&lt;/strong&gt;. This page directory table has 1K entries, each 4 bytes, making the page directory table size 4K as well. Only the page tables for regions actually in use need to exist, which is where the memory saving comes from.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/527ca245cbef.png" alt="Insert image description" /&gt;&lt;/p&gt;
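&lt;p&gt;The two-level numbers can be reproduced with shell arithmetic. This is a sketch that assumes the classic 32-bit x86 layout without PAE (4 KiB pages, 4-byte entries, a 10/10/12-bit address split); &lt;code&gt;split_vaddr&lt;/code&gt; is a hypothetical helper and the sample address is chosen only for illustration.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;PAGE_SIZE=4096                                  # 4 KiB pages
ENTRY_SIZE=4                                    # bytes per entry (assumed 32-bit x86)
ENTRIES_PER_TABLE=$((PAGE_SIZE / ENTRY_SIZE))   # 1024 entries fit in one page

# Flat table: one entry for every 4 KiB page of a 4 GiB address space
total_pages=$((4 * 1024 * 1024 * 1024 / PAGE_SIZE))
echo "flat table: $((total_pages * ENTRY_SIZE / 1024 / 1024)) MiB"
echo "page directory: $((ENTRIES_PER_TABLE * ENTRY_SIZE / 1024)) KiB"

# split_vaddr ADDR: print "directory_index table_index page_offset" for a decimal address
split_vaddr() (
    addr=$1
    dir=$((addr / (ENTRIES_PER_TABLE * PAGE_SIZE)))
    table=$(((addr / PAGE_SIZE) % ENTRIES_PER_TABLE))
    offset=$((addr % PAGE_SIZE))
    echo "$dir $table $offset"
)

split_vaddr 134512680    # 0x08048028 written in decimal&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The first two lines report 4 MiB and 4 KiB, and the sample address splits into directory entry 32, table entry 72 and offset 40. The MMU performs the same split in hardware with bit shifts; division and modulo are used here only to keep the sketch readable.&lt;/p&gt;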
&lt;p&gt;Four-level page tables:&lt;/p&gt;
&lt;p&gt;For 64-bit systems, two-level page tables are insufficient; four-level page tables are needed.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/aec3c7ac7449.png" alt="Insert image description" /&gt;
(&lt;a href="https://maodanp.github.io/2019/06/02/linux-virtual-space/" target="_blank" rel="noreferrer"&gt;https://maodanp.github.io/2019/06/02/linux-virtual-space/&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Check page table size:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl 2345&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cat /proc/meminfo |grep PageTables
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PageTables: &lt;span style="color:#ae81ff"&gt;46736&lt;/span&gt; kB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;NUMA
 &lt;div id="numa" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#numa" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Uniform Memory Access (UMA)&lt;/strong&gt;: All CPUs have equivalent access time to memory. The problem with UMA is that multiple processors access memory through a single bus, increasing the load on the shared bus. Multiple processors contend for the memory controller causing conflicts. Additionally, the bus bandwidth is limited, leading to access delays.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Non-Uniform Memory Access (&lt;a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-virtualization_tuning_optimization_guide-numa" target="_blank" rel="noreferrer"&gt;NUMA&lt;/a&gt;)&lt;/strong&gt;: A small group of CPUs shares its own local memory. When there are several such groups, each group of CPUs together with its memory constitutes a NUMA node.&lt;/p&gt;
&lt;p&gt;UMA:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8aa08bee1125.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;NUMA:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5c9b2c4ad417.png" alt="Insert image description" /&gt;
(&lt;a href="https://users.cs.utah.edu/~aburtsev/cs5460/lectures/lecture19-memory-management/lecture19-memory-management.pdf" target="_blank" rel="noreferrer"&gt;https://users.cs.utah.edu/~aburtsev/cs5460/lectures/lecture19-memory-management/lecture19-memory-management.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Basic NUMA characteristics&lt;/em&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A CPU accesses memory on its local node faster than memory on a remote node&lt;/li&gt;
&lt;li&gt;By default, Linux prefers to allocate memory from the node local to the CPU running the task; the policy can be configured&lt;/li&gt;
&lt;li&gt;Each node has its own memory structure&lt;/li&gt;
&lt;li&gt;NUMA is not suitable for all scenarios; it requires adaptation by upper-layer applications&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;NUMA balancing&lt;/em&gt;:
Achieves local access by automatically migrating tasks to CPUs on the node that holds their memory, or by copying remote data into local memory. Enabled by default on Red Hat 7.&lt;/p&gt;
&lt;p&gt;Transferring tasks or copying data itself consumes resources and can slow down tasks. This feature may not be suitable for some applications; for example, Oracle&amp;rsquo;s Exadata has targeted NUMA optimizations.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;numactl&lt;/em&gt;:
NUMA OS configuration tool.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;numactl --hardware&lt;/code&gt; displays CPU and node information (&lt;code&gt;numactl --show&lt;/code&gt; prints only the NUMA policy of the current process). Below is an example of 4 nodes with 64c 256g total, each node having 16c 64g:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;available: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; nodes &lt;span style="color:#f92672"&gt;(&lt;/span&gt;0-3&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; cpus: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;33&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;35&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;36&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;37&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;38&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;39&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; size: &lt;span style="color:#ae81ff"&gt;65418&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; free: &lt;span style="color:#ae81ff"&gt;310&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; cpus: &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;40&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;41&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;42&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;43&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;45&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;46&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;47&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; size: &lt;span style="color:#ae81ff"&gt;65536&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; free: &lt;span style="color:#ae81ff"&gt;41&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; cpus: &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;48&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;51&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;53&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;54&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; size: &lt;span style="color:#ae81ff"&gt;65536&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; free: &lt;span style="color:#ae81ff"&gt;82&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; cpus: &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;25&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;57&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;58&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;62&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; size: &lt;span style="color:#ae81ff"&gt;65536&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; free: &lt;span style="color:#ae81ff"&gt;43&lt;/span&gt; MB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
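&lt;p&gt;Per-node figures in this format (the listing printed by &lt;code&gt;numactl --hardware&lt;/code&gt;) can be totalled with a few lines of shell. The sketch below is only an illustration; &lt;code&gt;numa_free_mb&lt;/code&gt; is a made-up helper name, and the sample input repeats the free values from the listing above.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;# numa_free_mb: sum the "node N free: X MB" lines read from stdin, ignore everything else
numa_free_mb() (
    total=0
    while read -r kind id field value unit; do
        case "$kind $field" in
            "node free:") total=$((total + value)) ;;
        esac
    done
    echo "$total"
)

printf '%s\n' \
    "node 0 free: 310 MB" \
    "node 1 free: 41 MB" \
    "node 2 free: 82 MB" \
    "node 3 free: 43 MB" | numa_free_mb&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For the sample this prints 476, the total free memory in MB across the four nodes. On a NUMA machine with numactl installed, &lt;code&gt;numactl --hardware | numa_free_mb&lt;/code&gt; should give the live total.&lt;/p&gt;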

&lt;h3 class="relative group"&gt;Zone
 &lt;div id="zone" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#zone" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;NUMA divides CPUs and memory into multiple nodes (node 0, node 1, node 2&amp;hellip;). In UMA structures, the CPU memory as a whole can be viewed as node 0.&lt;/p&gt;
&lt;p&gt;In Linux, each node is represented by the data structure &lt;code&gt;struct pglist_data&lt;/code&gt;, with the typedef &lt;code&gt;pg_data_t&lt;/code&gt;. Each node is further divided into multiple zones. In the 2.4-era kernel described by the reference below, a zone is a &lt;code&gt;struct zone_struct&lt;/code&gt;, with the typedef &lt;code&gt;zone_t&lt;/code&gt;. There are generally 3 types: &lt;code&gt;ZONE_DMA&lt;/code&gt;, &lt;code&gt;ZONE_NORMAL&lt;/code&gt;, &lt;code&gt;ZONE_HIGHMEM&lt;/code&gt;, each with different functions.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8507ec262240.png" alt="Insert image description" /&gt;
(&lt;a href="https://www.kernel.org/doc/gorman/html/understand/understand005.html" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/gorman/html/understand/understand005.html&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Zone distribution and functions in 32-bit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ZONE_DMA&lt;/code&gt;: (&amp;lt;16MB), &lt;em&gt;Direct Memory Access&lt;/em&gt; (DMA) memory below the ancient 16 MiB limit, kept for devices such as &lt;a href="https://en.wikipedia.org/wiki/Industry_Standard_Architecture" target="_blank" rel="noreferrer"&gt;ISA devices&lt;/a&gt; that cannot address more.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ZONE_DMA32&lt;/code&gt;: Many devices cannot access memory beyond what 32 bits can address, so this zone was added for them in x86-64. This zone only exists in x86-64 architecture. (See &lt;a href="https://lwn.net/Articles/152462/" target="_blank" rel="noreferrer"&gt;ZONE_DMA32&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ZONE_NORMAL&lt;/code&gt;: (16MB to 896MB), ordinary memory that is directly mapped into the kernel segment; most kernel operations take place in the NORMAL zone, which makes it the most important zone.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ZONE_HIGHMEM&lt;/code&gt;: (&amp;gt;896MB), physical memory beyond the kernel segment, which the kernel cannot access through its direct mapping.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Zone distribution diagram for 32-bit and 64-bit:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/13284b0811ff.png" alt="Insert image description" /&gt;
&lt;a href="https://users.cs.utah.edu/~aburtsev/cs5460/lectures/lecture19-memory-management/lecture19-memory-management.pdf" target="_blank" rel="noreferrer"&gt;https://users.cs.utah.edu/~aburtsev/cs5460/lectures/lecture19-memory-management/lecture19-memory-management.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Note that zones describe physical memory. A process works with virtual addresses and must switch from user mode to kernel mode before physical memory can be allocated to it. The following diagram shows the relationship between kernel addresses in virtual memory space and zones in physical address space:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0f0adc123018.png" alt="Insert image description" /&gt;
(&lt;a href="https://wr.informatik.uni-hamburg.de/_media/teaching/wintersemester_2014_2015/kp-1415-memory-management.pdf" target="_blank" rel="noreferrer"&gt;https://wr.informatik.uni-hamburg.de/_media/teaching/wintersemester_2014_2015/kp-1415-memory-management.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Inspect zones:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;cat /proc/zoneinfo&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;cat /proc/buddyinfo&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;cat /proc/pagetypeinfo&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/buddyinfo 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA32 &lt;span style="color:#ae81ff"&gt;688&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2080&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1420&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;995&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;596&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;357&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;278&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;241&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;276&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;133&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal &lt;span style="color:#ae81ff"&gt;195748&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;204074&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;161167&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;119070&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;70791&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;33578&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9556&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2070&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1034&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2533&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7328&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 1, zone Normal &lt;span style="color:#ae81ff"&gt;11705&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;51467&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;36752&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21326&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11343&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7309&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5024&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3403&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2597&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3056&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10898&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
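&lt;p&gt;Each column after the zone name in &lt;code&gt;/proc/buddyinfo&lt;/code&gt; is the number of free blocks of order 0 through 10, where a block of order n is 2^n contiguous pages. A rough sketch for turning one line into free memory, assuming 4 KiB pages (&lt;code&gt;buddy_free_kb&lt;/code&gt; is a name made up for this example):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;# buddy_free_kb: read /proc/buddyinfo-style lines on stdin and print
# "node zone free_kB" for each line. Assumes 4 KiB pages.
buddy_free_kb() (
    while read -r skip1 node skip2 zone counts; do
        pages=0
        block=1                     # pages per free block at order 0
        for count in $counts; do
            pages=$((pages + count * block))
            block=$((block * 2))    # each order doubles the block size
        done
        echo "${node%,} $zone $((pages * 4))"
    done
)

# First line of the sample output above
echo "Node 0, zone DMA 1 1 1 1 1 0 1 1 1 1 2" | buddy_free_kb&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For the DMA line this prints &lt;code&gt;0 DMA 12156&lt;/code&gt;, that is 3039 free pages or about 12 MB, consistent with a zone limited to 16MB. On a live system the same function can be fed with &lt;code&gt;cat /proc/buddyinfo | buddy_free_kb&lt;/code&gt;.&lt;/p&gt;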

&lt;h3 class="relative group"&gt;Pages
 &lt;div id="pages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Both virtual memory and physical memory are divided into fixed-size units called pages, typically 4KB. Dividing virtual memory yields virtual pages; dividing physical memory yields physical pages (PP), also called page frames (PF), of the same size. The page frame is the smallest unit in which the system manages physical memory.&lt;/p&gt;
&lt;p&gt;Each page in the virtual address space can be mapped to a page frame in the physical address space through its descriptor.&lt;/p&gt;
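&lt;p&gt;Because the mapping works at page granularity, only the page number part of an address is translated; the offset inside the page is carried over unchanged. A minimal sketch of that split, assuming 4KB pages unless another size is passed (&lt;code&gt;split_address&lt;/code&gt; is a hypothetical helper name, not a system tool):&lt;/p&gt;

```shell
# split_address (hypothetical helper): split an address into its page number
# and the byte offset inside that page. $2 is the page size in bytes
# (assumed 4096 if omitted).
split_address() {
  local addr=$(( $1 )) page_size=${2:-4096}
  echo "page=$(( addr / page_size )) offset=$(( addr % page_size ))"
}

# Page size in bytes on this host (4096 on most x86_64 systems).
getconf PAGESIZE

# Address 0x12345 lies in page 0x12 at offset 0x345.
split_address 0x12345
```

&lt;p&gt;The same split applies on the physical side: the page table supplies a page frame number, and the offset is appended to it.&lt;/p&gt;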

&lt;h3 class="relative group"&gt;Huge Pages / Transparent Huge Pages
 &lt;div id="huge-pages--transparent-huge-pages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#huge-pages--transparent-huge-pages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Pages are the minimum unit of memory allocation (4KB by default). Mapping a large region with 4KB pages requires a very large number of page table entries and causes more TLB misses, so performance suffers. &lt;a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-transhuge" target="_blank" rel="noreferrer"&gt;Huge Pages&lt;/a&gt; address this problem: each huge page covers far more memory, so allocation is cheaper and the page table is much smaller. On x86_64, hugepagesz is 2MB or 1GB, defaulting to 2MB. Explicitly configured huge pages were already available before Red Hat 6.&lt;/p&gt;
&lt;p&gt;Since manually managing huge pages is cumbersome, Red Hat 6 also provided automatic huge page management, i.e., &lt;strong&gt;Transparent Huge Pages&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;In Oracle database management, huge pages are generally enabled for SGA use, while transparent huge pages are disabled. There is plenty of related material available for searching.&lt;/p&gt;
&lt;p&gt;Similarly, PostgreSQL can use huge pages for its shared memory (the &lt;code&gt;huge_pages&lt;/code&gt; parameter). Databases typically hold large amounts of memory for long periods, so backing that memory with huge pages shrinks the page tables and reduces memory allocation pressure.&lt;/p&gt;
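&lt;p&gt;To get a feel for how much smaller the page table becomes, the sketch below estimates the last-level page table entries one process needs to map a region. It assumes 8-byte entries, as on x86_64, and ignores the upper page table levels; &lt;code&gt;pte_kb&lt;/code&gt; is a hypothetical helper name. The last lines show the huge page pool and the THP mode of the current host, if the kernel exposes them.&lt;/p&gt;

```shell
# pte_kb (hypothetical helper): approximate size in kB of the last-level page
# table entries one process needs to map region_mb megabytes with pages of
# page_kb kilobytes. Assumes 8-byte entries, as on x86_64.
pte_kb() {
  local region_mb=$1 page_kb=$2
  echo $(( region_mb * 1024 / page_kb * 8 / 1024 ))
}

pte_kb 8192 4       # 8 GB mapped with 4 kB pages: 16384 kB of entries
pte_kb 8192 2048    # 8 GB mapped with 2 MB huge pages: 32 kB of entries

# Huge page pool and THP mode on this host. Either source may be missing
# in a container or on a kernel built without the feature.
grep -i '^Huge' /proc/meminfo || true
thp=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp" ]; then cat "$thp"; fi
```

&lt;p&gt;For a database whose many server processes each map the same large shared memory area, the per-process cost is roughly multiplied by the number of processes, which is why the saving matters.&lt;/p&gt;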

&lt;h3 class="relative group"&gt;File Pages &amp;amp; Page Cache / Anonymous Pages &amp;amp; Swap Cache
 &lt;div id="file-pages--page-cache--anonymous-pages--swap-cache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#file-pages--page-cache--anonymous-pages--swap-cache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;File pages are backed by files on disk. File system reads and writes go through the Page Cache as buffered IO. Dirty pages are written back to disk periodically by the kernel, or on demand when an application calls sync or fsync. Page Cache is the memory area used to &amp;ldquo;boost&amp;rdquo; disk performance.&lt;/p&gt;
&lt;p&gt;Correspondingly, pages without associated files are called Anonymous Pages, generally corresponding to heap and stack. When memory resources are tight, the kernel writes infrequently used anonymous page data to swap partitions or swap files.&lt;/p&gt;
&lt;p&gt;In short:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Page cache corresponds to file mappings&lt;/li&gt;
&lt;li&gt;Swap cache corresponds to anonymous pages&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/cc6be7d9bb51.png" alt="Insert image description" /&gt;
(&lt;a href="https://www.slideshare.net/raghusiddarth/memory-management-in-linux-11551521?from_search=2" target="_blank" rel="noreferrer"&gt;https://www.slideshare.net/raghusiddarth/memory-management-in-linux-11551521?from_search=2&lt;/a&gt;)&lt;/p&gt;
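&lt;p&gt;The above page cache diagram is from the operating system&amp;rsquo;s perspective. Applications such as databases can also write synchronously (for example with O_SYNC or fsync) instead of relying on delayed write-back, or bypass the Page Cache entirely with direct IO (O_DIRECT).&lt;/p&gt;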

&lt;h2 class="relative group"&gt;Memory Allocation
 &lt;div id="memory-allocation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-allocation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Memory allocation is also very complex, involving many concepts. Two common memory allocation methods are buddy and slab.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Buddy
 &lt;div id="buddy" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#buddy" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The buddy system is used for allocating contiguous memory pages. Each zone has its own buddy system. The buddy system divides large blocks of memory to respond to memory allocation requests, and due to its coalescing characteristics, it can reduce system memory fragmentation.&lt;/p&gt;
&lt;p&gt;The buddy allocator manages free memory in blocks of 2^order contiguous pages, with orders running from 0 up to a maximum of 10:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/031369fb6ba0.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;When a memory request is smaller than the smallest free block available, the system splits that block into two equally sized buddy blocks, repeating the split until a block of the requested size is obtained. When memory is freed, the system attempts to merge adjacent buddy blocks into a larger block:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2eb819ac1f2d.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;When a block is freed, it is placed back on the free list for its order. If its buddy (the other half of the block it was originally split from) is also free, the two are merged into a block twice the size and moved to the next higher list, and so on, until no further merge is possible or the maximum order is reached.&lt;/p&gt;
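&lt;p&gt;Splitting and merging rely on the fact that the two buddies of a given order differ in exactly one bit of their starting page frame number (PFN). A small bash sketch of that arithmetic; the helper names are made up for illustration, and the kernel does the equivalent in C:&lt;/p&gt;

```shell
# buddy_pfn (hypothetical helper): starting PFN of the buddy of the block that
# starts at pfn and spans 2^order pages. Flipping bit "order" gives the buddy.
buddy_pfn() {
  local pfn=$1 order=$2
  echo $(( pfn ^ (2 ** order) ))
}

# merged_pfn (hypothetical helper): starting PFN of the block of order+1 that
# results when the block and its buddy coalesce (the lower of the two starts).
merged_pfn() {
  local pfn=$1 order=$2
  echo $(( pfn - pfn % (2 ** (order + 1)) ))
}

buddy_pfn 8 2     # order-2 block (4 pages) at PFN 8: its buddy starts at PFN 12
merged_pfn 12 2   # merging the pair gives an order-3 block starting at PFN 8
```

&lt;p&gt;Because the buddy can be computed directly from the PFN and the order, checking whether a freed block can be merged takes no searching.&lt;/p&gt;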
&lt;p&gt;When continuous allocation has depleted the higher-order blocks, a request for a higher-order block cannot be satisfied even though plenty of lower-order memory is free. This is the fragmentation problem:



&lt;img src="https://lastdba.com/img/csdn/385e337b4093.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;Once memory reclamation has freed enough pages, the buddy system merges lower-order blocks into higher-order ones and can then allocate the higher-order block:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2452398bd6ce.png" alt="Insert image description" /&gt;
(The implementations of anti pages fragmentation in Linux kernel &lt;a href="https://teawater.github.io/presentation/antif.pdf" target="_blank" rel="noreferrer"&gt;https://teawater.github.io/presentation/antif.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;However, memory reclamation may not keep up with the allocation rate, so the buddy system alone cannot always prevent fragmentation.&lt;/p&gt;
&lt;p&gt;Analysis example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/buddyinfo 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA32 &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;272&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal &lt;span style="color:#ae81ff"&gt;317681&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;38869&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;31620&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19250&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8931&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2579&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;815&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;182&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;The above contains 3 ZONEs: DMA, DMA32, Normal&lt;/li&gt;
&lt;li&gt;The columns correspond to orders 0 to 10; each value is the number of free blocks of that order. The maximum order is 10, i.e., 1024 contiguous pages, which is 4MB with 4KB pages&lt;/li&gt;
&lt;li&gt;For example, the 3rd column in the Normal row indicates there are 31620 free blocks of 2^2 = 4 contiguous pages (16KB each)&lt;/li&gt;
&lt;li&gt;By extension, the further right the column, the larger the contiguous block. The larger the number, the more free blocks of that size there are. When the right-hand columns are close to zero, large contiguous blocks are scarce, which indicates significant memory fragmentation&lt;/li&gt;
&lt;li&gt;Additionally, summing count × 2^order × page size over all columns gives the current free memory of the zone&lt;/li&gt;
&lt;/ul&gt;
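&lt;p&gt;The last point can be checked with a few lines of awk. This is a sketch rather than a standard tool: &lt;code&gt;buddy_free_kb&lt;/code&gt; is a hypothetical helper, and it assumes 4KB pages unless another page size is passed.&lt;/p&gt;

```shell
# buddy_free_kb (hypothetical helper): read /proc/buddyinfo-formatted lines
# on stdin and print the free memory of each zone.
# $1 is the page size in kB (assumed 4 if omitted).
buddy_free_kb() {
  awk -v page_kb="${1:-4}" '
    $1 == "Node" {
      node = $2; sub(/,/, "", node); zone = $4
      $1 = ""; $2 = ""; $3 = ""; $4 = ""   # keep only the per-order counts
      split($0, counts)                    # counts[1] is order 0, counts[2] is order 1, ...
      kb = 0
      for (i in counts) kb += counts[i] * 2 ^ (i - 1) * page_kb
      printf "node %s zone %s: %d kB free\n", node, zone, kb
    }'
}

# The three rows from the sample output above.
printf '%s\n' \
  'Node 0, zone DMA 0 0 0 1 2 1 1 0 1 1 3' \
  'Node 0, zone DMA32 7 6 5 6 5 6 7 7 6 2 272' \
  'Node 0, zone Normal 317681 38869 31620 19250 8931 2579 815 182 19 5 0' |
  buddy_free_kb
```

&lt;p&gt;On a live system the same function can be fed with &lt;code&gt;cat /proc/buddyinfo | buddy_free_kb&lt;/code&gt;.&lt;/p&gt;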
&lt;p&gt;Judging memory fragmentation issues through buddyinfo:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#host 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal &lt;span style="color:#ae81ff"&gt;317681&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;38869&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;31620&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19250&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8931&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2579&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;815&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;182&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#host 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal &lt;span style="color:#ae81ff"&gt;7321&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7833&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10885&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8514&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2311&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1644&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1663&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1302&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1141&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7384&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;80675&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
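&lt;p&gt;The above shows the Normal zone of two hosts. Host 2 has plenty of free high-order blocks (80675 at order 10), while host 1 has almost none (5 at order 9 and 0 at order 10). Most of the free memory on host 1 sits in small blocks, so host 1 is the one with a memory fragmentation problem.&lt;/p&gt;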
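&lt;p&gt;One way to put a number on this comparison is the share of free pages that sit in large blocks. The sketch below uses order 9 and above (2MB and larger with 4KB pages) as the cut-off. Both the cut-off and the &lt;code&gt;high_order_share&lt;/code&gt; helper are assumptions made for illustration, not a metric the kernel reports.&lt;/p&gt;

```shell
# high_order_share (hypothetical helper): for each buddyinfo-formatted line on
# stdin, print the percentage of free pages held in blocks of at least
# min_order. The default of 9 corresponds to 2 MB blocks with 4 kB pages.
high_order_share() {
  awk -v min_order="${1:-9}" '
    $1 == "Node" {
      zone = $4
      $1 = ""; $2 = ""; $3 = ""; $4 = ""   # keep only the per-order counts
      split($0, counts)                    # counts[1] is order 0, counts[2] is order 1, ...
      total = 0; high = 0
      for (i in counts) {
        pages = counts[i] * 2 ^ (i - 1)
        total += pages
        if (i - 1 >= min_order) high += pages
      }
      if (total == 0) total = 1   # empty zone: report 0.0 instead of dividing by zero
      printf "%s %.1f%%\n", zone, 100 * high / total
    }'
}

# Host 1 and host 2 from the comparison above.
printf '%s\n' \
  'Node 0, zone Normal 317681 38869 31620 19250 8931 2579 815 182 19 5 0' \
  'Node 0, zone Normal 7321 7833 10885 8514 2311 1644 1663 1302 1141 7384 80675' |
  high_order_share
```

&lt;p&gt;For these two rows the result is about 0.3% for host 1 and about 99.1% for host 2, which matches the reading of the raw columns.&lt;/p&gt;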

&lt;h3 class="relative group"&gt;Slab
 &lt;div id="slab" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#slab" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The slab allocator manages memory &lt;strong&gt;based on objects&lt;/strong&gt;. The slab system is a memory allocation algorithm specifically designed for &lt;strong&gt;kernel&lt;/strong&gt; memory. It maintains a set of caches, one per object type or size; each cache is made up of slabs, and each slab holds a set of objects of the same type. When there is a memory request, the allocator first checks whether a free object exists in the appropriate slab cache. If so, that object is returned. If not, the allocator obtains pages from the buddy system, builds a new slab from them, and adds it to the appropriate cache.&lt;/p&gt;
&lt;p&gt;Objects of different sizes correspond to different slab caches:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6da00bf9a4b4.png" alt="Insert image description" /&gt;
(&lt;a href="https://bootlin.com/doc/training/linux-kernel/linux-kernel-slides.pdf" target="_blank" rel="noreferrer"&gt;https://bootlin.com/doc/training/linux-kernel/linux-kernel-slides.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Although slab has different caches and objects, slab still uses physically contiguous memory:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8274eb96e6d6.png" alt="Insert image description" /&gt;
(&lt;a href="https://i.stack.imgur.com/wo8Gg.png" target="_blank" rel="noreferrer"&gt;https://i.stack.imgur.com/wo8Gg.png&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;The kernel has had 3 implementations of the slab interface, SLAB, SLUB, and SLOB:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6c784e30b08b.png" alt="Insert image description" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Memory Reclamation
 &lt;div id="memory-reclamation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-reclamation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Recommended article: &lt;a href="https://blog.csdn.net/weixin_35094083/article/details/116688112" target="_blank" rel="noreferrer"&gt;Linux Forced Memory Reclamation, Linux Memory Source Code Analysis - Memory Reclamation&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Memory Reclamation Overview
 &lt;div id="memory-reclamation-overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-reclamation-overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;When system memory pressure is high, memory reclamation is performed on each zone under pressure. Memory reclamation mainly targets anonymous pages and file pages.&lt;/li&gt;
&lt;li&gt;For anonymous pages, during memory reclamation, some infrequently used anonymous pages are selected, written to the swap partition, and then released as free page frames to the buddy system.&lt;/li&gt;
&lt;li&gt;For file pages, during memory reclamation, some infrequently used file pages are also selected:
&lt;ul&gt;
&lt;li&gt;If the content saved in this file page is consistent with the corresponding file content on disk, this file page is a clean file page and does not need to be written back; it is directly released as a free page frame to the buddy system.&lt;/li&gt;
&lt;li&gt;If the data saved in the file page is inconsistent with the corresponding data in the file on disk, this file page is considered a dirty page. It must first be written back to the corresponding data location on disk, and then released as a free page frame to the buddy system.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;After memory reclamation completes, the number of free page frames in the system increases, alleviating memory pressure. However, the reclamation process puts significant IO pressure on the system. Therefore, a threshold is set for each zone in the system. When the number of free page frames falls below this threshold, memory reclamation operations are performed. When the number of free page frames meets this threshold, the system does not perform memory reclamation operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Zone Watermarks and kswapd
 &lt;div id="zone-watermarks-and-kswapd" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#zone-watermarks-and-kswapd" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c611d500255d.png" alt="Insert image description" /&gt;
(&lt;a href="https://vivani.net/2022/06/14/linux-kernel-tuning-page-allocation-failure/" target="_blank" rel="noreferrer"&gt;https://vivani.net/2022/06/14/linux-kernel-tuning-page-allocation-failure/&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;When available memory is low, the kswapd daemon is awakened to free pages.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;pages_low&lt;/strong&gt;: When the number of free pages falls below pages_low, the buddy allocator wakes up the &lt;strong&gt;kswapd&lt;/strong&gt; kernel thread, which starts reclaiming pages in the background (dropping clean file pages, writing back dirty ones, and swapping out anonymous pages).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;pages_min&lt;/strong&gt;: When free pages drop to pages_min, the zone urgently needs free pages and background reclamation is not keeping up. The allocating process then performs the reclamation work itself, synchronously; this is called direct reclaim.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;pages_high&lt;/strong&gt;: Once kswapd has been woken, it keeps freeing pages until the number of free pages reaches pages_high. At that point the kernel considers the zone &amp;ldquo;balanced&amp;rdquo; and kswapd goes back to sleep.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Memory reclamation is performed on a per-zone basis. &lt;code&gt;/proc/zoneinfo&lt;/code&gt; can display the values of min, low, and high.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;vm.min_free_kbytes&lt;/code&gt; sets the pages_min watermark (the kernel derives pages_low and pages_high from it) and is a very important OS parameter. Too low a value prevents the system from reclaiming memory in time, potentially leading to system hangs and service interruptions. Too high a value reserves too much memory and increases reclamation activity, causing allocation delays, and in the extreme can push the system straight into an out-of-memory state.&lt;/p&gt;
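&lt;p&gt;The watermark logic described above can be sketched as two small helpers: one pulls the free page count and the three watermarks of each zone out of &lt;code&gt;/proc/zoneinfo&lt;/code&gt;, the other classifies a zone. Both function names are hypothetical, the boundary handling is a simplification of what the kernel does, and the parsing assumes the usual zoneinfo layout (a &lt;code&gt;pages free&lt;/code&gt; line followed by &lt;code&gt;min&lt;/code&gt;, &lt;code&gt;low&lt;/code&gt;, and &lt;code&gt;high&lt;/code&gt; lines).&lt;/p&gt;

```shell
# zone_watermarks (hypothetical helper): read /proc/zoneinfo-formatted text on
# stdin and print "zone free min low high" (all in pages) for each zone.
zone_watermarks() {
  awk '
    $1 == "Node"  { zone = $4 }
    $1 == "pages" { if ($2 == "free") free = $3 }
    $1 == "min"   { min = $2 }
    $1 == "low"   { low = $2 }
    $1 == "high"  { print zone, free, min, low, $2 }'
}

# watermark_state (hypothetical helper): classify one zone from its free page
# count and watermarks, following the description above.
watermark_state() {
  local free=$1 min=$2 low=$3 high=$4
  if   [ "$free" -le "$min" ];  then echo "at or below min: direct reclaim"
  elif [ "$free" -lt "$low" ];  then echo "below low: kswapd is woken"
  elif [ "$free" -lt "$high" ]; then echo "below high: kswapd keeps reclaiming if already awake"
  else                               echo "at or above high: zone balanced, kswapd sleeps"
  fi
}

# Report every zone of the current host, if zoneinfo is available.
if [ -r /proc/zoneinfo ]; then
  cat /proc/zoneinfo | zone_watermarks |
    while read -r zone free min low high; do
      echo "$zone: $(watermark_state "$free" "$min" "$low" "$high")"
    done
fi
```

&lt;p&gt;The current value of the parameter itself can be read with &lt;code&gt;cat /proc/sys/vm/min_free_kbytes&lt;/code&gt;.&lt;/p&gt;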

&lt;h3 class="relative group"&gt;Types of Memory Allocation and Reclamation
 &lt;div id="types-of-memory-allocation-and-reclamation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#types-of-memory-allocation-and-reclamation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Fast Memory Allocation&lt;/strong&gt;: Performed by the get_page_from_freelist() function, which walks the zonelist looking for a suitable zone and checks each one against its low watermark. If the free pages of a zone are below the low watermark, fast memory reclamation is performed on it and the allocation is retried afterwards.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Slow Memory Allocation&lt;/strong&gt;: When fast allocation fails, meaning no zone in the zonelist could supply memory, slow allocation is attempted against the min watermark. Three main things happen during slow allocation: asynchronous memory compaction, direct memory reclamation, and light synchronous memory compaction. As a last resort, the OOM killer may be invoked. After each of these steps, fast memory allocation is called once more to try to obtain page frames.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/85d59cc1ec86.png" alt="Insert image description" /&gt;
(&lt;a href="https://blog.csdn.net/weixin_35094083/article/details/116688112" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/weixin_35094083/article/details/116688112&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Different memory allocation paths trigger different memory reclamation methods. Zone memory reclamation is divided into two types:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Background Memory Reclamation&lt;/strong&gt; (kswapd): When physical memory is tight, the kswapd kernel thread is awakened to reclaim memory. This memory reclamation process is &lt;strong&gt;asynchronous&lt;/strong&gt; and does not block process execution.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Direct Memory Reclamation&lt;/strong&gt; (direct reclaim): If background asynchronous reclamation cannot keep up with the rate at which processes request memory, direct reclamation begins. This memory reclamation process is &lt;strong&gt;synchronous&lt;/strong&gt;: the process requesting memory is blocked until the reclamation finishes.&lt;/li&gt;
&lt;/ul&gt;
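&lt;p&gt;Whether a host is doing background or direct reclamation can be seen in the cumulative counters of &lt;code&gt;/proc/vmstat&lt;/code&gt;. The sketch below sums them by prefix, because the exact counter names vary with the kernel version (older kernels append a zone suffix such as &lt;code&gt;_normal&lt;/code&gt;). &lt;code&gt;reclaim_counters&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```shell
# reclaim_counters (hypothetical helper): read /proc/vmstat-formatted lines on
# stdin and print the pages scanned by kswapd, the pages scanned by direct
# reclaim, and the number of allocation stalls.
reclaim_counters() {
  awk '
    $1 == "pgscan_direct_throttle" { next }   # throttling events, not scanned pages
    $1 ~ /^pgscan_kswapd/ { kswapd += $2 }
    $1 ~ /^pgscan_direct/ { direct += $2 }
    $1 ~ /^allocstall/    { stalls += $2 }
    END { printf "kswapd_scanned=%d direct_scanned=%d allocstall=%d\n", kswapd, direct, stalls }'
}

# Counters of the current host since boot, if vmstat is available.
if [ -r /proc/vmstat ]; then cat /proc/vmstat | reclaim_counters; fi
```

&lt;p&gt;The values only ever grow, so what matters is the difference between two samples: a rising direct scan or allocstall count means processes are being blocked in direct reclaim.&lt;/p&gt;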

&lt;h3 class="relative group"&gt;Memory Compaction
 &lt;div id="memory-compaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-compaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Memory compaction: see Memory Monitoring - /proc/pagetypeinfo section&lt;/p&gt;

&lt;h3 class="relative group"&gt;LRU
 &lt;div id="lru" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lru" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Zone memory reclamation targets three things: slab, pages on the LRU lists, and buffer_head. Here we only discuss reclamation from the LRU lists.&lt;/p&gt;
&lt;p&gt;The main purpose of the LRU lists is to order pages, keeping the pages most deserving of reclamation at the back and the pages least deserving of reclamation at the front. During memory reclamation, the lists are scanned from back to front and the kernel attempts to reclaim the scanned pages.&lt;/p&gt;
&lt;p&gt;The LRU list descriptor (lruvec) contains 5 LRU lists: the active and inactive anonymous page lists, the active and inactive file page lists, and the unevictable page list:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/958bc46a109a.png" alt="Insert image description" /&gt;
(&lt;a href="https://lpc.events/event/11/contributions/896/attachments/793/1493/slides-r2.pdf" target="_blank" rel="noreferrer"&gt;https://lpc.events/event/11/contributions/896/attachments/793/1493/slides-r2.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Memory reclamation only processes the first 4 LRU lists: the active anonymous, inactive anonymous, active file, and inactive file page lists. Fast memory reclamation and kswapd memory reclamation both return as soon as enough page frames have been reclaimed.&lt;/p&gt;
&lt;p&gt;The global LRU totals can be viewed through /proc/meminfo:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## cat /proc/meminfo |grep -i active&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Active: &lt;span style="color:#ae81ff"&gt;597380&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Inactive: &lt;span style="color:#ae81ff"&gt;601920&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Active&lt;span style="color:#f92672"&gt;(&lt;/span&gt;anon&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;10896&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Inactive&lt;span style="color:#f92672"&gt;(&lt;/span&gt;anon&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;117376&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Active&lt;span style="color:#f92672"&gt;(&lt;/span&gt;file&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;586484&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Inactive&lt;span style="color:#f92672"&gt;(&lt;/span&gt;file&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;484544&lt;/span&gt; kB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In reality, there is more than one lruvec. cgroup and NUMA nodes each have their own lruvec, and global also has its own lruvec.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d25a7970acd0.png" alt="Insert image description" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Drop Cache
 &lt;div id="drop-cache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#drop-cache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Drop cache lets an administrator force the kernel to release clean caches: page cache pages that hold file system data, and the dentry and inode caches. Data dropped this way is read from disk and cached again on the next access.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Default value: &lt;code&gt;vm.drop_caches = 0&lt;/code&gt;. The kernel does not drop caches on its own through this interface. Writing a value triggers a one-time drop; it is not a setting that stays in effect.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Setting &lt;code&gt;/proc/sys/vm/drop_caches&lt;/code&gt; to 1: The kernel clears unused page cache.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Setting &lt;code&gt;/proc/sys/vm/drop_caches&lt;/code&gt; to 2: The kernel releases memory used by dentry and inode. Dentry and inode are file system metadata structures used to store file and directory information.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Setting &lt;code&gt;/proc/sys/vm/drop_caches&lt;/code&gt; to 3: Equivalent to 1+2, releases all unused caches.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Drop cache is non-destructive: it only frees clean objects and never writes dirty pages back, so dirty pages stay in memory. To free more, run &lt;code&gt;sync&lt;/code&gt; first so that dirty pages are written to disk and become clean. The write-back triggered by sync and the re-reading of dropped data afterwards can both cause IO spikes, so avoid dropping caches while important I/O is running, as it may affect system performance.&lt;/p&gt;
&lt;p&gt;Operation commands:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo 3 &amp;gt; /proc/sys/vm/drop_caches # Flush cache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo 0 &amp;gt; /proc/sys/vm/drop_caches # Restore default&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Memory Monitoring
 &lt;div id="memory-monitoring" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-monitoring" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Without understanding basic memory knowledge, it is actually very difficult to interpret memory monitoring information. With the above memory fundamentals in place, let&amp;rsquo;s go through memory-related monitoring commands and tools one by one.&lt;/p&gt;

&lt;h3 class="relative group"&gt;What&amp;rsquo;s in the /proc Directory?
 &lt;div id="whats-in-the-proc-directory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#whats-in-the-proc-directory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;/proc mainly contains process information and system information.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/539736f743ba.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;The system information part consists of interfaces through which Linux exposes system status. They provide monitoring information for the operating system as a whole, for example slabinfo, swaps, zoneinfo, and buddyinfo.&lt;/p&gt;
&lt;p&gt;The process part contains runtime data and status information for each process. cd into the directory of a process (/proc/[pid]) to see the file descriptors it holds and its memory information.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e5f7e542f245.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/de0fd3de265d.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;Processes also have threads. Thread information directory: /proc/[pid]/task/[tid]/, with content similar to the process directory.&lt;/p&gt;
&lt;p&gt;For more proc information, refer to &lt;a href="https://man7.org/linux/man-pages/man5/proc.5.html" target="_blank" rel="noreferrer"&gt;proc(5) — Linux manual page&lt;/a&gt;&lt;/p&gt;
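&lt;p&gt;As a small illustration of the per-process side, the sketch below extracts three memory fields from a process status file and counts its open file descriptors. &lt;code&gt;proc_mem&lt;/code&gt; is a hypothetical helper name; the field names are those of &lt;code&gt;/proc/[pid]/status&lt;/code&gt;, and kernel threads do not report them.&lt;/p&gt;

```shell
# proc_mem (hypothetical helper): read /proc/[pid]/status-formatted text on
# stdin and print the virtual size, resident set size and swapped-out size.
proc_mem() {
  awk '
    $1 == "VmSize:" { size = $2 }
    $1 == "VmRSS:"  { rss = $2 }
    $1 == "VmSwap:" { swap = $2 }
    END { printf "VmSize=%d kB VmRSS=%d kB VmSwap=%d kB\n", size, rss, swap }'
}

# Memory and number of open file descriptors of the current shell.
if [ -r "/proc/$$/status" ]; then
  cat "/proc/$$/status" | proc_mem
  ls "/proc/$$/fd" | wc -l
fi
```

&lt;p&gt;Replacing &lt;code&gt;$$&lt;/code&gt; with the PID of a database process gives the same view of that process, subject to file permissions.&lt;/p&gt;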

&lt;h3 class="relative group"&gt;/proc/meminfo
 &lt;div id="procmeminfo" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#procmeminfo" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;/proc/meminfo is the primary interface for understanding current memory usage on a Linux system. The most commonly used commands, such as &lt;code&gt;free&lt;/code&gt;, &lt;code&gt;vmstat&lt;/code&gt;, and &lt;code&gt;ps&lt;/code&gt;, obtain their data from it, but /proc/meminfo itself is more comprehensive than what those commands display. Below we only list some common fields. For detailed meanings, refer to the &lt;a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s2-proc-meminfo" target="_blank" rel="noreferrer"&gt;Red Hat documentation&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# General memory information&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep &lt;span style="color:#e6db74"&gt;&amp;#34;Mem&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MemTotal: &lt;span style="color:#ae81ff"&gt;994328&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Total memory size (minus some reserved and kernel)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MemFree: &lt;span style="color:#ae81ff"&gt;66428&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Completely unused physical memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MemAvailable: &lt;span style="color:#ae81ff"&gt;207192&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Estimate of memory available for starting new applications without swapping&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# IO buffers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -e &lt;span style="color:#e6db74"&gt;&amp;#34;Buffers&amp;#34;&lt;/span&gt; -we &lt;span style="color:#e6db74"&gt;&amp;#34;Cached&amp;#34;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Buffers: &lt;span style="color:#ae81ff"&gt;12820&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Temporary storage for raw disk blocks; usually small (around 20 MB), not a hard limit&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Cached: &lt;span style="color:#ae81ff"&gt;254592&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Page cache size used by disks (includes tmpfs and shmem, excludes SwapCached)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# swap&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep &lt;span style="color:#e6db74"&gt;&amp;#34;Swap&amp;#34;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SwapCached: &lt;span style="color:#ae81ff"&gt;13936&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Memory that was swapped out and read back in but still has a copy in the swap area, so it can be reclaimed without being written out again&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SwapTotal: &lt;span style="color:#ae81ff"&gt;945416&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Total swap space size&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SwapFree: &lt;span style="color:#ae81ff"&gt;851064&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Remaining swap size&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# LRU active and inactive list sizes (self-explanatory)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -e &lt;span style="color:#e6db74"&gt;&amp;#34;Active&amp;#34;&lt;/span&gt; -e &lt;span style="color:#e6db74"&gt;&amp;#34;Inactive&amp;#34;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Active: &lt;span style="color:#ae81ff"&gt;194308&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Inactive: &lt;span style="color:#ae81ff"&gt;553172&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Active&lt;span style="color:#f92672"&gt;(&lt;/span&gt;anon&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;59024&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Inactive&lt;span style="color:#f92672"&gt;(&lt;/span&gt;anon&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;437264&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Active&lt;span style="color:#f92672"&gt;(&lt;/span&gt;file&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;135284&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Inactive&lt;span style="color:#f92672"&gt;(&lt;/span&gt;file&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;115908&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Dirty pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -e &lt;span style="color:#e6db74"&gt;&amp;#34;Dirty&amp;#34;&lt;/span&gt; -e &lt;span style="color:#e6db74"&gt;&amp;#34;Writeback&amp;#34;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Dirty: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Dirty pages not yet written&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Writeback: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Dirty pages being written&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WritebackTmp: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Temporary buffer for writebacks used by the FUSE module&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Map information&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -e &lt;span style="color:#e6db74"&gt;&amp;#34;AnonPages&amp;#34;&lt;/span&gt; -e &lt;span style="color:#e6db74"&gt;&amp;#34;Map&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;AnonPages: &lt;span style="color:#ae81ff"&gt;95296&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Mapped anonymous pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Mapped: &lt;span style="color:#ae81ff"&gt;153192&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Mapped file pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DirectMap4k: &lt;span style="color:#ae81ff"&gt;113336&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# RAM covered by 4k pages in the kernel direct mapping&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DirectMap2M: &lt;span style="color:#ae81ff"&gt;1900544&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# RAM covered by 2M pages in the kernel direct mapping&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DirectMap1G: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# RAM covered by 1G pages in the kernel direct mapping&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Shared memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep &lt;span style="color:#e6db74"&gt;&amp;#34;Shmem&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shmem: &lt;span style="color:#ae81ff"&gt;28920&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Total memory size of shmem and tmpfs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ShmemHugePages: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Total huge page memory size of shmem and tmpfs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ShmemPmdMapped: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Shared memory mapped into userspace with huge pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Kernel memory (note: slab is kernel)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -ie &lt;span style="color:#e6db74"&gt;&amp;#34;reclaim&amp;#34;&lt;/span&gt; -e &lt;span style="color:#e6db74"&gt;&amp;#34;slab&amp;#34;&lt;/span&gt; -e &lt;span style="color:#e6db74"&gt;&amp;#34;kernel&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;KReclaimable: &lt;span style="color:#ae81ff"&gt;35008&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Reclaimable memory allocated to kernel&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Slab: &lt;span style="color:#ae81ff"&gt;88752&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Slab cache&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SReclaimable: &lt;span style="color:#ae81ff"&gt;35008&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Reclaimable memory in slab cache&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SUnreclaim: &lt;span style="color:#ae81ff"&gt;53744&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Non-reclaimable memory in slab cache&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;KernelStack: &lt;span style="color:#ae81ff"&gt;5988&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Kernel stack memory used by all tasks&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Allocatable memory (different meaning from MemAvailable)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## CommitLimit=[(&amp;#34;total RAM pages&amp;#34; - &amp;#34;total huge TLB pages&amp;#34;) * overcommit_ratio]/100 + &amp;#34;total swap pages&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## In short: overcommit_ratio percent of non-hugepage RAM, plus swap. The limit is only enforced when vm.overcommit_memory=2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -ie &lt;span style="color:#e6db74"&gt;&amp;#34;commit&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CommitLimit: &lt;span style="color:#ae81ff"&gt;1442580&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Total memory allocatable under strict overcommit accounting&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Committed_AS: &lt;span style="color:#ae81ff"&gt;3035924&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Memory currently committed to processes, i.e. what would be needed in the worst case if all of it were used&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# vmalloc area (kernel virtual address space)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -e &lt;span style="color:#e6db74"&gt;&amp;#34;Vmalloc&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmallocTotal: &lt;span style="color:#ae81ff"&gt;34359738367&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Total size of the vmalloc address space (not allocated memory)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmallocUsed: &lt;span style="color:#ae81ff"&gt;34780&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Amount of the vmalloc area in use&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmallocChunk: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Largest contiguous free block in the vmalloc area&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Page table memory (self-explanatory)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep PageTables
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PageTables: &lt;span style="color:#ae81ff"&gt;4120&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Huge page memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -i hugepage
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;AnonHugePages: &lt;span style="color:#ae81ff"&gt;32768&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ShmemHugePages: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FileHugePages: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HugePages_Total: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HugePages_Free: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HugePages_Rsvd: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HugePages_Surp: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Hugepagesize: &lt;span style="color:#ae81ff"&gt;2048&lt;/span&gt; kB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
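&lt;p&gt;The CommitLimit formula in the comments above can be checked against the sample output. The following is a minimal sketch, not the kernel's exact page-based arithmetic: &lt;code&gt;commit_limit&lt;/code&gt; is a hypothetical helper name, it reads meminfo-formatted text on stdin, and it assumes &lt;code&gt;vm.overcommit_kbytes&lt;/code&gt; is 0 so that &lt;code&gt;vm.overcommit_ratio&lt;/code&gt; applies. The sample values are copied from the listing above, and the ratio of 50 is an assumption.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# commit_limit RATIO
# Hypothetical helper: reads /proc/meminfo-formatted text on stdin and prints
# the CommitLimit in kB implied by the given overcommit ratio (in percent).
commit_limit() {
  awk -v ratio="$1" '
    $1 == "MemTotal:"        { total = $2 }   # kB
    $1 == "SwapTotal:"       { swap = $2 }    # kB
    $1 == "HugePages_Total:" { hp = $2 }      # number of huge pages
    $1 == "Hugepagesize:"    { hpsz = $2 }    # kB per huge page
    END { printf "%d\n", (total - hp * hpsz) * ratio / 100 + swap }
  '
}

# Sample values from the listing above, with an assumed ratio of 50.
printf 'MemTotal: 994328 kB\nSwapTotal: 945416 kB\nHugePages_Total: 0\nHugepagesize: 2048 kB\n' | commit_limit 50
# prints 1442580, which matches the CommitLimit line in the sample output
```

&lt;p&gt;On a live system, something like &lt;code&gt;cat /proc/meminfo | commit_limit "$(cat /proc/sys/vm/overcommit_ratio)"&lt;/code&gt; should land close to the reported CommitLimit. Small differences can come from the kernel rounding to whole pages.&lt;/p&gt;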

&lt;h3 class="relative group"&gt;/proc/buddyinfo
 &lt;div id="procbuddyinfo" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#procbuddyinfo" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
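&lt;p&gt;Because its output is concise and easy to read, buddyinfo is the most commonly used way to judge memory fragmentation. Each column is the number of free blocks of a given order in that zone, from order 0 on the left to order 10 on the right, where a block of order n is 2^n contiguous pages. See &amp;ldquo;Memory Allocation - Buddy section&amp;rdquo; for details.&lt;/p&gt;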
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/buddyinfo 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA32 &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;272&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal &lt;span style="color:#ae81ff"&gt;317681&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;38869&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;31620&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19250&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8931&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2579&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;815&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;182&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
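&lt;p&gt;To turn those per-order counts into something easier to compare, the free memory of each zone can be summed up. The following is a rough sketch: &lt;code&gt;buddy_free_kb&lt;/code&gt; is a hypothetical helper name, it reads buddyinfo-formatted lines on stdin, and it assumes a 4 kB page size unless another size in kB is passed as the first argument.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# buddy_free_kb [PAGE_KB]
# Hypothetical helper: reads /proc/buddyinfo-formatted lines on stdin and
# prints "ZONE FREE_KB" per line. PAGE_KB defaults to 4 (an assumption).
buddy_free_kb() {
  awk -v page_kb="${1:-4}" '
    $1 == "Node" {
      # f[1..4] are "Node", "0,", "zone", NAME; f[5] onward are the
      # free-block counts for order 0, 1, 2, ...
      split($0, f, " ")
      pages = 0
      for (i = 5; i in f; i++) pages += f[i] * 2 ^ (i - 5)
      printf "%s %d\n", f[4], pages * page_kb
    }
  '
}

# The DMA line from the sample output above.
printf 'Node 0, zone DMA 0 0 0 1 2 1 1 0 1 1 3\n' | buddy_free_kb
# prints: DMA 15904
```

&lt;p&gt;On a live system this would be &lt;code&gt;cat /proc/buddyinfo | buddy_free_kb&lt;/code&gt;. A zone whose total is large but whose right-hand columns are zero, like the Normal zone above, has plenty of free memory but little of it in large contiguous blocks.&lt;/p&gt;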

&lt;h3 class="relative group"&gt;/proc/pagetypeinfo
 &lt;div id="procpagetypeinfo" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#procpagetypeinfo" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;pagetypeinfo first reports the pageblock size (the order and the number of pages per block). It then provides the same kind of information as buddyinfo, broken down by migrate type, followed by a count of how many pageblocks of each type exist.&lt;/p&gt;
&lt;p&gt;Before understanding pagetypeinfo, you need to first understand &lt;a href="https://lwn.net/Articles/368869/" target="_blank" rel="noreferrer"&gt;memory compaction&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Suppose the memory in a zone looks like this:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/97866485c91f.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;White represents free memory and red represents used memory. The fragmentation shown above is already quite severe: a request of order 2 or higher cannot be satisfied, because no free block of four contiguous pages is left. This is where memory compaction comes into play. The compaction algorithm builds two lists for the zone: movable pages and free pages.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/706746415abf.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;The movable scanner scans from bottom to top, and the free scanner scans from top to bottom. The two scanners eventually meet somewhere in the middle. Then, through &lt;a href="https://lwn.net/Articles/157066/" target="_blank" rel="noreferrer"&gt;page migration&lt;/a&gt;, the movable used pages are moved into the free pages at the top of the zone, which leaves a larger contiguous free area behind.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8b308995434f.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Two trigger methods for page compaction&lt;/em&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;During page allocation, if the fast path fails at the LOW watermark, the allocator falls back to the slow path, during which page compaction can occur&lt;/li&gt;
&lt;li&gt;Page compaction can be triggered manually with &lt;code&gt;echo 1 &amp;gt; /proc/sys/vm/compact_memory&lt;/code&gt;, which compacts all zones. Background page defragmentation is handled by the kernel thread &lt;code&gt;kcompactd&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Because compaction only migrates page contents to new locations instead of discarding them, its performance impact is less severe than that of memory reclamation. Its goal is also more specific, so the cost of obtaining contiguous pages is lower. In addition, reclaiming ANON pages requires swap, while compaction does not.&lt;/p&gt;
&lt;p&gt;Now let&amp;rsquo;s look at /proc/pagetypeinfo:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/pagetypeinfo 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Page block order: &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pages per block: &lt;span style="color:#ae81ff"&gt;512&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;... &lt;span style="color:#f92672"&gt;(&lt;/span&gt;DMA omitted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal, type Unmovable &lt;span style="color:#ae81ff"&gt;870&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;530&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;391&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;157&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;103&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;41&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal, type Movable &lt;span style="color:#ae81ff"&gt;5886&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9235&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5728&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4072&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1561&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;324&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;115&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;41&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13018&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal, type Reclaimable &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal, type HighAtomic &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal, type CMA &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal, type Isolate &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Pages are grouped into pageblocks, and each pageblock has a migrate type. The free lists are split by that type: when memory is allocated, pages are taken from the list matching the requested page type, and when pages are freed, they return to the list matching the type of their pageblock. The types are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Unmovable: Pages that cannot be compacted&lt;/li&gt;
&lt;li&gt;Movable: Pages that can be compacted&lt;/li&gt;
&lt;li&gt;Reclaimable: Pages that can be reclaimed&lt;/li&gt;
&lt;li&gt;HighAtomic: Pageblocks reserved for high-order atomic allocations, added to mitigate fragmentation issues. Other requests do not normally take pages from these pageblocks&lt;/li&gt;
&lt;li&gt;CMA: Pageblocks managed by the Contiguous Memory Allocator (CMA)&lt;/li&gt;
&lt;li&gt;Isolate: Pages here are never allocated. This type is used while isolating pages: the pageblocks are first set to isolate so that the allocator does not hand out their pages in the meantime&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;CMA appears to be another large topic, which can be simply understood as a supplement to the buddy system:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a9cd9dcbfe80.png" alt="Insert image description" /&gt;
(Memory Journey — How to Improve CMA Utilization? &lt;a href="https://ost.51cto.com/posts/10815" target="_blank" rel="noreferrer"&gt;https://ost.51cto.com/posts/10815&lt;/a&gt;)&lt;/p&gt;

&lt;h3 class="relative group"&gt;smaps &amp;amp; maps &amp;amp; pmap
 &lt;div id="smaps--maps--pmap" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#smaps--maps--pmap" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;VSS/RSS/PSS/USS&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The memory occupied by a process is commonly reported with four metrics: VSS, RSS, PSS, and USS. They differ mainly in how shared memory is counted.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/21b9b29f53a3.png" alt="Insert image description" /&gt;
(&lt;a href="https://cloud.tencent.com/developer/article/1683708" target="_blank" rel="noreferrer"&gt;https://cloud.tencent.com/developer/article/1683708&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;VSS (Virtual Set Size) is the size of the virtual address space only. It says little about actual memory usage.&lt;/li&gt;
&lt;li&gt;RSS (Resident Set Size) is the total physical memory a process has resident, including the full size of shared memory such as shared libraries. For example, if private memory is N and shared memory is M, then RSS = N + M. This can be misleading: a large shared library like libc is shared by many processes, so charging all of it to each process overstates usage.&lt;/li&gt;
&lt;li&gt;PSS (Proportional Set Size) is the private memory of a process plus its proportional share of shared memory. If a shared library is used by N processes, each process is charged 1/N of its size. PSS therefore reflects per-process memory more accurately, and PSS values can be summed across processes.&lt;/li&gt;
&lt;li&gt;USS (Unique Set Size) is the physical memory used exclusively by a process, not including shared memory.&lt;/li&gt;
&lt;/ul&gt;
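&lt;p&gt;The difference between RSS, PSS, and USS can be made concrete by summing the per-mapping fields of &lt;code&gt;/proc/[pid]/smaps&lt;/code&gt;. The following is a minimal sketch: &lt;code&gt;smaps_summary&lt;/code&gt; is a hypothetical helper name, and USS is taken here as Private_Clean plus Private_Dirty, which is a common convention rather than a field the kernel reports directly. The input is invented sample data: one 100 kB private mapping and one 300 kB library shared by three processes.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# smaps_summary
# Hypothetical helper: reads /proc/[pid]/smaps-formatted text on stdin and
# prints totals in kB. USS is approximated as Private_Clean + Private_Dirty.
smaps_summary() {
  awk '
    $1 == "Rss:"           { rss += $2 }
    $1 == "Pss:"           { pss += $2 }
    $1 == "Private_Clean:" { uss += $2 }
    $1 == "Private_Dirty:" { uss += $2 }
    END { printf "RSS=%d kB PSS=%d kB USS=%d kB\n", rss, pss, uss }
  '
}

# Invented sample: a private 100 kB mapping, then a 300 kB shared library
# whose pages are shared by three processes (so its Pss is 100 kB).
printf 'Rss: 100 kB\nPss: 100 kB\nPrivate_Clean: 0 kB\nPrivate_Dirty: 100 kB\nRss: 300 kB\nPss: 100 kB\nShared_Clean: 300 kB\nPrivate_Clean: 0 kB\nPrivate_Dirty: 0 kB\n' | smaps_summary
# prints: RSS=400 kB PSS=200 kB USS=100 kB
```

&lt;p&gt;Against a real process this would be &lt;code&gt;cat /proc/[pid]/smaps | smaps_summary&lt;/code&gt;. Reading the smaps file of a process owned by another user typically requires elevated privileges.&lt;/p&gt;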
&lt;p&gt;&lt;em&gt;&lt;strong&gt;/proc/[pid]/maps&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;/proc/[pid]/maps shows the &lt;strong&gt;user space&lt;/strong&gt; memory mappings in the &lt;strong&gt;process&amp;rsquo;s&lt;/strong&gt; &lt;strong&gt;virtual memory&lt;/strong&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl 2345&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cat maps 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;StartAddr-EndAddr Perms Offset Dev Inode Filename
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00400000-00bae000 r-xp &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; fd:00 &lt;span style="color:#ae81ff"&gt;1093852&lt;/span&gt; /pg/pg15.3/bin/postgres ---text segment
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00dad000-00dc3000 rw-p 007ad000 fd:00 &lt;span style="color:#ae81ff"&gt;1093852&lt;/span&gt; /pg/pg15.3/bin/postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00dc3000-00df5000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00f1e000-00f60000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;heap&lt;span style="color:#f92672"&gt;]&lt;/span&gt; ---heap area
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;33a6000000-33a6022000 r-xp &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; fd:00 &lt;span style="color:#ae81ff"&gt;1976006&lt;/span&gt; /lib64/ld-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe2ae09000-7fbe2ae0a000 rw-p 0000c000 fd:00 &lt;span style="color:#ae81ff"&gt;1975966&lt;/span&gt; /lib64/libnss_files-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe2ae1b000-7fbe33ca7000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;12556&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe33ca7000-7fbe39b38000 r--p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; fd:00 &lt;span style="color:#ae81ff"&gt;1181300&lt;/span&gt; /usr/lib/locale/locale-archive
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe39b38000-7fbe39b3d000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe39b46000-7fbe39b4d000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:10 &lt;span style="color:#ae81ff"&gt;12559&lt;/span&gt; /dev/shm/PostgreSQL.3661351388
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe39b4d000-7fbe39b4e000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;32769&lt;/span&gt; /SYSV0010c0b6 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe39b4e000-7fbe39b4f000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fffe3933000-7fffe3948000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;stack&lt;span style="color:#f92672"&gt;]&lt;/span&gt; --stack area
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fffe397d000-7fffe397e000 r-xp &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;vdso&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ffffffffff600000-ffffffffff601000 r-xp &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;vsyscall&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;(1) Start-End Address: the virtual address range covered by this segment
(2) Permissions: access permissions of this segment; r = read, w = write, x = execute, p = private (copy-on-write), s = shared
(3) Offset: the offset into the mapped file at which this mapping starts
(4) Device: the device number of the device holding the mapped file, corresponding to vm_file-&amp;gt;f_dentry-&amp;gt;d_inode-&amp;gt;i_sb-&amp;gt;s_dev. &lt;strong&gt;Anonymous mappings show 00:00. In fd:00, fd is the major device number and 00 is the minor device number.&lt;/strong&gt;
(5) Inode: the inode number of the mapped file, corresponding to vm_file-&amp;gt;f_dentry-&amp;gt;d_inode-&amp;gt;i_ino. &lt;strong&gt;It matches the output of ls -i; anonymous mappings show 0.&lt;/strong&gt;
(6) Mapped File Name: for file-backed mappings, the path of the mapped file; for anonymous mappings, the role of the segment in the process (for example [heap] or [stack]), or blank.&lt;/p&gt;
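&lt;p&gt;The six fields can be pulled apart with a few lines of awk. This is only a minimal sketch: parse_maps is a hypothetical helper name (not a system tool), and it assumes the standard whitespace-separated layout of maps lines described above.&lt;/p&gt;

```shell
# parse_maps (hypothetical helper): read maps-format lines on stdin and
# print the six fields of each line as key=value pairs.
parse_maps() {
  awk '{
    split($1, addr, "-")      # field 1: start-end address
    split($4, dev, ":")       # field 4: major:minor device number
    name = "(anonymous)"      # field 6 is empty for unnamed anonymous mappings
    if (NF >= 6) {
      name = $6
      for (i = 7; NF >= i; i++) name = name " " $i   # keep suffixes such as "(deleted)"
    }
    printf "start=%s end=%s perms=%s offset=%s major=%s minor=%s inode=%s name=%s\n",
      addr[1], addr[2], $2, $3, dev[1], dev[2], $5, name
  }'
}
```

&lt;p&gt;For example, cat /proc/self/maps | parse_maps prints one key=value line per VMA of the cat process itself.&lt;/p&gt;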
&lt;p&gt;Below is Wenxin&amp;rsquo;s analysis of this output. It correctly identified the process as a PostgreSQL postmaster:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://lastdba.com/img/csdn/639e6e130d4f.png" alt="Wenxin analysis of the maps output" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;/proc/[pid]/smaps&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The /proc/[pid]/smaps file extends /proc/[pid]/maps: for each VMA it repeats the maps line and then lists detailed memory counters for that mapping:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl 2345&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cat smaps 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00400000-00bae000 r-xp &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; fd:00 &lt;span style="color:#ae81ff"&gt;1093852&lt;/span&gt; /pg/pg15.3/bin/postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;7864&lt;/span&gt; kB --VSS memory
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;408&lt;/span&gt; kB --RSS memory
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;140&lt;/span&gt; kB --PSS memory
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Clean: &lt;span style="color:#ae81ff"&gt;404&lt;/span&gt; kB --Shared, clean memory size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Dirty: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB --Shared, dirty &lt;span style="color:#f92672"&gt;(&lt;/span&gt;i.e., modified&lt;span style="color:#f92672"&gt;)&lt;/span&gt; memory size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Clean: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB --Private, clean memory size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Dirty: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB --Private, dirty memory size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Referenced: &lt;span style="color:#ae81ff"&gt;408&lt;/span&gt; kB --Memory currently marked as referenced or accessed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Anonymous: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB --Anonymous pages
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;AnonHugePages: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB --Anonymous huge pages
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Swap: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB --Swapped-out memory size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;KernelPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB --Kernel page size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MMUPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB --Page table page size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fffe3933000-7fffe3948000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;stack&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;88&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In short, maps gives the process&amp;rsquo;s memory mappings, and smaps adds the memory size of each mapping (VSS, RSS, PSS).&lt;/p&gt;
&lt;p&gt;Summing the Pss, Rss, and Private fields in smaps gives a process&amp;rsquo;s memory usage. The values in smaps are in kB; the per-process commands below divide by 1024 and therefore print MB.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Total physical memory usage of all processes&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;grep Pss /proc/&lt;span style="color:#f92672"&gt;[&lt;/span&gt;1-9&lt;span style="color:#f92672"&gt;]&lt;/span&gt;*/smaps | awk &lt;span style="color:#e6db74"&gt;&amp;#39;{total+=$2}; END {printf &amp;#34;%d kB\n&amp;#34;, total }&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;PSS memory of a specific process&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/90875/smaps |grep Pss |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;RSS memory of a specific process&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/68729/smaps |grep Rss |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Private memory (USS) of a specific process, excluding the /dev/zero shared memory mapping&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/90875/smaps|sed &lt;span style="color:#e6db74"&gt;&amp;#39;/zero/,/VmFlags/d&amp;#39;&lt;/span&gt; |grep Private |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
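&lt;p&gt;The three per-process commands above can be combined into a single awk pass. The following is a sketch under stated assumptions: smaps_summary is a hypothetical helper name, it matches the field names exactly (a bare pattern such as Pss can also match extra counters, for example Pss_Dirty on some newer kernels), and it skips Private pages of /dev/zero mappings in the same spirit as the sed filter above. Totals are printed in kB.&lt;/p&gt;

```shell
# smaps_summary (hypothetical helper): sum Rss, Pss and private memory (USS)
# of one smaps file in a single pass. Usage: smaps_summary /proc/PID/smaps
smaps_summary() {
  awk '
    # A mapping header line starts with "start-end"; remember whether it is /dev/zero
    /^[0-9a-f]+-[0-9a-f]+ / { shared_anon = ($0 ~ /\/dev\/zero/); next }
    $1 == "Rss:" { rss += $2 }
    $1 == "Pss:" { pss += $2 }
    ($1 == "Private_Clean:" || $1 == "Private_Dirty:") { if (!shared_anon) uss += $2 }
    END { printf "rss_kb=%d pss_kb=%d uss_kb=%d\n", rss, pss, uss }
  ' "$1"
}
```

&lt;p&gt;For example, smaps_summary /proc/90875/smaps prints the three totals on one line; divide by 1024 to get MB.&lt;/p&gt;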
&lt;p&gt;&lt;em&gt;&lt;strong&gt;pmap&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The pmap command reports a process&amp;rsquo;s memory map by parsing /proc/[pid]/maps and /proc/[pid]/smaps. It has few options; -x selects the extended format, which adds RSS and dirty-page columns.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# pmap -x 2345&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2345: /pg/pg15.3/bin/postgres -D /pg/1503data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Address Kbytes RSS Dirty Mode Mapping
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000400000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7864&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;212&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000dad000 &lt;span style="color:#ae81ff"&gt;88&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; rw--- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000dc3000 &lt;span style="color:#ae81ff"&gt;200&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;36&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000f1e000 &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00000033a6000000 &lt;span style="color:#ae81ff"&gt;136&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;108&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- ld-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe2ae09000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw--- libnss_files-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe2ae1b000 &lt;span style="color:#ae81ff"&gt;145968&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4396&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4396&lt;/span&gt; rw-s- zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe33ca7000 &lt;span style="color:#ae81ff"&gt;96836&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r---- locale-archive
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b38000 &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b46000 &lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; rw-s- PostgreSQL.3661351388
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b4d000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw-s- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x8001 &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b4e000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fffe3933000 &lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; stack &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fffe397d000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ffffffffff600000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------- ------ ------ ------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total kB &lt;span style="color:#ae81ff"&gt;268896&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5532&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4540&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The pmap output resembles /proc/[pid]/maps, with one line per VMA, but adds the virtual size (Kbytes), RSS, and dirty size of each mapping. This shows directly how much memory each region of the virtual address space uses and quickly points to the largest regions.&lt;/p&gt;
&lt;p&gt;If the [heap] region is very large, a heap memory leak is a likely cause. If the address space contains a very large number of VMAs (each line in maps is one VMA), the application has probably called mmap many times without a matching munmap. It also helps to observe the address space over time: entries that keep growing usually point to the problem.&lt;/p&gt;
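&lt;p&gt;To check for an unusually large number of VMAs, the lines in maps can be counted per mapping name. A minimal sketch, assuming the maps layout shown earlier; vma_by_name is a hypothetical helper name.&lt;/p&gt;

```shell
# vma_by_name (hypothetical helper): read maps-format lines on stdin and
# print "count name" per mapping name, largest count first.
vma_by_name() {
  awk '{
    name = (NF >= 6) ? $6 : "[anon]"   # unnamed mappings have only five fields
    count[name]++
  } END {
    for (name in count) print count[name], name
  }' | sort -k1,1nr -k2,2
}
```

&lt;p&gt;For example, cat /proc/2345/maps | vma_by_name | head lists the names that own the most VMAs. Running it at intervals and comparing the counts is one way to see which entries keep growing.&lt;/p&gt;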
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Analysis Example&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In the host&amp;rsquo;s top output, one PostgreSQL backend process shows relatively high memory usage, so its mapping information needs a closer look:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;68729&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5579004&lt;/span&gt; 5.116g 5.114g R 97.4 1.4 128:27.94 postgres: lzl: lzldb lzl 30.78.14.174&lt;span style="color:#f92672"&gt;(&lt;/span&gt;58067&lt;span style="color:#f92672"&gt;)&lt;/span&gt; DELETE &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Check this process&amp;rsquo;s Rss, Pss, Uss:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/68729/smaps |grep Rss |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;5422.67 ---5.4G Rss
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/68729/smaps |grep Pss |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;467.957 ---467mb Pss
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/68729/smaps|sed &lt;span style="color:#e6db74"&gt;&amp;#39;/zero/,/VmFlags/d&amp;#39;&lt;/span&gt; |grep Private |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;179.605 ---179mb Uss&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Rss - Uss is about 5243 MB, so roughly 5.2G of the resident memory is shared. Pss - Uss is about 288 MB, which is this backend&amp;rsquo;s proportional share of that shared memory. The backend therefore accounts for only a small fraction of the shared memory it maps.&lt;/p&gt;
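&lt;p&gt;The subtraction can be reproduced from the three figures printed above. This is a small worked sketch; shared_breakdown is a hypothetical helper name, and its arguments are the Rss, Pss, and Uss values in MB.&lt;/p&gt;

```shell
# shared_breakdown (hypothetical helper): arguments are Rss, Pss and Uss in MB.
shared_breakdown() {
  awk -v rss="$1" -v pss="$2" -v uss="$3" 'BEGIN {
    shared = rss - uss                      # resident memory shared with other processes
    prop   = pss - uss                      # share of it attributed to this process by PSS
    ratio  = (shared == 0) ? 0 : prop / shared
    printf "shared_mb=%.1f proportional_mb=%.1f ratio=%.3f\n", shared, prop, ratio
  }'
}

# Figures taken from the smaps sums of PID 68729 above
shared_breakdown 5422.67 467.957 179.605
```

&lt;p&gt;With these inputs the ratio comes out at roughly 5.5%. Because PSS divides each shared page by the number of processes mapping it, this suggests that the shared pages are mapped by many backends at once.&lt;/p&gt;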
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pmap -x &lt;span style="color:#ae81ff"&gt;68729&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;68729: postgres: pdmp: pdmpdata pdmp 30.78.14.174&lt;span style="color:#f92672"&gt;(&lt;/span&gt;46252&lt;span style="color:#f92672"&gt;)&lt;/span&gt; DELETE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Address Kbytes RSS Dirty Mode Mapping
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000400000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6084&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2444&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000bf0000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; r---- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000bf1000 &lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; rw--- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00002b7f65bfa000 &lt;span style="color:#ae81ff"&gt;5441216&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5365444&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5365444&lt;/span&gt; rw-s- zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt; --this part takes the most
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00002b80b1daa000 &lt;span style="color:#ae81ff"&gt;48&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- libnss_files-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00002b80b1db6000 &lt;span style="color:#ae81ff"&gt;2044&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; ----- libnss_files-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00002b80b1fb5000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; r---- libnss_files-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00002b80b1fb6000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; rw--- libnss_files-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00002b80b1fb7000 &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00002b80ba001000 &lt;span style="color:#ae81ff"&gt;516&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;516&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;516&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fffe16f7000 &lt;span style="color:#ae81ff"&gt;132&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;88&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;88&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; stack &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fffe175b000 &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ffffffffff600000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Looking into smaps, we can go straight to the /dev/zero (deleted) mapping:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat smaps 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00400000-009f1000 r-xp &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; fd:06 &lt;span style="color:#ae81ff"&gt;58726481&lt;/span&gt; /paic/postgres/base/9.6.6/bin/postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b7f65bfa000-2b80b1daa000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;72254&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;5441216&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;5365444&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;264618&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Clean: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Dirty: &lt;span style="color:#ae81ff"&gt;5365444&lt;/span&gt; kB --shared dirty data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Clean: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Dirty: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Referenced: &lt;span style="color:#ae81ff"&gt;5364764&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Anonymous: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;AnonHugePages: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Swap: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;KernelPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MMUPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Locked: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmFlags: rd wr sh mr mw me ms sd &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From the above analysis, we can conclude: this PostgreSQL backend process uses little private memory of its own. Almost all of its resident memory is in the shared memory mapping, where it has touched a large number of pages that the kernel reports as Shared_Dirty. This is likely a transaction in PostgreSQL that has modified a lot of data but hasn&amp;rsquo;t committed yet.&lt;/p&gt;
&lt;p&gt;Additionally, /dev/zero (deleted) is explained in the /proc/[pid]/map_files/ entry of &lt;a href="https://www.man7.org/linux/man-pages/man5/proc.5.html" target="_blank" rel="noreferrer"&gt;proc(5) Linux manual page&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Although these entries are present for memory regions that were mapped with the MAP_FILE flag, the way anonymous shared memory (regions created with the MAP_ANON | MAP_SHARED flags) is implemented in Linux means that such regions also appear on this directory. Here is an example where the target file is the deleted /dev/zero one:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; lrw-------. 1 root root 64 Apr 16 21:33
 7fc075d2f000-7fc075e6f000 -&amp;gt; /dev/zero (deleted)
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;&lt;p&gt;In plain terms: anonymous shared memory (mapped with MAP_ANON | MAP_SHARED) shows up as /dev/zero (deleted), which is why PostgreSQL&amp;rsquo;s main shared memory area appears under that name.&lt;/p&gt;

&lt;h3 class="relative group"&gt;/proc/[pid]/status
 &lt;div id="procpidstatus" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#procpidstatus" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The status file shows process state information, including a summary of memory usage.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl 2345&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# cat status &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Name: postgres ---the command running this thread
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;State: S &lt;span style="color:#f92672"&gt;(&lt;/span&gt;sleeping&lt;span style="color:#f92672"&gt;)&lt;/span&gt; ---process state
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Tgid: &lt;span style="color:#ae81ff"&gt;2345&lt;/span&gt; ---Thread group ID &lt;span style="color:#f92672"&gt;(&lt;/span&gt;i.e., Process ID&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pid: &lt;span style="color:#ae81ff"&gt;2345&lt;/span&gt; ---Thread ID
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PPid: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ---PID of parent process.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmPeak: &lt;span style="color:#ae81ff"&gt;268964&lt;/span&gt; kB ---virtual memory peak
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmSize: &lt;span style="color:#ae81ff"&gt;268896&lt;/span&gt; kB ---virtual memory current
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmLck: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmHWM: &lt;span style="color:#ae81ff"&gt;13400&lt;/span&gt; kB ---RSS peak
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmRSS: &lt;span style="color:#ae81ff"&gt;5532&lt;/span&gt; kB ---RSS current
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmData: &lt;span style="color:#ae81ff"&gt;528&lt;/span&gt; kB ---data segment
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmStk: &lt;span style="color:#ae81ff"&gt;88&lt;/span&gt; kB ---stack segment
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmExe: &lt;span style="color:#ae81ff"&gt;7864&lt;/span&gt; kB ---text segment
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmLib: &lt;span style="color:#ae81ff"&gt;3100&lt;/span&gt; kB ---shared library code segment
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmPTE: &lt;span style="color:#ae81ff"&gt;136&lt;/span&gt; kB ---Page table entries
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmSwap: &lt;span style="color:#ae81ff"&gt;308&lt;/span&gt; kB ---swap size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Threads: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ---number of threads in this process
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;....&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Unlike maps, status carries no per-mapping detail. It summarizes memory per segment, which makes the size of each part of the process&amp;rsquo;s virtual memory easier to read at a glance.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;View processes with the most SWAP usage&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; file in /proc/*/status ; &lt;span style="color:#66d9ef"&gt;do&lt;/span&gt; awk &lt;span style="color:#e6db74"&gt;&amp;#39;/VmSwap|Name|^Pid/{printf &amp;#34;%s %s&amp;#34;, $2, $3}END{ print &amp;#34;&amp;#34;}&amp;#39;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;$file&amp;#34;&lt;/span&gt;; &lt;span style="color:#66d9ef"&gt;done&lt;/span&gt; | sort -k &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; -n -r | head&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;cgroup memory
 &lt;div id="cgroup-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cgroup-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt" target="_blank" rel="noreferrer"&gt;cgroup memory control&lt;/a&gt; is now very common, so some host parameters need to be set at the cgroup level. With cgroup v1, memory settings and monitoring information are under /sys/fs/cgroup/memory/.&lt;/p&gt;
&lt;p&gt;Use cginfo to view cgroup memory allocation and usage: &lt;code&gt;/opt/cgtools/cginfo -t perf -s mem&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cginfo -t perf -s mem
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;====================&lt;/span&gt; Cgroup Performance: memory &lt;span style="color:#f92672"&gt;====================&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DB_TYPE INSTANCE_NAME MEM_OOM MEM_FILE_GB MEM_MAP_GB MEM_USED_GB MEM_ALLO_GB ALLO_RATE MEM_GLOB_GB GLOB_RATE 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------- ------------- ------- ----------- ---------- ----------- ----------- --------- ----------- --------- 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres LZLDB &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 154.3 0.0 4.2 160.0 2.6% &lt;span style="color:#ae81ff"&gt;375&lt;/span&gt; 1.1% &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For more detailed cgroup memory usage, read /sys/fs/cgroup/memory/[group]/memory.stat:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat memory.stat 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_cache &lt;span style="color:#ae81ff"&gt;167791534080&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_rss &lt;span style="color:#ae81ff"&gt;4006932480&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_rss_huge &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mapped_file &lt;span style="color:#ae81ff"&gt;11747328&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_swap &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_pgpgin &lt;span style="color:#ae81ff"&gt;792754417976&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_pgpgout &lt;span style="color:#ae81ff"&gt;792712474991&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_pgfault &lt;span style="color:#ae81ff"&gt;477971874868&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_pgmajfault &lt;span style="color:#ae81ff"&gt;97318&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_inactive_anon &lt;span style="color:#ae81ff"&gt;1610874880&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_active_anon &lt;span style="color:#ae81ff"&gt;2408255488&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_inactive_file &lt;span style="color:#ae81ff"&gt;73446166528&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_active_file &lt;span style="color:#ae81ff"&gt;94332768256&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_unevictable &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
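&lt;p&gt;The counters in memory.stat are in bytes, which is hard to read at this scale. A minimal sketch for printing the main ones in GiB, assuming the cgroup v1 field names shown above; the helper name &lt;code&gt;cg_mem_summary&lt;/code&gt; is hypothetical:&lt;/p&gt;

```shell
# Hypothetical helper: print selected cgroup v1 memory.stat counters in GiB.
# $1 is the path to a memory.stat file (values in that file are bytes).
cg_mem_summary() {
    awk '
        $1 == "total_cache" || $1 == "total_rss" || $1 == "total_swap" {
            # bytes / 1024^3 = GiB
            printf "%-12s %8.1f GiB\n", $1, $2 / 1024 / 1024 / 1024
        }
    ' "$1"
}

# Usage (the [group] part of the path is site-specific):
#   cg_mem_summary /sys/fs/cgroup/memory/[group]/memory.stat
```

&lt;p&gt;Only the hierarchical &lt;code&gt;total_*&lt;/code&gt; counters are matched, so child cgroups are included in the figures.&lt;/p&gt;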

&lt;h3 class="relative group"&gt;smem
 &lt;div id="smem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#smem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://linux.die.net/man/8/smem" target="_blank" rel="noreferrer"&gt;smem&lt;/a&gt; is a powerful tool for reporting memory usage. It reads smaps, meminfo, and related files under /proc and prints summaries, either system-wide or per mapping, and it can slice the data by process, user, or mapping. Overall, it&amp;rsquo;s a very useful tool for analyzing memory usage.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://selenic.com/repo/smem" target="_blank" rel="noreferrer"&gt;repo&lt;/a&gt; can be downloaded directly; in most cases you just extract it and run it. For more usage, refer to &lt;a href="https://www.selenic.com/smem/" target="_blank" rel="noreferrer"&gt;smem memory reporting tool&lt;/a&gt;. Below are a few simple examples:&lt;/p&gt;
&lt;p&gt;View system memory usage &lt;code&gt;-w&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# smem -w -k&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Area Used Cache Noncache 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;firmware/hardware &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kernel image &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kernel dynamic memory 183.9M 84.0M 99.9M 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;userspace memory 112.3M 62.2M 50.1M 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;free memory 700.3M 700.3M &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View memory consumption per user &lt;code&gt;-u&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# smem -s pss -urk&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;User Count Swap USS PSS RSS 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oracle &lt;span style="color:#ae81ff"&gt;25&lt;/span&gt; 85.2M 30.8M 95.7M 383.0M 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;root &lt;span style="color:#ae81ff"&gt;93&lt;/span&gt; 112.4M 38.5M 42.3M 86.2M 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; 5.9M 1.6M 2.5M 5.9M 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mysql &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; 169.7M 1.7M 1.7M 2.0M &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View memory consumption for a specific user &lt;code&gt;-U&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# smem -U pg -k&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PID User Command Swap USS PSS RSS 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2345&lt;/span&gt; pg /pg/pg15.3/bin/postgres -D 364.0K 124.0K 134.0K 228.0K 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2352&lt;/span&gt; pg postgres: logical replicati 636.0K 144.0K 161.0K 196.0K 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Filter processes with &lt;code&gt;-P&lt;/code&gt; (the argument is a PROCESSFILTER pattern, not a PID):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@lzl ~]# smem -P postgres -p
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PID User Command Swap USS PSS RSS 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 2346 pg /pg/pg16.0/bin/postgres -D 0.01% 0.01% 0.01% 0.01% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 2350 pg postgres: walwriter 0.01% 0.01% 0.01% 0.01% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View process mapping and memory usage &lt;code&gt;-m&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# smem -P postgres -mpr -s pss&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Map PIDs AVGPSS PSS 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;anonymous&amp;gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; 0.02% 0.24% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;heap&lt;span style="color:#f92672"&gt;]&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; 0.07% 0.20% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/usr/lib64/libpython2.6.so.1.0 &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; 0.11% 0.11% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/pg/pg15.3/bin/postgres &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 0.01% 0.06% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/pg/pg16.0/bin/postgres &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 0.01% 0.06% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/dev/zero &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; 0.00% 0.03% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;stack&lt;span style="color:#f92672"&gt;]&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; 0.00% 0.02% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;smem gives a very intuitive view of process USS/PSS/RSS. It has one limitation: it cannot filter by PID, only by username or PROCESSFILTER. On a host running multiple database instances, that makes it awkward to isolate one instance by its parent PID or child PIDs.&lt;/p&gt;
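&lt;p&gt;When a per-PID figure is needed, the same smaps data smem reads can be summed directly. A minimal sketch, assuming a Linux /proc with &lt;code&gt;Pss:&lt;/code&gt; lines in smaps; the helper name &lt;code&gt;pss_kb&lt;/code&gt; is hypothetical:&lt;/p&gt;

```shell
# Hypothetical helper: total PSS in kB for one process.
# $1 is the path to an smaps file, e.g. /proc/2345/smaps.
pss_kb() {
    # Match the field exactly so Pss_Anon:, Pss_File: etc. are not double counted.
    awk '$1 == "Pss:" { sum += $2 } END { print sum + 0 }' "$1"
}

# Usage (reading another user's smaps usually requires root):
#   pss_kb /proc/2345/smaps
```

&lt;p&gt;Looping this over the child PIDs of one postmaster gives a per-instance total without relying on the process name.&lt;/p&gt;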

&lt;h3 class="relative group"&gt;top
 &lt;div id="top" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#top" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://man7.org/linux/man-pages/man1/top.1.html" target="_blank" rel="noreferrer"&gt;top&lt;/a&gt; displays system status in real time. It has many interactive options, but even running it with no arguments shows a lot of information.&lt;/p&gt;
&lt;p&gt;Sorting in top (interactive keys):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;command sorted-field supported
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M %MEM Yes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;N PID Yes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;P %CPU Yes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;T TIME+ Yes&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Press M to sort by %MEM and bring the processes using the most memory to the top. %MEM is RES expressed as a percentage of physical memory.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;top - 23:38:01 up &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; days, 22:32, &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; users, load average: 1.12, 1.42, 1.09
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Tasks: &lt;span style="color:#ae81ff"&gt;198&lt;/span&gt; total, &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; running, &lt;span style="color:#ae81ff"&gt;183&lt;/span&gt; sleeping, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; stopped, &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; zombie
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Cpu&lt;span style="color:#f92672"&gt;(&lt;/span&gt;s&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Mem: 1020348k total, 325848k used, 694500k free, 1352k buffers
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Swap: 4128760k total, 635872k used, 3492888k free, 150288k cached
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;18537&lt;/span&gt; oracle &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 636m 24m 21m S 0.0 2.4 0:05.41 oracle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;18533&lt;/span&gt; oracle &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 638m 24m 21m S 0.0 2.4 0:02.01 oracle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;18509&lt;/span&gt; oracle &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 634m &lt;span style="color:#ae81ff"&gt;4384&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4036&lt;/span&gt; S 0.0 0.4 0:01.93 oracle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2639&lt;/span&gt; root &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 729m &lt;span style="color:#ae81ff"&gt;4052&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1444&lt;/span&gt; S 0.0 0.4 8:45.32 nautilus &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Memory-related interpretation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Line 4: memory usage: total physical memory, used memory, free memory, buffer memory&lt;/li&gt;
&lt;li&gt;Line 5: swap usage: total swap, used swap, free swap, and the amount of memory used as page cache (cached)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Line 7 (column header, memory-related fields):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;VIRT: VSS&lt;/li&gt;
&lt;li&gt;RES: resident memory, effectively RSS: the non-swapped physical memory the task is using&lt;/li&gt;
&lt;li&gt;SHR: Shared Memory Size. It will include shared anonymous pages and shared file-backed pages&lt;/li&gt;
&lt;li&gt;%MEM: RSS percentage, a task&amp;rsquo;s currently resident share of available physical memory.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Additionally, don&amp;rsquo;t forget to look at the process status when checking memory.&lt;/p&gt;
&lt;p&gt;S (example column 8) Process Status:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;D = uninterruptible sleep. The process is waiting for an event to complete, typically disk I/O (including I/O on network filesystems such as NFS). A process in D state usually cannot be killed until the wait finishes.&lt;/li&gt;
&lt;li&gt;I = idle&lt;/li&gt;
&lt;li&gt;R = running&lt;/li&gt;
&lt;li&gt;S = sleeping&lt;/li&gt;
&lt;li&gt;T = stopped by job control signal&lt;/li&gt;
&lt;li&gt;t = stopped by debugger during trace&lt;/li&gt;
&lt;li&gt;Z = zombie&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;top shows the host&amp;rsquo;s memory summary as well as per-process RES and SHR. As a rough estimate, RES - SHR approximates USS, the process&amp;rsquo;s private memory. Because it also shows process status, &lt;code&gt;top -p&lt;/code&gt; is a handy way to check basic memory information for a specific process.&lt;/p&gt;
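&lt;p&gt;The RES - SHR estimate can also be computed outside top. A minimal sketch, assuming Linux, where /proc/[pid]/statm lists the resident and shared page counts as its second and third fields; the helper name &lt;code&gt;private_kb&lt;/code&gt; is hypothetical:&lt;/p&gt;

```shell
# Hypothetical helper: approximate private resident memory (RES - SHR) in kB.
# $1 is a statm-format file, $2 is the page size in bytes.
private_kb() {
    # statm fields are page counts: size resident shared text lib data dt
    awk -v page="$2" '{ print ($2 - $3) * page / 1024 }' "$1"
}

# Usage:
#   private_kb /proc/2345/statm "$(getconf PAGESIZE)"
```

&lt;p&gt;Like the calculation in top, this is only an approximation of USS; the per-mapping figures in smaps are more precise.&lt;/p&gt;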

&lt;h3 class="relative group"&gt;free
 &lt;div id="free" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#free" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://man7.org/linux/man-pages/man1/free.1.html" target="_blank" rel="noreferrer"&gt;free&lt;/a&gt; displays the host&amp;rsquo;s swap, total and remaining memory, all parsed from /proc/meminfo.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;user@ubuntu:~$ free
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; total used free shared buff/cache available
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Mem: 8029356 794336 6297928 183384 937092 6816804
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Swap: 0 0 0&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;total: Total usable memory (MemTotal and SwapTotal in /proc/meminfo). This includes the physical and swap memory minus a few reserved bits and kernel binary code.&lt;/li&gt;
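&lt;li&gt;used: Used or unavailable memory. Newer procps-ng releases calculate it as total - available; older releases use total - free - buffers - cache, which is what the sample output above matches (8029356 - 6297928 - 937092 = 794336).&lt;/li&gt;
&lt;li&gt;free: Unused memory (MemFree and SwapFree in /proc/meminfo)&lt;/li&gt;
&lt;li&gt;shared: Memory used (mostly) by tmpfs (Shmem in /proc/meminfo)&lt;/li&gt;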
&lt;li&gt;buffers: Memory used by kernel buffers (Buffers in /proc/meminfo)&lt;/li&gt;
&lt;li&gt;cache: Memory used by the page cache and slabs (Cached and SReclaimable in /proc/meminfo). Not just pagecache, but also SReclaimable slab!&lt;/li&gt;
&lt;li&gt;buff/cache: Sum of buffers and cache&lt;/li&gt;
&lt;li&gt;available: An estimate of how much memory is available for starting new applications without swapping (MemAvailable in /proc/meminfo). Unlike free, it also counts the page cache and the reclaimable slab memory that the kernel can give back, so in practice available is usually larger than free.&lt;/li&gt;
&lt;/ul&gt;
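&lt;p&gt;How used is derived depends on the procps version, so it can be worth recomputing it from the other columns. A minimal sketch, assuming the six-column &lt;code&gt;free&lt;/code&gt; layout shown above; the helper name &lt;code&gt;used_from_free&lt;/code&gt; is hypothetical:&lt;/p&gt;

```shell
# Hypothetical helper: recompute "used" from the Mem: line of `free` output
# as total - free - buff/cache (the formula used by older procps releases).
used_from_free() {
    # Columns: $2 total, $3 used, $4 free, $5 shared, $6 buff/cache, $7 available
    awk '$1 == "Mem:" { print $2 - $4 - $6 }'
}

# Usage:
#   free | used_from_free
```

&lt;p&gt;For the sample output above this prints 794336, which matches the used column. If the result differs from used on your host, that build of free is likely calculating used as total - available instead.&lt;/p&gt;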
&lt;p&gt;Page Cache:
Page cache is primarily used as a cache for file data on the file system, especially when processes have read/write operations on files.&lt;/p&gt;
&lt;p&gt;Buffer Cache:
Buffer cache is primarily designed for caching blocks when the system reads/writes block devices.&lt;/p&gt;

&lt;h3 class="relative group"&gt;ps aux
 &lt;div id="ps-aux" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ps-aux" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The biggest advantage of ps is that it shows status (including memory) from the perspective of each process. Entries whose COMMAND is wrapped in [ ] are kernel threads.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pg@lzl ~]$ ps aux|head -1;ps aux|grep postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg 2345 0.0 0.0 268896 236 ? Ss Jan01 0:03 /pg/pg15.3/bin/postgres -D /pg/1503data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg 2353 0.0 0.0 269040 196 ? Ss Jan01 0:00 postgres: checkpointer
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg 2354 0.0 0.0 269032 160 ? Ss Jan01 0:02 postgres: background writer
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg 2356 0.0 0.0 269032 116 ? Ss Jan01 0:01 postgres: walwriter
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg 2357 0.0 0.0 270508 824 ? Ss Jan01 0:02 postgres: autovacuum launcher
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg 2358 0.0 0.0 270492 620 ? Ss Jan01 0:00 postgres: logical replication launcher
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg 29818 0.0 0.0 103372 868 pts/0 S+ 09:16 0:00 grep postgres&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;VSZ and RSS are in KiB. The memory information is limited: VSZ is of little value and RSS is a usable reference, but without PSS or USS there is not much to analyze.&lt;/p&gt;

&lt;h3 class="relative group"&gt;ipcs
 &lt;div id="ipcs" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ipcs" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;ipcs -m&lt;/code&gt; is a command for querying IPC (Interprocess Communication) shared memory resources. It&amp;rsquo;s quite useful when analyzing shared memory.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ipcs -m
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------ Shared Memory Segments --------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;key shmid owner perms bytes nattch status 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0x0010c0b6 &lt;span style="color:#ae81ff"&gt;32769&lt;/span&gt; pg &lt;span style="color:#ae81ff"&gt;600&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;key: shared memory key&lt;/li&gt;
&lt;li&gt;shmid: shared memory ID&lt;/li&gt;
&lt;li&gt;owner: user who owns the segment (normally the user who created it)&lt;/li&gt;
&lt;li&gt;perms: permissions&lt;/li&gt;
&lt;li&gt;bytes: size of the segment in bytes&lt;/li&gt;
&lt;li&gt;nattch: number of processes attached to this shared memory&lt;/li&gt;
&lt;li&gt;status: shared memory status&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When connecting a session to PostgreSQL, one more backend process appears:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------ Shared Memory Segments --------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;key shmid owner perms bytes nattch status 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0x0010c0b6 &lt;span style="color:#ae81ff"&gt;32769&lt;/span&gt; pg &lt;span style="color:#ae81ff"&gt;600&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;nattch increased by 1: the new backend process has attached to the same segment, so even a &amp;ldquo;private&amp;rdquo; backend process shares part of PostgreSQL&amp;rsquo;s shared memory. With that in mind, the following diagram is easier to appreciate:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/75c502689001.png" alt="Virtual memory diagram" /&gt;
(&lt;a href="http://gauss.ececs.uc.edu/Courses/c4029/code/memory/virtual.pdf" target="_blank" rel="noreferrer"&gt;http://gauss.ececs.uc.edu/Courses/c4029/code/memory/virtual.pdf&lt;/a&gt;)&lt;/p&gt;

&lt;h3 class="relative group"&gt;vmstat
 &lt;div id="vmstat" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vmstat" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://man7.org/linux/man-pages/man8/vmstat.8.html" target="_blank" rel="noreferrer"&gt;vmstat&lt;/a&gt; is short for Virtual Memory Statistics. It monitors the operating system&amp;rsquo;s virtual memory, processes, and CPU activity, and reports statistics for the system as a whole. Its limitation is that it cannot drill into a specific process.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Useful&lt;/em&gt; options:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vmstat &lt;span style="color:#f92672"&gt;[&lt;/span&gt;options&lt;span style="color:#f92672"&gt;]&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;delay &lt;span style="color:#f92672"&gt;[&lt;/span&gt;count&lt;span style="color:#f92672"&gt;]]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;OPTIONS:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-a Display active and inactive memory
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-m Display slabinfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-s Display memory-related statistics and various system activity counts
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-t Append timestamp to each line
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-w Wide output mode. Wider columns keep large values aligned; the default output is narrower&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-bash-4.1$ vmstat -w &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; r b swpd free buff cache si so bi bo in cs us sy id wa st
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;661652&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;763348&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;324&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;76100&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;54&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;45&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;79&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;661652&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;763340&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;304&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;75764&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;99&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;661652&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;760744&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;244&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;78300&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;228&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3216&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;265&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;442&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/95f93eae0f6e.png" alt="Insert image description" /&gt;&lt;/p&gt;
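&lt;p&gt;For memory pressure, the &lt;code&gt;si&lt;/code&gt; and &lt;code&gt;so&lt;/code&gt; columns are usually the first ones to check. The following is a minimal sketch, not part of vmstat itself: a hypothetical helper named &lt;code&gt;swap_active&lt;/code&gt; that reads vmstat output on stdin and prints only the samples with swap activity. It assumes the default column layout shown above, where &lt;code&gt;si&lt;/code&gt; and &lt;code&gt;so&lt;/code&gt; are columns 7 and 8.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: print vmstat samples that show swap-in/swap-out activity.
# Assumes the default layout shown above, where si and so are columns 7 and 8.
swap_active() {
    awk '
        # Header lines are skipped: data lines start with a number in column 1.
        $1 ~ /^[0-9]+$/ {
            if ($7 + $8 > 0) printf "si=%s so=%s\n", $7, $8
        }
    '
}

# Usage on a live system (not run here): vmstat 1 3 | swap_active
# Here the sample output from above is replayed instead.
printf '%s\n' \
    'procs ---memory--- ---swap-- ---io--- -system- ---cpu---' \
    ' r b swpd free buff cache si so bi bo in cs us sy id wa st' \
    ' 0 1 661652 763348 324 76100 15 12 54 21 18 45 0 0 79 21 0' \
    ' 2 1 661652 763340 304 75764 0 0 0 32 12 84 0 1 0 99 0' \
    '41 1 661652 760744 244 78300 228 0 3216 0 265 442 0 0 0 100 0' | swap_active
```

&lt;p&gt;With the sample above, the first and third rows are reported and the second, which has no swap traffic, is dropped.&lt;/p&gt;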

&lt;h3 class="relative group"&gt;pidstat
 &lt;div id="pidstat" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pidstat" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://man7.org/linux/man-pages/man1/pidstat.1.html" target="_blank" rel="noreferrer"&gt;pidstat&lt;/a&gt; is a command from the sysstat package that monitors CPU, memory, thread, and device I/O usage for all or selected processes.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Useful&lt;/em&gt; parameter explanations:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pidstat OPTIONS interval &lt;span style="color:#f92672"&gt;[&lt;/span&gt; count &lt;span style="color:#f92672"&gt;]&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-d :Report I/O statistics 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-u :Report CPU utilization
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-r :Report page faults and memory utilization
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-w :Report task switching activity
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-p :pid&lt;span style="color:#f92672"&gt;[&lt;/span&gt;,...&lt;span style="color:#f92672"&gt;]&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-l :Display the process command name and all its arguments.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View memory status of a specific process:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-bash-4.1$ pidstat -r -l -p &lt;span style="color:#ae81ff"&gt;2345&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Linux 2.6.32-431.el6.x86_64 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;lzl&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 01/06/2024 _x86_64_ &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; CPU&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;02:48:32 PM PID minflt/s majflt/s VSZ RSS %MEM Command
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;02:48:32 PM &lt;span style="color:#ae81ff"&gt;2345&lt;/span&gt; 0.23 0.00 &lt;span style="color:#ae81ff"&gt;268896&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;240&lt;/span&gt; 0.02 /pg/pg15.3/bin/postgres -D /pg/1503data &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Most fields are self-explanatory, and VSZ and RSS have been discussed enough already. The two page-fault columns are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;minflt/s: Number of minor page faults per second. A page fault occurs when a process accesses a virtual page that is not currently mapped in its page table. If the fault can be resolved without loading the page from disk (for example, the page is already in the page cache or only needs to be allocated), it is a minor fault.&lt;/li&gt;
&lt;li&gt;majflt/s: Number of major page faults per second. Unlike a minor fault, a major fault requires loading the page from disk, for example from swap or from a file-backed mapping, so it is far more expensive.&lt;/li&gt;
&lt;/ul&gt;
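&lt;p&gt;pidstat reports faults as per-second rates; the kernel also exposes the cumulative counters for each process in &lt;code&gt;/proc/PID/stat&lt;/code&gt;. A minimal sketch for reading them follows. The function name &lt;code&gt;fault_counts&lt;/code&gt; is hypothetical, and the field positions (minflt is field 10, majflt is field 12) are taken from proc(5).&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: print cumulative minor/major fault counts from one
# /proc/PID/stat line read on stdin. Per proc(5), minflt is field 10 and
# majflt is field 12. The command name (field 2) may contain spaces, so
# everything up to the last ") " is stripped first; minflt and majflt are
# then fields 8 and 10 of what remains.
fault_counts() {
    sed 's/^.*) //' | awk '{ printf "minflt=%s majflt=%s\n", $8, $10 }'
}

# Counters for the current shell, if /proc is available.
if [ -r "/proc/$$/stat" ]; then
    cat "/proc/$$/stat" | fault_counts
fi
```

&lt;p&gt;Sampling these counters twice and dividing the difference by the interval gives roughly what &lt;code&gt;pidstat -r&lt;/code&gt; prints as minflt/s and majflt/s.&lt;/p&gt;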

&lt;h3 class="relative group"&gt;sar
 &lt;div id="sar" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sar" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://man7.org/linux/man-pages/man1/sar.1.html" target="_blank" rel="noreferrer"&gt;sar&lt;/a&gt; (System Activity Reporter) is one of the most comprehensive system performance analysis tools on Linux. It can report on many aspects of system activity, including file reads and writes, system calls, disk I/O, CPU utilization, memory usage, process activity, and IPC-related activity. sar is part of the sysstat package.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/886494f9c001.png" alt="Linux observability: sar" /&gt;
(&lt;a href="https://www.brendangregg.com/Perf/linux_observability_sar.png" target="_blank" rel="noreferrer"&gt;https://www.brendangregg.com/Perf/linux_observability_sar.png&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;sar is very powerful. The option descriptions in its man page alone run to more than 1,000 lines, so this article (lazily) covers only the memory-related ones.&lt;/p&gt;
&lt;p&gt;Memory-related parameters:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sar OPTIONS interval &lt;span style="color:#f92672"&gt;[&lt;/span&gt; count &lt;span style="color:#f92672"&gt;]&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-B :Report paging statistics
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-r :Report memory utilization statistics
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-W :Report swapping statistics.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-H :Report hugepages utilization statistics
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-s &lt;span style="color:#f92672"&gt;[&lt;/span&gt; start_time &lt;span style="color:#f92672"&gt;]&lt;/span&gt; -e &lt;span style="color:#f92672"&gt;[&lt;/span&gt; end_time &lt;span style="color:#f92672"&gt;]&lt;/span&gt; :Set the start and end time of the report&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Example: view memory utilization with sar
&lt;code&gt;sar -r 1 3&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;kbmemfree: This value is basically consistent with the free value in the free command, so it does not include buffer and cache space&lt;/li&gt;
&lt;li&gt;kbmemused: On the older sysstat version used here, this value is basically consistent with the used value in the free command, so it includes buffer and cache space; newer sysstat releases may subtract buffers and cache&lt;/li&gt;
&lt;li&gt;%memused: This value is kbmemused as a percentage of total memory (excluding swap)&lt;/li&gt;
&lt;li&gt;kbbuffers: buffer in the free command&lt;/li&gt;
&lt;li&gt;kbcached: cache in the free command&lt;/li&gt;
&lt;li&gt;kbcommit: Estimated memory needed by the current workload, i.e., how much RAM plus swap is required to guarantee that the system never runs out of memory&lt;/li&gt;
&lt;li&gt;%commit: This value is kbcommit as a percentage of total memory (including swap)&lt;/li&gt;
&lt;/ul&gt;
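&lt;p&gt;To make the &lt;code&gt;%commit&lt;/code&gt; arithmetic concrete, here is a rough sketch that derives a similar figure directly from &lt;code&gt;/proc/meminfo&lt;/code&gt;. It assumes that kbcommit corresponds to the &lt;code&gt;Committed_AS&lt;/code&gt; line and that the denominator is &lt;code&gt;MemTotal + SwapTotal&lt;/code&gt;; the function name &lt;code&gt;commit_pct&lt;/code&gt; is hypothetical.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: derive a sar-style %commit from /proc/meminfo text on stdin.
# Assumption: kbcommit maps to Committed_AS and the denominator is
# MemTotal + SwapTotal (RAM plus swap), as described in the list above.
commit_pct() {
    awk '
        $1 == "MemTotal:"     { mem = $2 }
        $1 == "SwapTotal:"    { swap = $2 }
        $1 == "Committed_AS:" { commit = $2 }
        END {
            total = mem + swap
            if (total > 0) printf "%.2f\n", commit * 100 / total
        }
    '
}

# Current value on this machine, if /proc is available.
if [ -r /proc/meminfo ]; then
    cat /proc/meminfo | commit_pct
fi
```

&lt;p&gt;A value above 100 means the workload has been promised more memory than RAM and swap combined, which is possible when overcommit is allowed.&lt;/p&gt;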
&lt;p&gt;Example: view paging statistics with sar
&lt;code&gt;sar -B 1 3&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pgpgin/s: Kilobytes paged in from disk or SWAP to memory per second&lt;/li&gt;
&lt;li&gt;pgpgout/s: Kilobytes paged out from memory to disk or SWAP per second&lt;/li&gt;
&lt;li&gt;fault/s: Number of page faults per second, i.e., sum of major and minor faults&lt;/li&gt;
&lt;li&gt;majflt/s: Number of major faults per second&lt;/li&gt;
&lt;li&gt;pgfree/s: Number of pages placed on the free queue per second&lt;/li&gt;
&lt;li&gt;pgscank/s: Number of pages scanned by kswapd per second&lt;/li&gt;
&lt;li&gt;pgscand/s: Number of pages directly scanned per second&lt;/li&gt;
&lt;li&gt;pgsteal/s: Number of pages reclaimed from cache to meet memory needs per second&lt;/li&gt;
&lt;li&gt;%vmeff: Pages stolen (pgsteal) as a percentage of total scanned pages (pgscank + pgscand) per second&lt;/li&gt;
&lt;/ul&gt;
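&lt;p&gt;The &lt;code&gt;%vmeff&lt;/code&gt; definition can be checked by hand. The sketch below recomputes it from the three rates; &lt;code&gt;vmeff&lt;/code&gt; is a hypothetical helper, and the input figures are taken from the &lt;code&gt;sar -B&lt;/code&gt; sample output shown in this section.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: recompute %vmeff from pgscank/s, pgscand/s and pgsteal/s.
# Follows the definition above: pgsteal / (pgscank + pgscand) * 100,
# reported as 0.00 when nothing was scanned.
vmeff() {
    awk -v k="$1" -v d="$2" -v s="$3" 'BEGIN {
        scanned = k + d
        if (scanned > 0) printf "%.2f\n", s * 100 / scanned
        else             printf "0.00\n"
    }'
}

vmeff 12003.78 4266.52 16269.42   # kswapd and direct reclaim both active
vmeff 0.00 0.00 0.00              # no scanning in this interval
```

&lt;p&gt;A value close to 100 means almost every scanned page could be reclaimed; a low value means the kernel is scanning many pages it cannot free.&lt;/p&gt;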
&lt;p&gt;Example: view swap activity with sar
&lt;code&gt;sar -W 1 3&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Report explanation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pswpin/s: Number of swap pages swapped in per second&lt;/li&gt;
&lt;li&gt;pswpout/s: Number of swap pages swapped out per second&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example: view historical paging statistics with sar
&lt;code&gt;sar -B -s &amp;quot;08:00:00&amp;quot; -e &amp;quot;10:00:00&amp;quot;&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Without -e, it shows information from the start time to now&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ sar -B -s &lt;span style="color:#e6db74"&gt;&amp;#34;08:00:00&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:45:01 PM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:46:01 PM 414429.37 395024.08 179478.63 0.07 352922.62 12003.78 4266.52 16269.42 99.99
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:47:01 PM 879907.08 337948.43 157970.97 0.02 402290.21 0.00 0.00 0.00 0.00
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:48:01 PM 772977.43 507343.30 150255.50 0.05 466742.08 0.00 5821.28 5821.27 100.00&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
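&lt;p&gt;Above, pgscank/s is the rate at which the background kswapd daemon scans pages for reclamation, and pgscand/s is the rate of scanning by direct reclaim, where the allocating process itself has to reclaim memory. Sustained nonzero pgscand/s usually indicates memory pressure.&lt;/p&gt;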

&lt;h3 class="relative group"&gt;gcore
 &lt;div id="gcore" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#gcore" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://man7.org/linux/man-pages/man1/gcore.1.html" target="_blank" rel="noreferrer"&gt;gcore&lt;/a&gt; is part of gdb and can generate a core dump file for a process.&lt;/p&gt;
&lt;p&gt;Example: dump a PostgreSQL backend process:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt; ps -ef|grep &lt;span style="color:#ae81ff"&gt;8296&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg &lt;span style="color:#ae81ff"&gt;8296&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2345&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 09:41 ? 00:00:00 postgres: pg lzldb &lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt; idle 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt; cat /proc/8296/smaps |grep Pss |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0.351562
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt; cat /proc/8296/smaps |grep Rss |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0.445312
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt; cat /proc/8296/smaps|sed &lt;span style="color:#e6db74"&gt;&amp;#39;/zero/,/VmFlags/d&amp;#39;&lt;/span&gt; |grep Private |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0.0078125&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The awk sums are in MB (kB divided by 1024), so process 8296&amp;rsquo;s USS is only about 8 KB and its RSS about 456 KB. Dump memory:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;gcore -o /tmp/dump 8296&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Dumping takes some time, produces a relatively large file, and suspends the process while it runs.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl 8296&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# ls -lh /tmp/dump.8296&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw-r--r-- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; root root 252M Jan &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; 10:59 /tmp/dump.8296&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
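&lt;p&gt;The Rss, Pss, and Private sums used in this example rely on a bare &lt;code&gt;grep&lt;/code&gt;, which may also match similarly named smaps fields on other kernels. As a slightly stricter sketch, the hypothetical helper &lt;code&gt;smaps_sum_kb&lt;/code&gt; below sums one exact field and prints the total in kB rather than MB.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: sum one exact smaps field (in kB) from text on stdin.
# Comparing the whole first column avoids picking up similarly named fields
# that a bare grep would also match.
smaps_sum_kb() {
    awk -v key="$1:" '$1 == key { sum += $2 } END { print sum + 0 }'
}

# Usage against a live process (replace 8296 with a real PID):
#   cat /proc/8296/smaps | smaps_sum_kb Pss
# Here a few sample lines in smaps format are used instead.
printf '%s\n' \
    'Rss:                2164 kB' \
    'Pss:                2164 kB' \
    'Private_Clean:      1988 kB' \
    'Private_Dirty:       176 kB' | smaps_sum_kb Rss
```

&lt;p&gt;USS can be approximated the same way by adding the results for &lt;code&gt;Private_Clean&lt;/code&gt; and &lt;code&gt;Private_Dirty&lt;/code&gt;.&lt;/p&gt;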

&lt;h3 class="relative group"&gt;gdb
 &lt;div id="gdb" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#gdb" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://sourceware.org/gdb/current/onlinedocs/gdb" target="_blank" rel="noreferrer"&gt;gdb&lt;/a&gt; can inspect specific addresses in a process&amp;rsquo;s memory and dump their contents.&lt;/p&gt;
&lt;p&gt;Example: view PostgreSQL backend cached data:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Open a new session to query a partitioned table, keeping the session open:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pg&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzl &lt;span style="color:#f92672"&gt;~&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; psql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;psql (&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;help&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; help.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;You &lt;span style="color:#66d9ef"&gt;are&lt;/span&gt; now connected &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;pg&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; appl_no &lt;span style="color:#f92672"&gt;|&lt;/span&gt; is_deleted &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_updated 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+------------+--------------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Use pmap and smaps to view the process memory usage and find the memory segment to dump:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl 13393&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# pmap -x 13393&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;13393: postgres: pg lzldb &lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt; idle 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Address Kbytes RSS Dirty Mode Mapping
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000400000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7864&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1204&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;..
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe2ae1b000 &lt;span style="color:#ae81ff"&gt;145968&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2164&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;176&lt;/span&gt; rw-s- zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt; ---RSS takes the most here
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe33ca7000 &lt;span style="color:#ae81ff"&gt;96836&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r---- locale-archive
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b38000 &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b46000 &lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw-s- PostgreSQL.3661351388
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b4d000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw-s- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x8001 &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b4e000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fffe3933000 &lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;36&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; stack &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fffe397d000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ffffffffff600000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl 13393&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# cat /proc/13393/smaps |grep -A 13 zero&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe2ae1b000-7fbe33ca7000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;12556&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;145968&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;2164&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;2164&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Clean: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Dirty: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Clean: &lt;span style="color:#ae81ff"&gt;1988&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Dirty: &lt;span style="color:#ae81ff"&gt;176&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Referenced: &lt;span style="color:#ae81ff"&gt;2164&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Anonymous: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;AnonHugePages: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Swap: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;KernelPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MMUPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="3"&gt;
&lt;li&gt;gdb dump memory:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The start and end addresses passed to &lt;code&gt;dump memory&lt;/code&gt; are the boundaries of the address range shown in smaps, each prefixed with &lt;code&gt;0x&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl tmp&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ gdb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;gdb&lt;span style="color:#f92672"&gt;)&lt;/span&gt; attach &lt;span style="color:#ae81ff"&gt;13393&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;gdb&lt;span style="color:#f92672"&gt;)&lt;/span&gt; dump memory /tmp/delete.dump 0x7fbe2ae1b000 0x7fbe33ca7000&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
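&lt;p&gt;Building the &lt;code&gt;dump memory&lt;/code&gt; command by hand is error-prone, so the conversion from an smaps header line can be sketched as a small helper. The name &lt;code&gt;gdb_dump_cmd&lt;/code&gt; is hypothetical; it only prints the command to type at the gdb prompt and does not attach to anything.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: turn one smaps/maps header line (stdin) into a gdb
# "dump memory" command. The output file path is passed as $1.
gdb_dump_cmd() {
    awk -v out="$1" '{
        split($1, range, "-")   # first column is "start-end" in hex, without 0x
        printf "dump memory %s 0x%s 0x%s\n", out, range[1], range[2]
    }'
}

echo '7fbe2ae1b000-7fbe33ca7000 rw-s 00000000 00:04 12556 /dev/zero (deleted)' |
    gdb_dump_cmd /tmp/delete.dump
```

&lt;p&gt;For the segment found in step 2, this prints the same command that is entered at the gdb prompt above.&lt;/p&gt;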
&lt;ol start="4"&gt;
&lt;li&gt;View the dump file:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;A quick way to inspect it is with strings:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl 13393&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# strings /tmp/delete.dump|grep lzl|sort|uniq&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; @lzlpartition_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlpartition_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlpartition_202301_appl_no_idx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlpartition_202301_date_created_idx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlpartition_202306
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlpartition_202306_appl_no_idx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlpartition_202306_date_created_idx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; @lzlpartition_attach
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlpartition_attach
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; @nk_lzlpartition
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nk_lzlpartition
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; * from lzlpartition limit 1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;As long as the session queries a partitioned table, all partition and index metadata is cached in the backend process.&lt;/p&gt;
&lt;p&gt;Note:&lt;/p&gt;
&lt;ul&gt;
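&lt;li&gt;Attaching gdb to a process (&lt;code&gt;gdb attach [pid]&lt;/code&gt;) suspends it until you detach, so do not run this casually against a production backend.&lt;/li&gt;
&lt;li&gt;The dump file is as large as the virtual address range that was dumped (VSS), which is generally much larger than RSS/PSS/USS.&lt;/li&gt;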
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Memory Summary
 &lt;div id="memory-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/be2156c8a394.png" alt="Memory summary" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Easily Break Through File I/O Bottlenecks: Memory-Mapped mmap Technology &lt;a href="https://blog.51cto.com/u_15481245/6582927" target="_blank" rel="noreferrer"&gt;https://blog.51cto.com/u_15481245/6582927&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Step by Step with Diagrams: Deep Understanding of Linux Physical Memory Management &lt;a href="https://cloud.tencent.com/developer/article/2352771?areaId=106001" target="_blank" rel="noreferrer"&gt;https://cloud.tencent.com/developer/article/2352771?areaId=106001&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Systematically Learning Memory Management from a DBA&amp;rsquo;s Perspective &lt;a href="https://mp.weixin.qq.com/s/CybzGP44dVWQN5hfFrVx7A" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/CybzGP44dVWQN5hfFrVx7A&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://linux2me.wordpress.com/2017/09/15/linux-introduction-to-memory-management/" target="_blank" rel="noreferrer"&gt;https://linux2me.wordpress.com/2017/09/15/linux-introduction-to-memory-management/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Memory management in Linux &lt;a href="https://www.slideshare.net/raghusiddarth/memory-management-in-linux-11551521?from_search=2" target="_blank" rel="noreferrer"&gt;https://www.slideshare.net/raghusiddarth/memory-management-in-linux-11551521?from_search=2&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Linux Performance Tunning Memory &lt;a href="https://www.slideshare.net/shayc1/linux-performance-tunning-memory?from_search=4" target="_blank" rel="noreferrer"&gt;https://www.slideshare.net/shayc1/linux-performance-tunning-memory?from_search=4&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;How to Learn the Linux Kernel (Memory Chapter) &lt;a href="https://mp.weixin.qq.com/s/lKKHH1MMiZbnIbDQt3-IAQ" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/lKKHH1MMiZbnIbDQt3-IAQ&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://courses.engr.illinois.edu/cs241/sp2014/lecture/09-VirtualMemory_II_sol.pdf" target="_blank" rel="noreferrer"&gt;https://courses.engr.illinois.edu/cs241/sp2014/lecture/09-VirtualMemory_II_sol.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Linux Process Virtual Address Space &lt;a href="https://maodanp.github.io/2019/06/02/linux-virtual-space/" target="_blank" rel="noreferrer"&gt;https://maodanp.github.io/2019/06/02/linux-virtual-space/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Red Hat Official Documentation &lt;a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-virtualization_tuning_optimization_guide-numa" target="_blank" rel="noreferrer"&gt;https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-virtualization_tuning_optimization_guide-numa&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Data Processing on Modern Hardware &lt;a href="https://db.in.tum.de/teaching/ss21/dataprocessingonmodernhardware/MH_8.pdf?lang=de" target="_blank" rel="noreferrer"&gt;https://db.in.tum.de/teaching/ss21/dataprocessingonmodernhardware/MH_8.pdf?lang=de&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Chapter 2 Describing Physical Memory &lt;a href="https://www.kernel.org/doc/gorman/html/understand/understand005.html" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/gorman/html/understand/understand005.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Various command man pages&lt;/p&gt;
&lt;p&gt;Linux Forced Memory Reclamation, Linux Memory Source Code Analysis - Memory Reclamation (Overall Process) &lt;a href="https://blog.csdn.net/weixin_35094083/article/details/116688112" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/weixin_35094083/article/details/116688112&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Memory compaction &lt;a href="https://lwn.net/Articles/368869/" target="_blank" rel="noreferrer"&gt;https://lwn.net/Articles/368869/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Memory Journey — How to Improve CMA Utilization? &lt;a href="https://ost.51cto.com/posts/10815" target="_blank" rel="noreferrer"&gt;https://ost.51cto.com/posts/10815&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The implementations of anti pages fragmentation in Linux kernel &lt;a href="https://teawater.github.io/presentation/antif.pdf" target="_blank" rel="noreferrer"&gt;https://teawater.github.io/presentation/antif.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The /proc Filesystem &lt;a href="https://www.kernel.org/doc/Documentation/filesystems/proc.txt" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/Documentation/filesystems/proc.txt&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The /proc/meminfo File in Linux &lt;a href="https://www.baeldung.com/linux/proc-meminfo" target="_blank" rel="noreferrer"&gt;https://www.baeldung.com/linux/proc-meminfo&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;the proc filesystem &lt;a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s2-proc-meminfo" target="_blank" rel="noreferrer"&gt;https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s2-proc-meminfo&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Introduction and Usage of Linux /proc/{pid}/maps (Locating Memory Leaks) &lt;a href="https://blog.csdn.net/mijichui2153/article/details/123934531" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/mijichui2153/article/details/123934531&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;CPU and Memory Usage in Linux top Command &lt;a href="https://blog.csdn.net/weixin_45030965/article/details/127693042" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/weixin_45030965/article/details/127693042&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;smem memory reporting tool &lt;a href="https://www.selenic.com/smem/" target="_blank" rel="noreferrer"&gt;https://www.selenic.com/smem/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Linux performance optimization &lt;a href="https://feiyang233.club/post/linux/" target="_blank" rel="noreferrer"&gt;https://feiyang233.club/post/linux/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;gdb onlinedocs &lt;a href="https://sourceware.org/gdb/current/onlinedocs/gdb" target="_blank" rel="noreferrer"&gt;https://sourceware.org/gdb/current/onlinedocs/gdb&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Linux_Core_Dumps &lt;a href="https://averageradical.github.io/Linux_Core_Dumps.pdf" target="_blank" rel="noreferrer"&gt;https://averageradical.github.io/Linux_Core_Dumps.pdf&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>A Brief Analysis of PostgreSQL FDW</title><link>https://lastdba.com/en/2024/08/12/a-brief-analysis-of-postgresql-fdw/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/a-brief-analysis-of-postgresql-fdw/</guid><description>&lt;h2 class="relative group"&gt;FDW Basic Concepts
 &lt;div id="fdw-basic-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fdw-basic-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;What is SQL/MED?
 &lt;div id="what-is-sqlmed" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-sqlmed" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;SQL/MED (Management of External Data) aims to unify access to heterogeneous data sources. It was published in 2003 as Part 9 of the SQL standard (ISO/IEC 9075-9) and defines how a database &lt;strong&gt;manages external data&lt;/strong&gt; through foreign-data wrappers (FDW) and datalinks. Oracle database links and PG&amp;rsquo;s dblink solve a similar problem but are not part of the standard. In short, SQL/MED is an international SQL extension standard. Several databases support at least part of it, such as DB2, MariaDB, and PG.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;FDW Basic Concepts
 &lt;div id="fdw-basic-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fdw-basic-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;What is SQL/MED?
 &lt;div id="what-is-sqlmed" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-sqlmed" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;SQL/MED (Management of External Data) aims to unify access to heterogeneous data sources. It was published in 2003 as Part 9 of the SQL standard (ISO/IEC 9075-9) and defines how a database &lt;strong&gt;manages external data&lt;/strong&gt; through foreign-data wrappers (FDW) and datalinks. Oracle database links and PG&amp;rsquo;s dblink solve a similar problem but are not part of the standard. In short, SQL/MED is an international SQL extension standard. Several databases support at least part of it, such as DB2, MariaDB, and PG.&lt;/p&gt;
&lt;p&gt;Without SQL/MED, applications must access required data sources themselves and process data at the application layer:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4d2dae15ed42.png" alt="1" /&gt;&lt;/p&gt;
&lt;p&gt;With SQL/MED, the data access architecture becomes clearer:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ab659ea2f77d.png" alt="Data access architecture with SQL/MED" /&gt;&lt;/p&gt;
&lt;p&gt;However, although this architecture looks simpler, it increases the I/O and computation load on the database. That runs against the modern trend of moving computation out of the database and into the application layer.&lt;/p&gt;
&lt;p&gt;Of course, both approaches have their pros and cons, and SQL/MED is still used in certain scenarios.&lt;/p&gt;
&lt;p&gt;SQL/MED is only a standard; PostgreSQL implements portions of it through FDW.&lt;/p&gt;

&lt;h3 class="relative group"&gt;What is FDW?
 &lt;div id="what-is-fdw" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-fdw" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0c0845d79809.png" alt="FDW overview" /&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL has supported foreign tables through FDW since version 9.1. Users can access external (foreign) data with regular SQL statements, and every such access goes through a foreign data wrapper (FDW). In PostgreSQL an FDW is itself a library. Because each kind of external data source needs its own FDW extension, we often call it an FDW plugin.&lt;/p&gt;
&lt;p&gt;PG&amp;rsquo;s FDW functionality is extremely powerful: it not only supports multiple data sources but also optimizes data access, and can even be used for &amp;ldquo;beyond expectations&amp;rdquo; purposes, such as implementing cluster functionality.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Installation and Download
 &lt;div id="installation-and-download" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#installation-and-download" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Basically every type of database and data format has its own FDW plugin: oracle_fdw for Oracle databases, mysql_fdw for MySQL databases, and so on. FDW plugins can be installed directly or downloaded:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;FDWs shipped with PostgreSQL as contrib extensions: file_fdw and postgres_fdw. Note that cstore_fdw is a third-party extension and must be installed separately.&lt;/li&gt;
&lt;li&gt;Other FDW plugins can be downloaded from PGXN or the wiki, such as: oracle_fdw, mysql_fdw, json_fdw. Be sure to read the README carefully to understand each FDW&amp;rsquo;s limitations and usage rules.&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;FDW plugin download: &lt;a href="https://pgxn.org/tag/fdw/" target="_blank" rel="noreferrer"&gt;https://pgxn.org/tag/fdw/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;More FDWs (mostly beta): &lt;a href="https://wiki.postgresql.org/wiki/Foreign_data_wrappers" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Foreign_data_wrappers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Write your own FDW: &lt;a href="https://www.postgresql.org/docs/current/fdwhandler.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/fdwhandler.html&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 class="relative group"&gt;Advantages of FDW over dblink in PG
 &lt;div id="advantages-of-fdw-over-dblink-in-pg" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#advantages-of-fdw-over-dblink-in-pg" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PG also has dblink. FDW and dblink are functionally similar — both access external tables. But FDW has more advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;FDW supports many more data sources (a LOT more). dblink only supports PostgreSQL databases, equivalent to just one FDW plugin — postgres_fdw (which is actually much more powerful).&lt;/li&gt;
&lt;li&gt;Transparent to developers. External tables can be accessed just like regular tables.&lt;/li&gt;
&lt;li&gt;More compliant with standard SQL syntax.&lt;/li&gt;
&lt;li&gt;Better performance in many scenarios.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;&lt;p&gt;The functionality provided by this module overlaps substantially with the functionality of the older &lt;a href="https://www.postgresql.org/docs/15/dblink.html" title="F.12. dblink" target="_blank" rel="noreferrer"&gt;dblink&lt;/a&gt; module. But &lt;code&gt;postgres_fdw&lt;/code&gt; provides more transparent and standards-compliant syntax for accessing remote tables, and can give better performance in many cases.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;In summary, FDW is stronger than the dblink plugin — you can basically forget about dblink.&lt;/p&gt;
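&lt;p&gt;To make the transparency point concrete, here is a minimal sketch of the same query written both ways. It assumes the dblink extension is installed and that the remote role can connect without extra credentials; the connection string, the columns of &lt;code&gt;remotetable&lt;/code&gt;, and the foreign table &lt;code&gt;localtable&lt;/code&gt; are illustrative names, not a real setup:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- dblink: the connection string and the remote SQL are passed as text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- and the result columns must be declared on every call
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT *
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM dblink(&amp;#39;host=172.0.0.1 port=5432 dbname=lzldb&amp;#39;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;            &amp;#39;SELECT id, name FROM remotetable&amp;#39;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;     AS t(id char(5), name varchar(40));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- postgres_fdw: once the foreign table exists, it is queried like a local table
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT id, name FROM localtable;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;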

&lt;h2 class="relative group"&gt;FDW&amp;rsquo;s Four Objects
 &lt;div id="fdws-four-objects" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fdws-four-objects" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Different FDWs have different usage patterns, but generally all require creating 4 objects: &lt;strong&gt;foreign data wrapper&lt;/strong&gt;, &lt;strong&gt;server&lt;/strong&gt;, &lt;strong&gt;user mapping&lt;/strong&gt;, &lt;strong&gt;foreign table&lt;/strong&gt;. Some objects are not mandatory — for example, file_fdw doesn&amp;rsquo;t need a user mapping, while relational database FDWs generally require one.&lt;/p&gt;

&lt;h3 class="relative group"&gt;foreign data wrapper
 &lt;div id="foreign-data-wrapper" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#foreign-data-wrapper" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;After creating the corresponding FDW extension with CREATE EXTENSION, the foreign data wrapper is automatically created.&lt;/p&gt;
&lt;p&gt;For example, creating a file_fdw extension:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; extension file_fdw;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; EXTENSION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Version&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------+---------+------------+------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; file_fdw &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;foreign&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; wrapper &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; flat file &lt;span style="color:#66d9ef"&gt;access&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; information_schema.foreign_data_wrappers;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; foreign_data_wrapper_catalog &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_data_wrapper_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; authorization_identifier &lt;span style="color:#f92672"&gt;|&lt;/span&gt; library_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_data_wrapper_language
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------+---------------------------+--------------------------+--------------+-------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; file_fdw &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You can also create a foreign data wrapper manually without using an extension. See &lt;a href="https://www.postgresql.org/docs/13/sql-createforeigndatawrapper.html" target="_blank" rel="noreferrer"&gt;CREATE FOREIGN DATA WRAPPER&lt;/a&gt;.&lt;/p&gt;
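&lt;p&gt;As a rough sketch of the manual route, the statement below registers a second wrapper on top of the handler and validator functions that the file_fdw extension installs. The wrapper name &lt;code&gt;my_file_wrapper&lt;/code&gt; is hypothetical, and the statement assumes a superuser session in a database where file_fdw has already been created:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Assumes file_fdw_handler and file_fdw_validator exist (created by the file_fdw extension)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CREATE FOREIGN DATA WRAPPER my_file_wrapper
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    HANDLER file_fdw_handler
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    VALIDATOR file_fdw_validator;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;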

&lt;h3 class="relative group"&gt;server
 &lt;div id="server" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#server" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;CREATE SERVER creates a foreign server, which essentially specifies the data source. The available OPTIONS differ for each foreign-data wrapper; file_fdw and postgres_fdw, for example, accept different options, so consult the FDW plugin&amp;rsquo;s README or official documentation. For example:&lt;/p&gt;
&lt;p&gt;Create a file_fdw foreign server named fileserver:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; SERVER fileserver &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DATA&lt;/span&gt; WRAPPER file_fdw;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Create a postgres_fdw foreign server named pgserver, pointing to the lzldb database on a PG instance at 172.0.0.1:5432:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; SERVER pgserver &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DATA&lt;/span&gt; WRAPPER postgres_fdw &lt;span style="color:#66d9ef"&gt;OPTIONS&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;host&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;172.0.0.1&amp;#39;&lt;/span&gt;, dbname &lt;span style="color:#e6db74"&gt;&amp;#39;lzldb&amp;#39;&lt;/span&gt;, port &lt;span style="color:#e6db74"&gt;&amp;#39;5432&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View servers:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; information_schema.foreign_servers;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; foreign_server_catalog &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_server_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_data_wrapper_catalog &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_data_wrapper_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_server_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_server_version &lt;span style="color:#f92672"&gt;|&lt;/span&gt; authorization_identifier
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------+---------------------+------------------------------+---------------------------+---------------------+------------------------+--------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pgserver &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres_fdw &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; fileserver &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; file_fdw &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;user mapping
 &lt;div id="user-mapping" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#user-mapping" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A user mapping defines which remote user (and credentials) a local user connects as on the foreign server. Relational database FDWs therefore generally need user mappings, while file-based FDWs, which have no concept of users, do not.&lt;/p&gt;
&lt;p&gt;For example, create a user mapping using the pgserver from above:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;USER&lt;/span&gt; MAPPING &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; localuser SERVER pgserver &lt;span style="color:#66d9ef"&gt;OPTIONS&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;remoteuser&amp;#39;&lt;/span&gt;, password &lt;span style="color:#e6db74"&gt;&amp;#39;mypasswd&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View user mappings:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; information_schema.user_mappings;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; authorization_identifier &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_server_catalog &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_server_name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------+------------------------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; localuser &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pgserver&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;foreign table
 &lt;div id="foreign-table" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#foreign-table" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Foreign tables map remote tables locally, allowing them to be accessed like regular tables. Since local objects are involved and there are many OPTIONS, the full syntax is somewhat complex. See &lt;a href="https://www.postgresql.org/docs/current/sql-createforeigntable.html" target="_blank" rel="noreferrer"&gt;CREATE FOREIGN TABLE&lt;/a&gt;. Simply put, you create a local table definition that points to a remote table.&lt;/p&gt;
&lt;p&gt;There are two common ways to define foreign tables: create them one at a time, or import them in bulk.&lt;/p&gt;
&lt;p&gt;Create a foreign table:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; localtable (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id char(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name varchar(&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SERVER pgserver &lt;span style="color:#66d9ef"&gt;OPTIONS&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;remotetable&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Creating foreign tables one by one is tedious — you can import all tables from a remote schema at once:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;IMPORT &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SCHEMA&lt;/span&gt; remoteschema &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; SERVER pgserver &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; localschema;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
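&lt;p&gt;If you do not want every remote table, IMPORT FOREIGN SCHEMA also accepts a filter. The sketch below reuses pgserver, remoteschema, and localschema from above; the table names &lt;code&gt;orders&lt;/code&gt;, &lt;code&gt;customers&lt;/code&gt;, and &lt;code&gt;audit_log&lt;/code&gt; are hypothetical:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Import only the listed tables (hypothetical names)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;IMPORT FOREIGN SCHEMA remoteschema LIMIT TO (orders, customers)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    FROM SERVER pgserver INTO localschema;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Or import everything except the listed tables
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;IMPORT FOREIGN SCHEMA remoteschema EXCEPT (audit_log)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    FROM SERVER pgserver INTO localschema;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;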
&lt;p&gt;View foreign tables:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; information_schema.foreign_tables; &lt;span style="color:#75715e"&gt;-- Intuitive view of foreign tables
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_foreign_server; &lt;span style="color:#75715e"&gt;-- Less intuitive, but shows OPTION settings&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Using FDW
 &lt;div id="using-fdw" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#using-fdw" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Viewing Foreign Table Information
 &lt;div id="viewing-foreign-table-information" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#viewing-foreign-table-information" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;psql&amp;rsquo;s built-in shortcuts give a clear view of the FDW objects. Mind your search_path setting, though: foreign tables in schemas outside it may not be listed unless you pass a schema pattern.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;psql command&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;\des&lt;/td&gt;
 &lt;td&gt;list foreign servers&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;\deu&lt;/td&gt;
 &lt;td&gt;list user mappings&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;\det&lt;/td&gt;
 &lt;td&gt;list foreign tables&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;\dtE&lt;/td&gt;
 &lt;td&gt;list both local and foreign tables&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The catalog tables and views behind these objects are scattered, so here is a quick overview grouped by object type:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;foreign data wrapper tables/views&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema._pg_foreign_data_wrappers&lt;/td&gt;
 &lt;td&gt;More complete information&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema.foreign_data_wrappers&lt;/td&gt;
 &lt;td&gt;Less information&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema.foreign_data_wrapper_options&lt;/td&gt;
 &lt;td&gt;Targeted query of foreign data wrapper options&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pg_foreign_data_wrapper&lt;/td&gt;
 &lt;td&gt;Slightly less info, but has permission info that other views lack&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;foreign server tables/views&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema._pg_foreign_servers&lt;/td&gt;
 &lt;td&gt;More complete information&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema.foreign_servers&lt;/td&gt;
 &lt;td&gt;Less information&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema.foreign_server_options&lt;/td&gt;
 &lt;td&gt;Targeted option query — one record per option, not per server&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pg_foreign_server&lt;/td&gt;
 &lt;td&gt;Less information, base table&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;user mapping tables/views&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema._pg_user_mappings&lt;/td&gt;
 &lt;td&gt;Fairly complete user mapping information&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema.user_mappings&lt;/td&gt;
 &lt;td&gt;Less information&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema.user_mapping_options&lt;/td&gt;
 &lt;td&gt;Targeted query of UM options&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pg_user_mappings&lt;/td&gt;
 &lt;td&gt;Slightly less than _pg_user_mappings. Viewable by unprivileged users — passwords show as null&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pg_user_mapping&lt;/td&gt;
 &lt;td&gt;Less information, base table, mainly options. Inaccessible to unprivileged users&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;foreign table tables/views&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema._pg_foreign_tables&lt;/td&gt;
 &lt;td&gt;More complete, shows all foreign tables&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema._pg_foreign_table_columns&lt;/td&gt;
 &lt;td&gt;Shows column-to-column mappings&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema.foreign_table_options&lt;/td&gt;
 &lt;td&gt;Targeted display of foreign table options&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pg_foreign_table&lt;/td&gt;
 &lt;td&gt;Less information, base table&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;These views/tables look messy but actually have a clear structure. The 4 object types all follow the same data dictionary pattern:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6805aee46c58.png" alt="Data dictionary pattern shared by the four FDW object types" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pg_xxx are base tables, the foundational information source for the 4 objects&lt;/li&gt;
&lt;li&gt;information_schema._pg_xxx joins pg_xxx base tables with other info — it&amp;rsquo;s a summary view with comprehensive information&lt;/li&gt;
&lt;li&gt;information_schema.xxx is a view on information_schema._pg_xxx, with less information&lt;/li&gt;
&lt;li&gt;information_schema.xxx_options provides targeted option information, sourced only from the full view information_schema._pg_xxx&lt;/li&gt;
&lt;li&gt;A special view: pg_user_mappings, usable even by unprivileged users&lt;/li&gt;
&lt;/ul&gt;
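&lt;p&gt;As a rough illustration of the base-table layer, the query below joins &lt;code&gt;pg_foreign_server&lt;/code&gt; to &lt;code&gt;pg_foreign_data_wrapper&lt;/code&gt; and unpacks the options array with &lt;code&gt;pg_options_to_table()&lt;/code&gt;, which is roughly what the &lt;code&gt;information_schema&lt;/code&gt; option views do for you. Treat it as a sketch for exploring the catalogs, not as a replacement for those views:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- One row per server option; a server without options still appears once
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT s.srvname, w.fdwname, o.option_name, o.option_value
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_foreign_server s
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;JOIN pg_foreign_data_wrapper w ON w.oid = s.srvfdw
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LEFT JOIN LATERAL pg_options_to_table(s.srvoptions) AS o ON true
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ORDER BY s.srvname, o.option_name;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;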

&lt;h3 class="relative group"&gt;Permission Considerations
 &lt;div id="permission-considerations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#permission-considerations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;If you create everything as the postgres superuser, you&amp;rsquo;ll rarely run into problems. In production, however, application users are typically not superusers, so permissions matter a great deal and are fiddly to get right. Always test with a regular user (as with any testing). PG&amp;rsquo;s permission system is like a boss battle: miss any one link in the chain and access fails.&lt;/p&gt;
&lt;p&gt;Key permission points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A foreign data wrapper or foreign server is owned by the role that created it. Any other role needs USAGE on the wrapper to create servers, and USAGE on the server to create foreign tables or its own user mapping.&lt;/li&gt;
&lt;li&gt;Accessing remote data sources requires users with appropriate permissions — specified in the user mapping step with suitable remote login credentials.&lt;/li&gt;
&lt;li&gt;After creating/importing foreign tables locally, these objects are treated as local objects (only the data dictionary is local). So PG&amp;rsquo;s local object access permission system must also be properly configured.&lt;/li&gt;
&lt;/ul&gt;
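&lt;p&gt;The three points above might translate into grants like the following. This is a minimal sketch that reuses the &lt;code&gt;pgserver&lt;/code&gt; and &lt;code&gt;localtable&lt;/code&gt; names from the earlier example; the role &lt;code&gt;app_reader&lt;/code&gt; and the remote credentials are hypothetical placeholders:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Run as the owner of pgserver (or a superuser)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- 1. Let app_reader create foreign tables and its own user mapping on the server
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GRANT USAGE ON FOREIGN SERVER pgserver TO app_reader;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- 2. Remote login used whenever app_reader touches pgserver (hypothetical credentials)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CREATE USER MAPPING FOR app_reader SERVER pgserver
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; OPTIONS (user &amp;#39;remote_reader&amp;#39;, password &amp;#39;secret&amp;#39;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- 3. The foreign table is protected like any local table
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GRANT SELECT ON localtable TO app_reader;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the foreign table lives in a non-default schema, the role also needs USAGE on that schema.&lt;/p&gt;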

&lt;h3 class="relative group"&gt;FDW Usage Examples
 &lt;div id="fdw-usage-examples" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fdw-usage-examples" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;There are hundreds of FDW implementations for various data sources worldwide — relational databases, NoSQL databases, various file types, Web Services, columnar storage, big data, and more. Here are a few common FDWs.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Using postgres_fdw
 &lt;div id="using-postgres_fdw" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#using-postgres_fdw" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;This is probably the most commonly used and most capable FDW. It lets a local database access tables in an external PostgreSQL database. It can also point back at the same instance, which matters because &lt;strong&gt;a PostgreSQL session cannot query another database directly, even within the same instance!&lt;/strong&gt; A practical workaround is to use FDW for cross-database access within one instance, reaching your own instance through an external connection.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s an example of cross-database access using postgres_fdw:&lt;/p&gt;
&lt;p&gt;An instance has two databases: aka and bkb. You can&amp;rsquo;t query both databases in a single SQL statement — databases in PG are logically isolated, somewhat like Oracle 12c PDBs.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[lzl&lt;span style="color:#f92672"&gt;@&lt;/span&gt;postgres]&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; aka &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; UTF8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt;Tc&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres&lt;span style="color:#f92672"&gt;=&lt;/span&gt;CTc&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bkb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; UTF8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt;Tc&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres&lt;span style="color:#f92672"&gt;=&lt;/span&gt;CTc&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Although both databases are local, when using FDW we still need the local/remote database concept. Here we treat aka as the local database and bkb as the remote database, enabling access to bkb&amp;rsquo;s tables from aka while handling permission issues.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Install FDW plugin&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; aka
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; extension postgres_fdw;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Note: Extensions are database-level — switch to the local database first.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Grant user permissions&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;usage&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;foreign&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; wrapper postgres_fdw &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; akadata;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;3. Create server&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; aka akadata
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; SERVER bkb_server &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DATA&lt;/span&gt; WRAPPER postgres_fdw &lt;span style="color:#66d9ef"&gt;OPTIONS&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;host&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;127.0.0.1&amp;#39;&lt;/span&gt;, port &lt;span style="color:#e6db74"&gt;&amp;#39;5432&amp;#39;&lt;/span&gt;, dbname &lt;span style="color:#e6db74"&gt;&amp;#39;bkb&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;4. Create user mapping&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;USER&lt;/span&gt; MAPPING &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; akadata SERVER bkb_server &lt;span style="color:#66d9ef"&gt;OPTIONS&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;bkbdata&amp;#39;&lt;/span&gt;, password &lt;span style="color:#e6db74"&gt;&amp;#39;bkbpasswd&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;5. Create schema in aka database, grant to akadata user&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; aka postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;schema&lt;/span&gt; bkb;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;usage&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;schema&lt;/span&gt; bkb &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; akadata;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--GRANT select ON ALL TABLES IN SCHEMA bkb TO akadata;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;schema&lt;/span&gt; bkb &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; akadata;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;6. Import bkb tables&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; aka akadata&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Import entire schema:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;IMPORT &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SCHEMA&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; SERVER bkb_server &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; bkb;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Import a single table:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;IMPORT &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SCHEMA&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LIMIT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (tab1) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; SERVER bkb_server &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; bkb;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;7. View foreign tables&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; information_schema.foreign_tables;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; foreign_table_catalog &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_table_schema &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_table_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_server_catalog &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_server_name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------+----------------------+-------------------------------------+------------------------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; aka &lt;span style="color:#f92672"&gt;|&lt;/span&gt; bkb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tab1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; aka &lt;span style="color:#f92672"&gt;|&lt;/span&gt; bkb_server&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
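&lt;p&gt;Once imported, the foreign table can be queried from aka like any local table. A minimal sketch, assuming &lt;code&gt;tab1&lt;/code&gt; has a column &lt;code&gt;a&lt;/code&gt;; &lt;code&gt;local_tab&lt;/code&gt; is a hypothetical local table in aka used only to show a cross-database join:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Reads public.tab1 in database bkb through bkb_server
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT count(*) FROM bkb.tab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Joins remote rows with a local table in a single statement (local_tab is hypothetical)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT l.*
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM local_tab AS l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;JOIN bkb.tab1 AS r ON r.a = l.a;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;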

&lt;h4 class="relative group"&gt;Using file_fdw
 &lt;div id="using-file_fdw" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#using-file_fdw" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
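&lt;p&gt;The file_fdw extension gives PG read-only access to files on the server&amp;rsquo;s file system. It ships in contrib and can be installed with &lt;code&gt;CREATE EXTENSION&lt;/code&gt;. The files must be in a format that &lt;code&gt;COPY FROM&lt;/code&gt; can read.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a classic example that maps PG&amp;rsquo;s CSV log output to a foreign table, using the script from the &lt;a href="https://www.postgresql.org/docs/current/file-fdw.html" target="_blank" rel="noreferrer"&gt;official documentation&lt;/a&gt;. The column list must match the csvlog format of your server version; PostgreSQL 13 and later write additional columns (for example &lt;code&gt;backend_type&lt;/code&gt;), so check the documentation for your release:&lt;/p&gt;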
&lt;p&gt;&lt;strong&gt;1. Create file_fdw extension&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; EXTENSION file_fdw;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;2. Create external server&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; SERVER fileserver &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DATA&lt;/span&gt; WRAPPER file_fdw;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;3. Create foreign table&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; pglog (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; log_time &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; user_name text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; database_name text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; process_id integer,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; connection_from text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; session_id text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; session_line_num bigint,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; command_tag text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; session_start_time &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtual_transaction_id text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transaction_id bigint,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; error_severity text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sql_state_code text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; message text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; detail text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; hint text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; internal_query text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; internal_query_pos integer,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; context text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; query text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; query_pos integer,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;location&lt;/span&gt; text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; application_name text
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) SERVER fileserver
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;OPTIONS&lt;/span&gt; ( filename &lt;span style="color:#e6db74"&gt;&amp;#39;pg_log/postgresql-07-06.csv&amp;#39;&lt;/span&gt;, format &lt;span style="color:#e6db74"&gt;&amp;#39;csv&amp;#39;&lt;/span&gt; );&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;4. Query the log table&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; user_name,database_name,process_id,error_severity,message &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pglog &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; error_severity&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;LOG&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; user_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; database_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; process_id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; error_severity &lt;span style="color:#f92672"&gt;|&lt;/span&gt; message
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+---------------+------------+----------------+-----------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; appuser1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; db1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;102349&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ERROR &lt;span style="color:#f92672"&gt;|&lt;/span&gt; value too long &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; appuser1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; db1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55378&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ERROR &lt;span style="color:#f92672"&gt;|&lt;/span&gt; value too long &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; appuser2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; db2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;219377&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ERROR &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relation &lt;span style="color:#e6db74"&gt;&amp;#34;dual&amp;#34;&lt;/span&gt; does &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; exist&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
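&lt;p&gt;Because the log is now an ordinary relation, the usual SQL tools apply. For instance, a quick summary of non-LOG messages by severity and user might look like this (a sketch that uses only columns from the &lt;code&gt;pglog&lt;/code&gt; definition above):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Count warnings and errors per severity and user
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT error_severity, user_name, count(*) AS messages
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pglog
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE error_severity &amp;lt;&amp;gt; &amp;#39;LOG&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GROUP BY error_severity, user_name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ORDER BY messages DESC;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Every query re-reads the file, since file_fdw keeps no copy of the data.&lt;/p&gt;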

&lt;h2 class="relative group"&gt;Deep Dive into postgres_fdw
 &lt;div id="deep-dive-into-postgres_fdw" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#deep-dive-into-postgres_fdw" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;postgres_fdw Performance Optimization
 &lt;div id="postgres_fdw-performance-optimization" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgres_fdw-performance-optimization" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Unlike most FDW plugins, postgres_fdw is an official plugin maintained by the PostgreSQL Global Development Group, with its source code in contrib. Because external services differ in functionality and structure, some features — such as obtaining remote database access costs or aggregate pushdown in certain scenarios — are difficult to implement in other FDWs. But in postgres_fdw they&amp;rsquo;re achievable. The official team has done extensive optimization for postgres_fdw, making it extremely powerful.&lt;/p&gt;
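&lt;p&gt;Remote cost estimation is controlled through an ordinary server or table option, and aggregate pushdown can be checked in the plan. A minimal sketch, assuming the &lt;code&gt;bkb_server&lt;/code&gt; server and &lt;code&gt;bkb.tab1&lt;/code&gt; foreign table from the earlier example and a PostgreSQL version recent enough to push aggregates down:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Ask the remote server for cost estimates (off unless set)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Use SET instead of ADD if the option is already present
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ALTER SERVER bkb_server OPTIONS (ADD use_remote_estimate &amp;#39;true&amp;#39;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- If the aggregate is pushed down, the Remote SQL line contains count(*)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXPLAIN (VERBOSE) SELECT count(*) FROM bkb.tab1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;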

&lt;h4 class="relative group"&gt;SQL Execution Process
 &lt;div id="sql-execution-process" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sql-execution-process" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2d6d90fc0f63.png" alt="postgres_fdw SQL execution process diagram" /&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The parser generates a query tree from the foreign table definition.&lt;/li&gt;
&lt;li&gt;The planner connects to the foreign server.&lt;/li&gt;
&lt;li&gt;Obtain cost information. If &lt;code&gt;use_remote_estimate&lt;/code&gt; is true, the planner executes EXPLAIN on the remote database to get access costs; if false (the default), it estimates the costs locally instead.&lt;/li&gt;
&lt;li&gt;Deparse generates remote SQL text. &lt;strong&gt;FDW accesses remote database objects by sending SQL text&lt;/strong&gt; — the planner generates SQL text for remote execution. The &lt;code&gt;Remote SQL&lt;/code&gt; part of the execution plan directly shows the deparsed SQL:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bkb.tab1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Foreign&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; bkb.tab1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;86&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Remote &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1 &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; ((a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Send the SQL statement and receive data. The remote database executes the SQL independently and returns the result to the local database in batches of &lt;code&gt;fetch_size&lt;/code&gt; rows (default 100).&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 class="relative group"&gt;Cost Estimation
 &lt;div id="cost-estimation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cost-estimation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;postgres_fdw can pass remote database object access costs to the local database for calculating the overall SQL execution plan cost. However, simply returning the remote estimated cost isn&amp;rsquo;t enough — the cost of remote access itself must also be considered. postgres_fdw provides 3 OPTIONS to adjust foreign table cost estimation:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;use_remote_estimate&lt;/strong&gt;: When set to true, the planner runs EXPLAIN on the remote database to get estimated costs, then adds fdw_startup_cost and fdw_tuple_cost. When false (default), the planner estimates costs locally and adds fdw_startup_cost and fdw_tuple_cost. Local estimates rely on statistics gathered by running ANALYZE on the foreign table, which may be missing or out of date.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;fdw_startup_cost&lt;/strong&gt;: Startup cost for foreign tables, default 100. Represents the cost of establishing a connection, parsing, and generating a plan on the external service.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;fdw_tuple_cost&lt;/strong&gt;: Additional cost per tuple scanned from a foreign table, default 0.01 (0.2 since PG17). Represents data transfer cost, so the higher the network latency, the higher this should be set.&lt;/p&gt;
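&lt;p&gt;As a rough sketch of how these three options might be set, the statements below use a hypothetical server name, &lt;code&gt;remote_pg&lt;/code&gt;, and illustrative cost values that are not recommendations. &lt;code&gt;use_remote_estimate&lt;/code&gt; can also be set on an individual foreign table, which overrides the server-level value; the two cost options are server-level only.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- remote_pg is a hypothetical server name; substitute your own.
-- ADD fails if the option is already set; use SET to change an existing value.
ALTER SERVER remote_pg OPTIONS (ADD use_remote_estimate 'true');

-- Illustrative values only: raise both costs for a high-latency link.
ALTER SERVER remote_pg OPTIONS (ADD fdw_startup_cost '200', ADD fdw_tuple_cost '0.05');

-- Compare the estimated cost before and after the change.
EXPLAIN (VERBOSE) SELECT a FROM bkb.tab1 WHERE a = 1;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;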

&lt;h4 class="relative group"&gt;Aggregate Pushdown
 &lt;div id="aggregate-pushdown" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#aggregate-pushdown" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Pushdown (of filters, sorts, joins, and aggregates) executes computations on the remote database, so the local database receives only the results. Without pushdown, all rows must be shipped to the local database and processed there, which increases both the data transfer time and the local database&amp;rsquo;s computational load.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(In this environment, &lt;code&gt;bkb.*&lt;/code&gt; are all foreign tables; local tables are &lt;code&gt;public.*&lt;/code&gt;.)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Predicate Pushdown&lt;/strong&gt;: postgres_fdw supports WHERE pushdown — no need to return all data to the local database.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;,costs &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; f1.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bkb.tab1 f1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; f1.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Foreign&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; bkb.tab1 f1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Remote &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1 &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; ((a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Sort Pushdown&lt;/strong&gt;: postgres_fdw supports sort pushdown, sending sorts to the remote database.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;,costs &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; f1.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bkb.tab1 f1 &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt; nulls &lt;span style="color:#66d9ef"&gt;first&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Foreign&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; bkb.tab1 f1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Remote &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1 &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;DESC&lt;/span&gt; NULLS &lt;span style="color:#66d9ef"&gt;FIRST&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Join Pushdown&lt;/strong&gt;: Not every join can be pushed down. A join between a local table and a foreign table must run locally: the foreign table rows are fetched first and then joined on the local side.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;,costs &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; f1.a,l2.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bkb.tab1 f1,tab1 l2 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; f1.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;l2.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: f1.a, l2.a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (l2.a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; f1.a)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1 l2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: l2.a, l2.b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: f1.a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Foreign&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; bkb.tab1 f1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: f1.a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Remote &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When both tables are foreign tables on the same foreign server (and use the same user mapping), the join can be pushed down to the remote database:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;,costs &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; f1.a,f1.b &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bkb.tab1 f1 &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; bkb.tab2 f2 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; f1.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;f2.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Foreign&lt;/span&gt; Scan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: f1.a, f1.b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Relations: (bkb.tab1 f1) &lt;span style="color:#66d9ef"&gt;LEFT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; (bkb.tab2 f2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Remote &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; r1.a, r1.b &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1 r1 &lt;span style="color:#66d9ef"&gt;LEFT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab2 r2 &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; (((r1.a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; r2.a))))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
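&lt;p&gt;&lt;strong&gt;Aggregate Function Pushdown&lt;/strong&gt;: Since PG10, postgres_fdw can push down aggregate functions. The functions must be &lt;code&gt;IMMUTABLE&lt;/code&gt; and either built in or part of an extension listed in the &lt;code&gt;extensions&lt;/code&gt; option. Pushdown is a cost-based choice, not a guarantee: in the plan below the planner fetches the rows and aggregates locally (a GroupAggregate on top of the Foreign Scan).&lt;/p&gt;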
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;,costs &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; b,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;),&lt;span style="color:#66d9ef"&gt;avg&lt;/span&gt;(a) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bkb.tab1 &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; b;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; GroupAggregate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: b, &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;), &lt;span style="color:#66d9ef"&gt;avg&lt;/span&gt;(a)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: tab1.b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Foreign&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; bkb.tab1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: a, b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Remote &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; a, b &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1 &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;ASC&lt;/span&gt; NULLS &lt;span style="color:#66d9ef"&gt;LAST&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The same applies to HAVING. In the plan below the aggregate is not pushed down, so the HAVING condition is applied locally as a Filter on the GroupAggregate node:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;,costs &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; b,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bkb.tab1 &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;having&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;)&lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; GroupAggregate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: b, &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: tab1.b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Foreign&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; bkb.tab1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: a, b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Remote &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1 &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;ASC&lt;/span&gt; NULLS &lt;span style="color:#66d9ef"&gt;LAST&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
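&lt;p&gt;Whether the planner chooses a pushed-down aggregate depends on its cost estimates. One possible way to give it better information, sketched here for the &lt;code&gt;bkb.tab1&lt;/code&gt; table used above, is to collect local statistics for the foreign table or to enable remote estimates for it, and then check the plan again. The resulting plan depends on the data and the PostgreSQL version, so no output is shown.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Collect local statistics for the foreign table (rows are sampled from the remote side).
ANALYZE bkb.tab1;

-- Alternatively, let the planner ask the remote server for estimates for this table.
-- ADD fails if the option is already set; use SET in that case.
ALTER FOREIGN TABLE bkb.tab1 OPTIONS (ADD use_remote_estimate 'true');

-- Re-check the plan. A pushed-down aggregate appears as a single Foreign Scan
-- whose Remote SQL contains the GROUP BY clause.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT b, count(*), avg(a) FROM bkb.tab1 GROUP BY b;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;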

&lt;h3 class="relative group"&gt;Other Features
 &lt;div id="other-features" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#other-features" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;Remote Execution OPTION Settings
 &lt;div id="remote-execution-option-settings" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#remote-execution-option-settings" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
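&lt;p&gt;&lt;strong&gt;extensions&lt;/strong&gt;: A comma-separated list of extensions that are installed, in compatible versions, on both the local and the remote server. Immutable functions and operators that belong to a listed extension are considered safe to execute remotely. Can only be set at the server level.&lt;/p&gt;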
&lt;p&gt;&lt;strong&gt;fetch_size&lt;/strong&gt;: Number of rows fetched per batch from the remote database, default 100. Can be set at server or table level.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;updatable&lt;/strong&gt;: Controls whether INSERT, UPDATE, and DELETE are allowed on a foreign table; the default is true. It can be set at server or table level, and the table-level setting takes precedence. If the remote table is inherently non-updatable, setting updatable to false makes the statement fail locally, without contacting the remote server.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;truncatable&lt;/strong&gt;: Starting from PG14, postgres_fdw supports truncating foreign tables, controlled by the &lt;code&gt;truncatable&lt;/code&gt; option, defaulting to true.&lt;/p&gt;
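&lt;p&gt;The options above are all set through &lt;code&gt;OPTIONS&lt;/code&gt; clauses. The following is a minimal sketch; the server name &lt;code&gt;remote_pg&lt;/code&gt;, the extension name, and the batch sizes are assumptions chosen for illustration.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- remote_pg and pg_trgm are illustrative; the extension must exist on both sides.
ALTER SERVER remote_pg OPTIONS (ADD extensions 'pg_trgm');

-- Server-wide batch size, then a larger one for a single table (table level wins).
ALTER SERVER remote_pg OPTIONS (ADD fetch_size '1000');
ALTER FOREIGN TABLE bkb.tab1 OPTIONS (ADD fetch_size '5000');

-- Make one foreign table read-only from the local side.
ALTER FOREIGN TABLE bkb.tab1 OPTIONS (ADD updatable 'false');

-- PG14+: also forbid TRUNCATE on it.
ALTER FOREIGN TABLE bkb.tab1 OPTIONS (ADD truncatable 'false');&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;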

&lt;h4 class="relative group"&gt;Connection Management
 &lt;div id="connection-management" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#connection-management" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;On the first foreign table access in a session, a connection to the remote database is established. As long as the local session hasn&amp;rsquo;t disconnected, this connection is reused. If multiple user mappings are used, a connection is established for each user mapping.&lt;/p&gt;
&lt;p&gt;Starting from PG14, the &lt;code&gt;keep_connections&lt;/code&gt; server option controls this behavior. It defaults to on, meaning the connection is kept for reuse later in the session; when off, the connection is closed at the end of each transaction.&lt;/p&gt;
&lt;p&gt;Also from PG14, &lt;code&gt;postgres_fdw_get_connections()&lt;/code&gt; lists the remote connections that are open in the current session.&lt;/p&gt;
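&lt;p&gt;A short sketch of inspecting and closing cached connections on PG14 or later. The server name &lt;code&gt;remote_pg&lt;/code&gt; is hypothetical; the functions are the ones shipped with postgres_fdw.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- List the remote connections opened by this session.
SELECT * FROM postgres_fdw_get_connections();

-- Close the cached connection to one server, or to all servers.
-- A connection used by the current transaction is not closed.
SELECT postgres_fdw_disconnect('remote_pg');
SELECT postgres_fdw_disconnect_all();

-- Stop caching connections to this server across transactions.
ALTER SERVER remote_pg OPTIONS (ADD keep_connections 'off');&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;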

&lt;h4 class="relative group"&gt;Transaction Management
 &lt;div id="transaction-management" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-management" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Important FDW transaction characteristics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The remote database executes SQL based on the text sent by the local database.&lt;/li&gt;
&lt;li&gt;When the local transaction uses the SERIALIZABLE isolation level, the remote transaction also uses SERIALIZABLE; otherwise, the remote transaction uses REPEATABLE READ.&lt;/li&gt;
&lt;li&gt;When the local transaction commits or rolls back, the remote transaction also commits or rolls back.&lt;/li&gt;
&lt;li&gt;postgres_fdw does not support two-phase commit (2PC) for remote transactions.&lt;/li&gt;
&lt;/ul&gt;
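&lt;p&gt;The 2PC limitation can be observed directly. In this sketch, the transaction touches the foreign table &lt;code&gt;bkb.tab1&lt;/code&gt; and then tries to prepare itself; the transaction identifier &lt;code&gt;fdw_demo&lt;/code&gt; is an arbitrary name picked for the example.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;BEGIN;

-- Any statement that opens a remote transaction will do.
UPDATE bkb.tab1 SET b = 'x' WHERE a = 1;

-- Expected to be rejected, because the transaction has operated on a
-- postgres_fdw foreign table. A failed PREPARE TRANSACTION rolls the
-- transaction back.
PREPARE TRANSACTION 'fdw_demo';&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;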
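&lt;p&gt;Without two-phase commit, a transaction that spans the local database and one or more remote servers is not atomic, so partial commits are possible. In the session below, the statement against the foreign table fails, yet the local update is still committed. (In stock PostgreSQL a failed statement aborts the whole transaction block and &lt;code&gt;COMMIT&lt;/code&gt; then reports &lt;code&gt;ROLLBACK&lt;/code&gt;, so this output presumably comes from a session with statement-level rollback, such as psql&amp;rsquo;s &lt;code&gt;ON_ERROR_ROLLBACK&lt;/code&gt;.)&lt;/p&gt;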
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tab1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;123&amp;#39;&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; bkb.tab1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42703&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;c&amp;#34;&lt;/span&gt; does &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; exist
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LINE &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; bkb.tab1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;123&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;No Distributed Lock Management
 &lt;div id="no-distributed-lock-management" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#no-distributed-lock-management" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
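&lt;p&gt;postgres_fdw has no distributed lock management and therefore no distributed deadlock detection.&lt;/p&gt;
&lt;p&gt;Each server detects deadlocks only among its own locks: a deadlock between local tables is detected, but a wait cycle that runs through foreign tables on a remote server is not.&lt;/p&gt;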

&lt;h4 class="relative group"&gt;Asynchronous Execution
 &lt;div id="asynchronous-execution" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#asynchronous-execution" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Starting from PG14, postgres_fdw supports asynchronous execution. When an Append node has several foreign scans as children, those scans can run concurrently on their remote servers, which improves performance when a query reads multiple foreign tables.&lt;/p&gt;
&lt;p&gt;Scans only overlap when they use different remote connections, i.e. different foreign servers or different user mappings; queries sent over a single connection still run one after another. The &lt;code&gt;async_capable&lt;/code&gt; option (server or table level) controls this and defaults to false. The &lt;code&gt;enable_async_append&lt;/code&gt; parameter must also be enabled (default on).&lt;/p&gt;
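&lt;p&gt;A minimal sketch of enabling asynchronous scans, assuming two hypothetical foreign servers (&lt;code&gt;remote_pg1&lt;/code&gt;, &lt;code&gt;remote_pg2&lt;/code&gt;) and two hypothetical foreign tables with the same columns (&lt;code&gt;ft_a&lt;/code&gt; on the first server, &lt;code&gt;ft_b&lt;/code&gt; on the second). Whether the planner actually uses an asynchronous plan depends on the query shape and the PostgreSQL version.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical servers; each one gets its own remote connection.
ALTER SERVER remote_pg1 OPTIONS (ADD async_capable 'true');
ALTER SERVER remote_pg2 OPTIONS (ADD async_capable 'true');

-- On by default; shown only for completeness.
SET enable_async_append = on;

-- ft_a and ft_b are hypothetical foreign tables, one per server.
-- When asynchronous execution is chosen, the children of the Append node
-- are displayed as "Async Foreign Scan".
EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM ft_a
UNION ALL
SELECT * FROM ft_b;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;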

&lt;h4 class="relative group"&gt;Parallel Commit
 &lt;div id="parallel-commit" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#parallel-commit" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
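&lt;p&gt;Starting from PG15, postgres_fdw supports parallel commit through the server-level &lt;code&gt;parallel_commit&lt;/code&gt; option (default false). When the local transaction commits, the remote transactions it opened on such servers are committed in parallel rather than one by one. Without it, PG commits the remote transactions serially. Parallel rollback is handled separately by the &lt;code&gt;parallel_abort&lt;/code&gt; option, added in PG16.&lt;/p&gt;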
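&lt;p&gt;A short sketch of enabling parallel commit, again assuming hypothetical servers shard1 and shard2 and a hypothetical orders table whose rows are spread across them:&lt;/p&gt;

```sql
-- parallel_commit is a server-level option (PG15 or later).
ALTER SERVER shard1 OPTIONS (ADD parallel_commit 'true');
ALTER SERVER shard2 OPTIONS (ADD parallel_commit 'true');

BEGIN;
-- Assumed to touch rows stored on both shards.
UPDATE orders SET amount = amount + 1 WHERE customer_id IN (42, 43);
-- The remote COMMIT commands are sent to both servers in parallel.
COMMIT;
```

&lt;p&gt;This shortens commit latency only. It is not two-phase commit, so a failure between the remote commits can still leave the shards inconsistent.&lt;/p&gt;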

&lt;h3 class="relative group"&gt;postgres_fdw Version History
 &lt;div id="postgres_fdw-version-history" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgres_fdw-version-history" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Version&lt;/th&gt;
 &lt;th style="text-align: left"&gt;Release Support Notes&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;9.3&lt;/td&gt;
 &lt;td style="text-align: left"&gt;postgres_fdw released&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;9.6&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Support pushdown of join, sort, update, delete; fetch_size support&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;10&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Push down aggregate functions to remote server; more join pushdown scenarios&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;11&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Push down aggregates to foreign tables that are partitions; UPDATE/DELETE joins can push down&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;12&lt;/td&gt;
 &lt;td style="text-align: left"&gt;More order by/limit pushdown scenarios&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;13&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Enhanced password authentication; pg_dump can export foreign tables&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;14&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Parallel scanning for queries with multiple foreign tables (async_capable); bulk insert; postgres_fdw_get_connections(); TRUNCATE foreign tables&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;15&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Push down CASE expressions; parallel commit (parallel_commit)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;16&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Parallel abort of remote transactions (parallel_abort); foreign table analyze_sampling; COPY batch_size; foreign table truncate triggers&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 class="relative group"&gt;Sharding Implementation
 &lt;div id="sharding-implementation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sharding-implementation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;FDW-based Sharding
 &lt;div id="fdw-based-sharding" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fdw-based-sharding" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
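&lt;p&gt;Several PostgreSQL-derived projects (the Postgres-XC/XL forks, the Citus extension, etc.) have implemented sharding, but PostgreSQL itself is a single-instance database without native sharding support. Because SQL/MED makes remote data look like local tables, postgres_fdw can serve as a building block for sharding: each shard is a table on a remote instance, exposed locally as a foreign table and attached as a partition of a partitioned table.&lt;/p&gt;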
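&lt;p&gt;The idea can be sketched as follows. All names here (shard1, appdb, app_user, orders) are hypothetical, and the remote table orders_p1 is assumed to already exist on the shard with matching columns:&lt;/p&gt;

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- The four FDW objects: wrapper (from the extension), server,
-- user mapping, and foreign table.
CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.example.com', port '5432', dbname 'appdb');

CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
    OPTIONS (user 'app_user', password 'change-me');

-- The coordinator holds only the partitioned parent.
CREATE TABLE orders (
    id          bigint  NOT NULL,
    customer_id bigint  NOT NULL,
    amount      numeric NOT NULL
) PARTITION BY HASH (customer_id);

-- One partition stays local, the other lives on shard1.
CREATE TABLE orders_p0 PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);

CREATE FOREIGN TABLE orders_p1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 2, REMAINDER 1)
    SERVER shard1 OPTIONS (table_name 'orders_p1');
```

&lt;p&gt;Queries against orders are then routed by the partitioning logic, and anything postgres_fdw can push down runs on the shard. Every additional shard has to be created and attached by hand in the same way.&lt;/p&gt;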

&lt;h4 class="relative group"&gt;Core Sharding Features
 &lt;div id="core-sharding-features" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#core-sharding-features" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Key features needed for usable sharding:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input checked="" disabled="" type="checkbox"&gt; Partition management: SQL/MED transparency allows foreign tables to act as partitions of a partitioned table.&lt;/li&gt;
&lt;li&gt;&lt;input checked="" disabled="" type="checkbox"&gt; Partition optimization: partition pruning, partition-wise join, etc.&lt;/li&gt;
&lt;li&gt;&lt;input checked="" disabled="" type="checkbox"&gt; Aggregate pushdown: push computation to shard nodes.&lt;/li&gt;
&lt;li&gt;&lt;input checked="" disabled="" type="checkbox"&gt; Parallel scanning: implemented in PG14.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; 2PC transactions: postgres_fdw does not yet support this.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Shard management: foreign table partitions must be manually created and added.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Global transactions: global clocks and global snapshot management are needed.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Distributed locks: stronger distributed lock mechanisms are needed.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Batch writes: DML/COPY distribution to shards needs batch write support (partly covered by batch_size for INSERT in PG14 and for COPY in PG16).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;PostgreSQL&amp;rsquo;s FDW functionality derives from the SQL/MED standard for accessing external data, supporting many data source types.&lt;/li&gt;
&lt;li&gt;FDW has 4 basic objects: foreign data wrapper, server, user mapping, foreign table.&lt;/li&gt;
&lt;li&gt;postgres_fdw has many feature enhancements and performance optimizations, capable of pushing operators down to remote databases.&lt;/li&gt;
&lt;li&gt;Sharding can be implemented based on postgres_fdw, though some features still need improvement.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql04.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql04.html&lt;/a&gt;
&lt;a href="https://www.postgresql.org/docs/13/postgres-fdw.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/postgres-fdw.html&lt;/a&gt;
&lt;a href="https://www.postgresql.org/docs/current/file-fdw.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/file-fdw.html&lt;/a&gt;
&lt;a href="https://wiki.postgresql.org/wiki/WIP_PostgreSQL_Sharding" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/WIP_PostgreSQL_Sharding&lt;/a&gt;
&lt;a href="https://www.percona.com/blog/postgres_fdw-enhancement-in-postgresql-14/" target="_blank" rel="noreferrer"&gt;https://www.percona.com/blog/postgres_fdw-enhancement-in-postgresql-14/&lt;/a&gt;
&lt;a href="https://www.percona.com/blog/foreign-data-wrappers-postgresql-postgres_fdw/" target="_blank" rel="noreferrer"&gt;https://www.percona.com/blog/foreign-data-wrappers-postgresql-postgres_fdw/&lt;/a&gt;
&lt;a href="https://www.percona.com/blog/parallel-commits-for-transactions-using-postgres_fdw-on-postgresql-15/" target="_blank" rel="noreferrer"&gt;https://www.percona.com/blog/parallel-commits-for-transactions-using-postgres_fdw-on-postgresql-15/&lt;/a&gt;
&lt;a href="https://www.enterprisedb.com/blog/postgresql-aggregate-push-down-postgresfdw" target="_blank" rel="noreferrer"&gt;https://www.enterprisedb.com/blog/postgresql-aggregate-push-down-postgresfdw&lt;/a&gt;
&lt;a href="https://www.postgresql.fastware.com/postgresql-insider-fdw-ove" target="_blank" rel="noreferrer"&gt;https://www.postgresql.fastware.com/postgresql-insider-fdw-ove&lt;/a&gt;
&lt;a href="https://momjian.us/main/writings/pgsql/sharding.pdf" target="_blank" rel="noreferrer"&gt;https://momjian.us/main/writings/pgsql/sharding.pdf&lt;/a&gt;
&lt;a href="https://www.slideserve.com/johnna/sql-med-and-more-powerpoint-ppt-presentation" target="_blank" rel="noreferrer"&gt;https://www.slideserve.com/johnna/sql-med-and-more-powerpoint-ppt-presentation&lt;/a&gt;
&lt;a href="https://dbaplus.cn/news-19-2090-1.html" target="_blank" rel="noreferrer"&gt;https://dbaplus.cn/news-19-2090-1.html&lt;/a&gt;
&lt;a href="https://www.highgo.ca/2019/08/08/horizontal-scalability-with-sharding-in-postgresql-where-it-is-going-part-3-of-3/" target="_blank" rel="noreferrer"&gt;https://www.highgo.ca/2019/08/08/horizontal-scalability-with-sharding-in-postgresql-where-it-is-going-part-3-of-3/&lt;/a&gt;
&lt;a href="https://www.highgo.ca/2021/06/28/parallel-execution-of-postgres_fdw-scans-in-pg-14-important-step-forward-for-horizontal-scaling/" target="_blank" rel="noreferrer"&gt;https://www.highgo.ca/2021/06/28/parallel-execution-of-postgres_fdw-scans-in-pg-14-important-step-forward-for-horizontal-scaling/&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>A Brief Analysis of PostgreSQL Memory</title><link>https://lastdba.com/en/2024/08/12/a-brief-analysis-of-postgresql-memory/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/a-brief-analysis-of-postgresql-memory/</guid><description>&lt;h2 class="relative group"&gt;Architecture
 &lt;div id="architecture" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#architecture" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8ca0ab97a875.png" alt="Shared Memory in PostgreSQL" /&gt;
(&lt;a href="https://www.postgresql.fastware.com/blog/lets-get-back-to-basics-postgresql-memory-components" target="_blank" rel="noreferrer"&gt;https://www.postgresql.fastware.com/blog/lets-get-back-to-basics-postgresql-memory-components&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6ec5a1dae77e.png" alt="PostgreSQL Process Structure and Memory Structure - Figure 2" /&gt;
(&lt;a href="http://geekdaxue.co/read/fcant@sql/qts5is" target="_blank" rel="noreferrer"&gt;http://geekdaxue.co/read/fcant@sql/qts5is&lt;/a&gt;)&lt;/p&gt;

&lt;h2 class="relative group"&gt;Shared Memory
 &lt;div id="shared-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shared-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Linux Shared Memory Implementation
 &lt;div id="linux-shared-memory-implementation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#linux-shared-memory-implementation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/026fc1403eb5.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://momjian.us/main/writings/pgsql/inside_shmem.pdf" target="_blank" rel="noreferrer"&gt;https://momjian.us/main/writings/pgsql/inside_shmem.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Shared Memory on Linux&lt;/strong&gt;
Shared memory is an IPC (Inter-Process Communication) mechanism supported by Unix-based operating systems (including Linux). It is a type of memory that multiple processes can simultaneously use to communicate with each other. Shared memory is one of the fastest IPC mechanisms because it does not require processes to copy data between each other. Processes can access shared memory through their own address space.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Architecture
 &lt;div id="architecture" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#architecture" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8ca0ab97a875.png" alt="Shared Memory in PostgreSQL" /&gt;
(&lt;a href="https://www.postgresql.fastware.com/blog/lets-get-back-to-basics-postgresql-memory-components" target="_blank" rel="noreferrer"&gt;https://www.postgresql.fastware.com/blog/lets-get-back-to-basics-postgresql-memory-components&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6ec5a1dae77e.png" alt="PostgreSQL Process Structure and Memory Structure - Figure 2" /&gt;
(&lt;a href="http://geekdaxue.co/read/fcant@sql/qts5is" target="_blank" rel="noreferrer"&gt;http://geekdaxue.co/read/fcant@sql/qts5is&lt;/a&gt;)&lt;/p&gt;

&lt;h2 class="relative group"&gt;Shared Memory
 &lt;div id="shared-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shared-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Linux Shared Memory Implementation
 &lt;div id="linux-shared-memory-implementation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#linux-shared-memory-implementation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/026fc1403eb5.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://momjian.us/main/writings/pgsql/inside_shmem.pdf" target="_blank" rel="noreferrer"&gt;https://momjian.us/main/writings/pgsql/inside_shmem.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Shared Memory on Linux&lt;/strong&gt;
Shared memory is an IPC (Inter-Process Communication) mechanism supported by Unix-based operating systems (including Linux). It is a type of memory that multiple processes can simultaneously use to communicate with each other. Shared memory is one of the fastest IPC mechanisms because it does not require processes to copy data between each other. Processes can access shared memory through their own address space.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Two Forms of Shared Memory&lt;/strong&gt;
One form of shared memory is memory-mapped files. Once multiple processes map the same file into their address space, they can access the file&amp;rsquo;s contents and simultaneously update the file directly using the mapped memory. Another form of shared memory is anonymous memory. This refers to shared memory regions allocated by programs without associating them with a file or persistent storage mechanism.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;mmap()&lt;/strong&gt;
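Mapping a file into a process&amp;rsquo;s address space uses &lt;code&gt;mmap()&lt;/code&gt;. Anonymous memory can also be created with &lt;code&gt;mmap()&lt;/code&gt;. &lt;a href="https://www.man7.org/linux/man-pages/man2/mmap.2.html" target="_blank" rel="noreferrer"&gt;mmap&lt;/a&gt; is a POSIX system call that programs reach through the C library wrapper declared in &lt;code&gt;sys/mman.h&lt;/code&gt;. For anonymous memory, the flags must include &lt;code&gt;MAP_ANONYMOUS&lt;/code&gt; (or its synonym &lt;code&gt;MAP_ANON&lt;/code&gt;), combined with &lt;code&gt;MAP_SHARED&lt;/code&gt; when the region is to be shared with child processes. In that case &lt;code&gt;fd&lt;/code&gt; is ignored but should be -1 for portability (it is an integer descriptor, so NULL does not apply), and &lt;code&gt;offset&lt;/code&gt; should be 0.&lt;/p&gt;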
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/fcd702da523d.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.tutorialsdaddy.com/courses/linux-device-driver/lessons/mmap/" target="_blank" rel="noreferrer"&gt;http://www.tutorialsdaddy.com/courses/linux-device-driver/lessons/mmap/&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Shared Memory in PostgreSQL
 &lt;div id="shared-memory-in-postgresql" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shared-memory-in-postgresql" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0a37e863fe80.png" alt="Image" /&gt;
&lt;a href="https://www.interdb.jp/pg/pgsql02.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql02.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL has many types of shared memory: shared buffers, WAL buffer, CLOG buffer, lock space, etc.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Shared Buffer&lt;/strong&gt;
The shared memory area where PostgreSQL caches data pages, comparable to the buffer cache inside Oracle&amp;rsquo;s SGA. When a requested page is already in shared buffers, it is read directly from memory without disk I/O.
PostgreSQL loads table and index pages from persistent storage into this area and operates on them directly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;WAL Buffer&lt;/strong&gt;
To ensure no data is lost in the event of a server failure, PostgreSQL supports the WAL mechanism. WAL data (also called XLOG records) is PostgreSQL&amp;rsquo;s transaction log. The WAL BUFFER is the buffer for WAL data before it is written to persistent storage.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CLOG BUFFER&lt;/strong&gt;
The Commit Log (CLOG) maintains the status of all transactions (e.g., in_progress, committed, aborted) for the concurrency control mechanism. The corresponding CLOG BUFFER is the buffer for CLOG data before it is written to disk.&lt;/p&gt;

&lt;h3 class="relative group"&gt;PostgreSQL Shared Memory Parameters
 &lt;div id="postgresql-shared-memory-parameters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgresql-shared-memory-parameters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;shared_buffers&lt;/code&gt;&lt;/strong&gt;
The default is 128MB. A common starting point is about 25% of total memory. Per-backend private memory can take up a significant share, and PostgreSQL also relies on the OS page cache, so enough memory must be left for the operating system. It is therefore not recommended to set this as high (relative to total memory) as you would for Oracle&amp;rsquo;s SGA.&lt;/p&gt;
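&lt;p&gt;As a worked example of the 25% guideline, assume a dedicated host with 16GB of RAM (a hypothetical figure), which gives 4GB:&lt;/p&gt;

```sql
-- Assumption: 16GB of RAM on a dedicated database host, so 25% is 4GB.
-- ALTER SYSTEM writes the value to postgresql.auto.conf.
ALTER SYSTEM SET shared_buffers = '4GB';

-- shared_buffers is only read at server start, so restart first,
-- then confirm the value that is actually in effect.
SHOW shared_buffers;
```

&lt;p&gt;Treat the result as a starting point and adjust it after observing cache hit ratios and OS memory pressure.&lt;/p&gt;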
&lt;p&gt;&lt;strong&gt;&lt;code&gt;shared_memory_type&lt;/code&gt;&lt;/strong&gt;
Specifies the shared memory implementation used for the main shared memory region, which holds shared_buffers and the other shared data areas.
The available implementations vary by platform, and the first supported option is the default. On Linux the default is &lt;code&gt;mmap&lt;/code&gt;. Possible values are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;mmap&lt;/code&gt; (for anonymous shared memory allocated using &lt;code&gt;mmap&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sysv&lt;/code&gt; (for System V shared memory allocated via &lt;code&gt;shmget&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;windows&lt;/code&gt; (for Windows shared memory)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;code&gt;posix&lt;/code&gt; value (&lt;code&gt;shm_open&lt;/code&gt;) and the file-backed form of &lt;code&gt;mmap&lt;/code&gt; (memory-mapped files stored in the data directory) are values of &lt;code&gt;dynamic_shared_memory_type&lt;/code&gt;, not of this parameter.&lt;/p&gt;
&lt;p&gt;By default, PostgreSQL allocates a very small amount of System V shared memory and a much larger region of anonymous mmap shared memory. It also creates semaphores at server start, and because of the &lt;a href="https://postgreshelp.com/postgresql-dynamic-shared-memory-posix-vs-mmap/" target="_blank" rel="noreferrer"&gt;differences between POSIX and System V IPC&lt;/a&gt;, the semaphore implementation differs by platform. The &lt;code&gt;shared_memory_type&lt;/code&gt; parameter can be set explicitly to choose the shared memory mechanism:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/kernel-resources.html#SYSVIPC" target="_blank" rel="noreferrer"&gt;Setting System V IPC&lt;/a&gt; (default is &lt;code&gt;mmap&lt;/code&gt;):
On Linux and FreeBSD, the default kernel shared memory settings are usually sufficient unless &lt;code&gt;shared_memory_type&lt;/code&gt; is set to &lt;code&gt;sysv&lt;/code&gt;. System V semaphores are not used on these two platforms.
On OpenBSD, the default shared memory settings are likewise usually sufficient unless &lt;code&gt;shared_memory_type&lt;/code&gt; is set to &lt;code&gt;sysv&lt;/code&gt;, in which case the kernel limits need to be raised via sysctl.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Setting POSIX IPC:
POSIX semaphores are used on Linux and FreeBSD; other platforms use System V semaphores.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
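&lt;p&gt;To see which mechanisms a given server actually uses, the shared-memory parameters discussed in this section can be read from &lt;code&gt;pg_settings&lt;/code&gt;. This is only an inspection query; parameters that do not exist in an older server version are simply missing from the result:&lt;/p&gt;

```sql
-- context = 'postmaster' means the parameter only changes at server start.
SELECT name, setting, unit, context
FROM pg_settings
WHERE name IN ('shared_memory_type',
               'dynamic_shared_memory_type',
               'min_dynamic_shared_memory',
               'huge_pages',
               'huge_page_size')
ORDER BY name;
```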
&lt;p&gt;&lt;strong&gt;&lt;code&gt;dynamic_shared_memory_type&lt;/code&gt;&lt;/strong&gt;
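The mechanism for dynamic shared memory. Possible values are &lt;code&gt;posix&lt;/code&gt; (&lt;code&gt;shm_open&lt;/code&gt;), &lt;code&gt;sysv&lt;/code&gt; (&lt;code&gt;shmget&lt;/code&gt;), &lt;code&gt;windows&lt;/code&gt;, and &lt;code&gt;mmap&lt;/code&gt; (memory-mapped files stored in the data directory); the default is &lt;code&gt;posix&lt;/code&gt; on platforms that support it, including Linux. This parameter is important for parallel queries. A &lt;a href="https://www.postgresql.org/message-id/CA%2BhUKGJOj7qzDLxeFPVvto8YEWop6FSQoTYPO9Z6Ee%3Di-nPS_Q%40mail.gmail.com" target="_blank" rel="noreferrer"&gt;community email about /dev/shm&lt;/a&gt; describes:&lt;/p&gt;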
&lt;blockquote&gt;&lt;p&gt;PostgreSQL creates segments in /dev/shm for parallel queries (via&lt;br&gt;
shm_open()), not for shared buffers. The amount used is controlled by&lt;br&gt;
work_mem. Queries can use up to work_mem for each node you see in the&lt;br&gt;
EXPLAIN plan, and for each process, so it can be quite a lot if you&lt;br&gt;
have lots of parallel worker processes and/or lots of&lt;br&gt;
tables/partitions being sorted or hashed in your query.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Key points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Parallel queries use POSIX shared memory and create segments in &lt;code&gt;/dev/shm&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;These segments are separate from &lt;code&gt;shared_buffers&lt;/code&gt;, which is not allocated in &lt;code&gt;/dev/shm&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;work_mem&lt;/code&gt; applies to each sort or hash node in each process, so total usage grows with the number of nodes and workers!&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
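&lt;p&gt;The multiplication described in the email can be turned into a rough estimate. The sketch below assumes, purely for illustration, a plan with 3 sort or hash nodes that uses every allowed worker plus the leader process:&lt;/p&gt;

```sql
-- Rough upper bound for one parallel query:
--   work_mem x (sort/hash nodes) x (workers + leader)
-- The node count of 3 is a hypothetical value; read the real one from EXPLAIN.
-- Hash-based nodes may use more than work_mem on versions that have
-- hash_mem_multiplier.
SELECT pg_size_pretty(
         pg_size_bytes(current_setting('work_mem'))
         * 3
         * (current_setting('max_parallel_workers_per_gather')::int + 1)
       ) AS rough_upper_bound;
```

&lt;p&gt;Multiply the result again by the number of such queries expected to run concurrently to get a feel for worst-case pressure on memory and &lt;code&gt;/dev/shm&lt;/code&gt;.&lt;/p&gt;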
&lt;p&gt;&lt;strong&gt;&lt;code&gt;min_dynamic_shared_memory&lt;/code&gt;&lt;/strong&gt;
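The amount of memory allocated at server startup for use by parallel queries (default 0, meaning none). When this region is too small or already used up by concurrent queries, new parallel queries allocate extra shared memory from the operating system using the method set by &lt;code&gt;dynamic_shared_memory_type&lt;/code&gt;. Memory reserved this way is subject to the &lt;code&gt;huge_pages&lt;/code&gt; setting.&lt;/p&gt;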
&lt;p&gt;&lt;strong&gt;&lt;code&gt;huge_pages&lt;/code&gt;&lt;/strong&gt;
This parameter controls whether the &lt;strong&gt;main shared memory area&lt;/strong&gt; uses huge pages. This means private memory and OS-level memory are not affected by this setting. PostgreSQL&amp;rsquo;s use of huge pages is currently only supported on Linux and Windows systems; on Linux systems, it is only supported when &lt;code&gt;shared_memory_type&lt;/code&gt; is set to &lt;code&gt;mmap&lt;/code&gt;!&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Setting&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;try&lt;/td&gt;
 &lt;td&gt;default, attempts to allocate huge pages and falls back to normal pages if that fails&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;on&lt;/td&gt;
 &lt;td&gt;uses huge pages; server will not start if allocation fails&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;off&lt;/td&gt;
 &lt;td&gt;does not use huge pages&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;huge_page_size&lt;/code&gt;&lt;/strong&gt;
Controls the size of huge pages. Default is 0, meaning PostgreSQL uses the huge page size provided by the operating system. Setting a non-default value is only supported on Linux.&lt;/p&gt;
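&lt;p&gt;When sizing the kernel huge page pool, newer servers can report how many huge pages the main shared memory area needs. A brief sketch, with the version requirements stated as assumptions in the comments:&lt;/p&gt;

```sql
-- Current policy: try, on, or off.
SHOW huge_pages;

-- Assumes PostgreSQL 15 or later: number of huge pages required for the
-- main shared memory area with the current settings.
SHOW shared_memory_size_in_huge_pages;

-- Assumes PostgreSQL 17 or later: whether huge pages are actually in use.
SHOW huge_pages_status;
```

&lt;p&gt;The reported page count is the value to compare against the kernel setting &lt;code&gt;vm.nr_hugepages&lt;/code&gt; on Linux.&lt;/p&gt;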

&lt;h3 class="relative group"&gt;The pg_shmem_allocations View
 &lt;div id="the-pg_shmem_allocations-view" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-pg_shmem_allocations-view" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;pg_shmem_allocations&lt;/code&gt; is a view introduced in PG13 that shows the allocations made from the server&amp;rsquo;s main shared memory segment, including those made by PostgreSQL itself and by extensions.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sum&lt;/span&gt;(allocated_size)&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt; gb &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_shmem_allocations;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; gb 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;7658920288085938&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_shmem_allocations &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;size&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; allocated_size 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------------------+------------+------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffer Blocks &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;38575360&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2415919104&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2415919104&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2729553280&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;240300672&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;240300672&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;anonymous&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;240198528&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;240198528&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffer Descriptors &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19700992&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18874368&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18874368&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; XLOG Ctl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;171008&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16803472&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16803584&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Backend Activity Buffer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2707733248&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10680320&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10680320&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A NULL name indicates unused memory, and &lt;code&gt;&amp;lt;anonymous&amp;gt;&lt;/code&gt; covers allocations that were made without a name.
Most entry names in the &lt;code&gt;pg_shmem_allocations&lt;/code&gt; view are not documented. They are the names passed to the shared memory initialization calls in the source code, so the practical way to interpret an entry is to search the source for its name.&lt;/p&gt;
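&lt;p&gt;For day-to-day inspection, a more readable variant of the query above can help. This sketch labels the unused row explicitly and formats the sizes; the label text is an arbitrary choice:&lt;/p&gt;

```sql
-- Ten largest allocations in the main shared memory segment.
SELECT coalesce(name, '(unused)')     AS name,
       pg_size_pretty(allocated_size) AS allocated
FROM pg_shmem_allocations
ORDER BY allocated_size DESC
LIMIT 10;
```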
&lt;p&gt;Example: Buffer Blocks:
Searching the source code directly for &amp;ldquo;buffer blocks&amp;rdquo;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Initialize shared buffer pool
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Called only once, during shared memory initialization
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;InitBufferPool&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		foundBufs,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				foundDescs,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				foundIOCV,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				foundBufCkpt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Align descriptors to a cacheline boundary. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	BufferDescriptors &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (BufferDescPadded &lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ShmemInitStruct&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Buffer Descriptors&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						NBuffers &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(BufferDescPadded),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;foundDescs);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	BufferBlocks &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ShmemInitStruct&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Buffer Blocks&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						NBuffers &lt;span style="color:#f92672"&gt;*&lt;/span&gt; (Size) BLCKSZ, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;foundBufs);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Align condition variables to cacheline boundary. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	BufferIOCVArray &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (ConditionVariableMinimallyPadded &lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ShmemInitStruct&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Buffer IO Condition Variables&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						NBuffers &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(ConditionVariableMinimallyPadded),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;foundIOCV);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Checkpoint BufferIds: shared-memory array used to sort the ids of buffers to be checkpointed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	CkptBufferIds &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (CkptSortItem &lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ShmemInitStruct&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Checkpoint BufferIds&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						NBuffers &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(CkptSortItem), &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;foundBufCkpt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;InitBufferPool()&lt;/code&gt; function initializes the shared buffer pool.&lt;/li&gt;
&lt;li&gt;The pool is made up of four shared-memory areas, each registered under its own name: Buffer Descriptors, Buffer Blocks, Buffer IO Condition Variables, and Checkpoint BufferIds.&lt;/li&gt;
&lt;/ul&gt;
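&lt;p&gt;To see these four areas on a running server, a query along the following lines should work. This is a minimal sketch, assuming PostgreSQL 13 or later (where &lt;code&gt;pg_shmem_allocations&lt;/code&gt; exists) and a role that is allowed to read the view. The names in the filter are the strings passed to &lt;code&gt;ShmemInitStruct&lt;/code&gt; in the source above.&lt;/p&gt;

```sql
-- Sizes of the four shared-memory areas registered by InitBufferPool()
SELECT name, size, allocated_size
FROM pg_shmem_allocations
WHERE name IN ('Buffer Descriptors',
               'Buffer Blocks',
               'Buffer IO Condition Variables',
               'Checkpoint BufferIds')
ORDER BY size DESC;
```

&lt;p&gt;The size reported for Buffer Blocks should be roughly the number of shared buffers times the block size, in line with the &lt;code&gt;NBuffers * (Size) BLCKSZ&lt;/code&gt; expression in the source.&lt;/p&gt;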

&lt;h2 class="relative group"&gt;Private Memory
 &lt;div id="private-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#private-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Private memory consists of the memory areas that PostgreSQL allocates for each session or process. Unlike shared buffers, of which there is a single instance, every process has its own private memory, and no other process can access it.



&lt;img src="https://lastdba.com/img/csdn/b9b739d63ed8.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;temp_buffers&lt;/code&gt;&lt;/strong&gt;
&lt;code&gt;temp_buffers&lt;/code&gt; sets the maximum memory each session may use to cache temporary table data. Default 8MB. Temporary buffers live in private memory, so temporary table data can only be accessed by the session that owns it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;work_mem&lt;/code&gt;&lt;/strong&gt;
The base maximum amount of memory a single query operation, such as a sort or a hash table, may use before it starts writing to temporary files. Default 4MB.
&lt;em&gt;Each query or each plan node?&lt;/em&gt;
&lt;a href="https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-WORK-MEM" target="_blank" rel="noreferrer"&gt;Official documentation&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Note that a complex query might perform several sort and hash operations at the same time, with each operation generally being allowed to use as much memory as this value specifies before it starts to write data into temporary files.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;&lt;a href="https://www.postgresql.org/message-id/CA%2BhUKGJOj7qzDLxeFPVvto8YEWop6FSQoTYPO9Z6Ee%3Di-nPS_Q%40mail.gmail.com" target="_blank" rel="noreferrer"&gt;Community email about /dev/shm&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Queries can use up to work_mem for each node you see in the&lt;br&gt;
EXPLAIN plan,&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;&lt;em&gt;This parameter applies to each operation (plan node) in a query, not to each query.&lt;/em&gt; A query can run many such operations at the same time, so a single query can consume several multiples of &lt;code&gt;work_mem&lt;/code&gt;. Set &lt;code&gt;work_mem&lt;/code&gt; carefully to avoid exhausting OS memory. The worst case is many sessions, each running a query with several plan nodes, and each of those nodes using its full &lt;code&gt;work_mem&lt;/code&gt; allowance.
&lt;em&gt;Which operations use work_mem?&lt;/em&gt;
Sort-based operations: &lt;code&gt;ORDER BY&lt;/code&gt;, &lt;code&gt;DISTINCT&lt;/code&gt;, and merge joins. Hash-based operations: hash joins, hash-based aggregation, memoize nodes, and hash-based processing of &lt;code&gt;IN&lt;/code&gt; subqueries.&lt;/p&gt;
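&lt;p&gt;As a rough illustration of the worst case, the arithmetic can be done in SQL. The session and plan-node counts below are made-up assumptions, not measurements, so substitute your own figures.&lt;/p&gt;

```sql
-- Assumed workload: 100 concurrent sessions, each running a query
-- with 4 plan nodes that all use their full work_mem allowance.
SELECT pg_size_pretty(
         pg_size_bytes(current_setting('work_mem')) * 100 * 4
       ) AS rough_worst_case;

-- Raise work_mem for one heavy query only, instead of globally.
BEGIN;
SET LOCAL work_mem = '64MB';
-- run the heavy query here
COMMIT;
```

&lt;p&gt;&lt;code&gt;SET LOCAL&lt;/code&gt; reverts at the end of the transaction, so the larger value does not leak into the rest of the session. The estimate ignores &lt;code&gt;hash_mem_multiplier&lt;/code&gt; and parallel workers, both of which push the real ceiling higher.&lt;/p&gt;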
&lt;p&gt;&lt;strong&gt;&lt;code&gt;hash_mem_multiplier&lt;/code&gt;&lt;/strong&gt;
Limits the memory available to hash-based operations. The limit is &lt;code&gt;hash_mem_multiplier&lt;/code&gt; * &lt;code&gt;work_mem&lt;/code&gt;. &lt;code&gt;hash_mem_multiplier&lt;/code&gt; defaults to 2.0 since PG15 (it was 1.0 in PG13 and PG14).
&lt;code&gt;work_mem&lt;/code&gt; caps each operation, but it cannot cap how many hash operations a query runs, and hash tables are more sensitive to memory shortage than sorts, so PG13 added this parameter. In version 12 and earlier, hash table memory was very difficult to limit.
&lt;em&gt;In our 9.6 production environment, we found a single session consuming 300GB of memory. The culprit was the lack of hash table limits in older versions combined with an execution plan that incorrectly used hash tables.&lt;/em&gt;&lt;/p&gt;
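&lt;p&gt;The effective limit for a single hash-based operation can be computed from the two settings. A minimal sketch, assuming PG13 or later, where &lt;code&gt;hash_mem_multiplier&lt;/code&gt; exists:&lt;/p&gt;

```sql
-- Effective per-operation limit for hash tables in this session
SELECT current_setting('work_mem')            AS work_mem,
       current_setting('hash_mem_multiplier') AS hash_mem_multiplier,
       pg_size_pretty(
         (pg_size_bytes(current_setting('work_mem'))
          * current_setting('hash_mem_multiplier')::numeric)::bigint
       ) AS hash_mem_limit;
```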
&lt;p&gt;&lt;strong&gt;&lt;code&gt;maintenance_work_mem&lt;/code&gt;&lt;/strong&gt;
The maximum memory used by maintenance operations such as &lt;code&gt;VACUUM&lt;/code&gt;, &lt;code&gt;CREATE INDEX&lt;/code&gt;, and &lt;code&gt;ALTER TABLE ADD FOREIGN KEY&lt;/code&gt;. Default 64MB. These operations are started by a session and use the private memory of the process that runs them. A session can run only one of them at a time, and few usually run concurrently, so this parameter can be set considerably higher than &lt;code&gt;work_mem&lt;/code&gt;.
Autovacuum workers are also bound by this limit unless &lt;code&gt;autovacuum_work_mem&lt;/code&gt; is set. See the &lt;code&gt;autovacuum_work_mem&lt;/code&gt; explanation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;autovacuum_work_mem&lt;/code&gt;&lt;/strong&gt;
Maximum memory used by each autovacuum worker process. Default -1, meaning &lt;code&gt;maintenance_work_mem&lt;/code&gt; is used as the limit for autovacuum workers. Up to PG16, vacuum uses at most 1GB of memory to collect dead tuple identifiers, and autovacuum has the same cap, so setting the vacuum/autovacuum memory limit above 1GB has no effect on those versions. PG17 removed this 1GB cap.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;vacuum_buffer_usage_limit&lt;/code&gt;&lt;/strong&gt;
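Sets the size of the Buffer Access Strategy ring that &lt;code&gt;VACUUM&lt;/code&gt; and &lt;code&gt;ANALYZE&lt;/code&gt; use in shared buffers, which caps how many pages they can occupy and so prevents them from evicting too many useful pages. Default is 256kB in PG16 (2MB since PG17). 0 disables the ring, so the operation may use any number of shared buffers.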
When using &lt;code&gt;VACUUM&lt;/code&gt; or &lt;code&gt;ANALYZE&lt;/code&gt; commands, &lt;code&gt;BUFFER_USAGE_LIMIT&lt;/code&gt; can be specified, which takes precedence over the GUC parameter &lt;code&gt;vacuum_buffer_usage_limit&lt;/code&gt;.&lt;/p&gt;
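&lt;p&gt;For illustration, the per-command option and the GUC can be used as follows. The table name &lt;code&gt;my_table&lt;/code&gt; is hypothetical, and both forms assume PG16 or later.&lt;/p&gt;

```sql
-- Give this one VACUUM a larger limit than the GUC setting
VACUUM (BUFFER_USAGE_LIMIT '4MB', VERBOSE) my_table;

-- 0 removes the limit for this ANALYZE only
ANALYZE (BUFFER_USAGE_LIMIT 0) my_table;

-- Session-level default for commands that do not specify the option
SET vacuum_buffer_usage_limit = '1MB';
```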
&lt;p&gt;&lt;strong&gt;&lt;code&gt;max_stack_depth&lt;/code&gt;&lt;/strong&gt;
The maximum safe depth of the server&amp;rsquo;s execution stack. In practice it mostly limits how deep a recursive function can go in a single backend process. Default is 2MB. The kernel stack limit (&lt;code&gt;ulimit -s&lt;/code&gt;) should be larger than &lt;code&gt;max_stack_depth&lt;/code&gt; by a safety margin of about 1MB.
If a recursive function exceeds the stack depth, the following error is reported:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: stack depth limit exceeded
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Increase the configuration parameter max_stack_depth &lt;span style="color:#f92672"&gt;(&lt;/span&gt;currently 2048kB&lt;span style="color:#f92672"&gt;)&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;after ensuring the platform&lt;span style="color:#960050;background-color:#1e0010"&gt;&amp;#39;&lt;/span&gt;s stack depth limit is adequate.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
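&lt;p&gt;The error is easy to reproduce in a test database with a function that never stops recursing. This is only a sketch, and the function name &lt;code&gt;recurse_forever&lt;/code&gt; is made up for the example.&lt;/p&gt;

```sql
SHOW max_stack_depth;

-- Hypothetical function that calls itself without a stop condition
CREATE FUNCTION recurse_forever(n int) RETURNS int
LANGUAGE plpgsql AS $$
BEGIN
  RETURN recurse_forever(n + 1);
END;
$$;

-- Expected to fail with "stack depth limit exceeded"
SELECT recurse_forever(1);

DROP FUNCTION recurse_forever(int);
```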
&lt;p&gt;&lt;strong&gt;&lt;code&gt;logical_decoding_work_mem&lt;/code&gt;&lt;/strong&gt;
Before PG13, logical decoding kept at most 4096 changes per transaction in memory (&lt;code&gt;max_changes_in_memory&lt;/code&gt;, hardcoded in the source) before spilling to disk. PG13 introduced the parameter &lt;code&gt;logical_decoding_work_mem&lt;/code&gt;: once the changes held by logical decoding exceed this amount of memory, they are written to disk. Default 64MB.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;each replication connection only uses a single buffer of this size,&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Generally, the number of logical replication connections is not large, so &lt;code&gt;logical_decoding_work_mem&lt;/code&gt; can be set relatively high without issues.&lt;/p&gt;
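&lt;p&gt;Whether the limit is too low for a workload shows up as spilling to disk. A minimal sketch, assuming PG14 or later, where the &lt;code&gt;pg_stat_replication_slots&lt;/code&gt; view reports per-slot spill counters:&lt;/p&gt;

```sql
-- Non-zero spill_* values mean decoded changes exceeded
-- logical_decoding_work_mem and were written to disk.
SELECT slot_name, spill_txns, spill_count, spill_bytes
FROM pg_stat_replication_slots;
```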

&lt;h2 class="relative group"&gt;xxCache
 &lt;div id="xxcache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#xxcache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;xxCache is also private memory.&lt;/strong&gt; For example, PostgreSQL caches relation metadata in relcache. The official documentation says little about these caches, yet PostgreSQL memory problems are often related to them.
For instance, many environments have seen the catalog cache make each backend process hold on to a large amount of memory without releasing it. Here is a &lt;a href="https://www.postgresql.org/message-id/flat/20160708012833.1419.89062%40wrigleys.postgresql.org#20160708012833.1419.89062@wrigleys.postgresql.org" target="_blank" rel="noreferrer"&gt;community email from 2016 by Digoal about catalog cache consuming excessive memory&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Every PostgreSQL session holds system data in own cache. Usually this cache is pretty small (for significant numbers of users). But can be pretty big if your catalog is untypically big and you touch almost all objects from&lt;br&gt;
catalog in session. A implementation of this cache is simple - there is not&lt;br&gt;
delete or limits. There is not garabage collector (and issue related to&lt;br&gt;
GC), what is great, but the long sessions on big catalog can be problem.&lt;br&gt;
The solution is simple - close session over some time or over some number of operations. Then all memory in caches will be released.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;The community&amp;rsquo;s explanation of catalog cache:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each session has its own cache for storing system data (metadata, etc.)&lt;/li&gt;
&lt;li&gt;Generally, this cache is small. When the catalog is large and a session has touched almost all catalog objects, the cache can become very large.&lt;/li&gt;
&lt;li&gt;Cache management is simple: &lt;strong&gt;there is no deletion mechanism or limit&lt;/strong&gt; (though invalidation messages do exist).&lt;/li&gt;
&lt;li&gt;Closing the session releases the cache.&lt;/li&gt;
&lt;/ul&gt;
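&lt;p&gt;How large these caches have grown in the current session can be inspected directly. A minimal sketch, assuming PG14 or later, where the &lt;code&gt;pg_backend_memory_contexts&lt;/code&gt; view exists. Relcache and catcache entries are allocated under &lt;code&gt;CacheMemoryContext&lt;/code&gt;, and its child contexts appear as separate rows.&lt;/p&gt;

```sql
-- Memory held by the cache context of the current backend
SELECT name, total_bytes, used_bytes, free_bytes
FROM pg_backend_memory_contexts
WHERE name = 'CacheMemoryContext';

-- The ten largest memory contexts in this backend, for comparison
SELECT name, ident, total_bytes
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 10;
```

&lt;p&gt;Running the first query before and after touching many tables in one session gives a feel for how the cache grows and never shrinks until the session ends.&lt;/p&gt;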
&lt;p&gt;Tom Lane&amp;rsquo;s answer was equally blunt, either rethink the schema or add more hardware:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;I do not think you should complain if that takes a great deal of memory. Either rethink why you need so many tables, or buy hardware commensurate with the size of your problem.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;The caches have many details worth understanding. Once you know how they work, the fix for cache-related memory problems need not be limited to a single approach.
There are many types of xxCache, such as relcache, syscache, and plancache. Because documentation is scarce, understanding them requires reading the source code, most of which lives under &lt;code&gt;src/backend/utils/cache&lt;/code&gt;.
&lt;em&gt;Source structure&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;inval.c				&lt;span style="color:#f92672"&gt;--&lt;/span&gt; Invalidation message dispatcher &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; private caches. The corresponding shared cache invalidation message handler is sinval.c
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relfilenodemap.c	&lt;span style="color:#f92672"&gt;--&lt;/span&gt; relfilenode to oid mapping cache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ts_cache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; Cache &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;Tsearch&lt;/span&gt; (text search) related objects
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relmapper.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; catalog to relfilenode mapping cache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;typcache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; type cache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;spccache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; tablespace cache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;evtcache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; event trigger cache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;attoptcache.c		&lt;span style="color:#f92672"&gt;--&lt;/span&gt; attribute options cache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plancache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; plan cache 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relcache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; relation cache 							 								&lt;span style="color:#f92672"&gt;*&lt;/span&gt;Focus of this article&lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;catcache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; system catalog cache 					 						&lt;span style="color:#f92672"&gt;*&lt;/span&gt;Focus of this article&lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;syscache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; one layer above catcache, also system catalog cache	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;Focus of this article&lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lsyscache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; routines &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; conveniently querying catalog cache, &lt;span style="color:#e6db74"&gt;&amp;#39;l&amp;#39;&lt;/span&gt; likely stands &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; lookup
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;partcache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; routines &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; operating on partition information in relcache&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Besides the caches themselves, this directory also holds lookup helper routines and the invalidation message code. Below we focus on relcache, catcache/syscache, and invalidation messages.&lt;/p&gt;

&lt;h3 class="relative group"&gt;relcache
 &lt;div id="relcache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#relcache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;What data does a relcache entry store?&lt;/strong&gt;
Defined in &lt;code&gt;src/include/utils/rel.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt; POSTGRES relation &lt;span style="color:#a6e22e"&gt;descriptor&lt;/span&gt; (a&lt;span style="color:#f92672"&gt;/&lt;/span&gt;k&lt;span style="color:#f92672"&gt;/&lt;/span&gt;a relcache entry) definitions.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;RelationData&lt;/code&gt; is the primary data structure for relcache entries:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; RelationData
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	RelFileNode rd_node;		&lt;span style="color:#75715e"&gt;/* physical identifier of relation */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SMgrRelation rd_smgr;		&lt;span style="color:#75715e"&gt;/* cached file handle, or NULL */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			rd_refcnt;		&lt;span style="color:#75715e"&gt;/* reference count */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	BackendId	rd_backend;		&lt;span style="color:#75715e"&gt;/* if temp relation, the owning backend id */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		rd_islocaltemp; &lt;span style="color:#75715e"&gt;/* is it a temp rel of the current session */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		rd_isnailed;	&lt;span style="color:#75715e"&gt;/* is it nailed in cache */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		rd_isvalid;		&lt;span style="color:#75715e"&gt;/* is the relcache entry valid */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		rd_indexvalid;	&lt;span style="color:#75715e"&gt;/* is rd_indexlist valid? */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		rd_statvalid;	&lt;span style="color:#75715e"&gt;/* is rd_statlist valid? */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* some subtransaction info */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SubTransactionId rd_createSubid;	&lt;span style="color:#75715e"&gt;/* rel was created in current xact */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SubTransactionId rd_newRelfilenodeSubid;	&lt;span style="color:#75715e"&gt;/* highest subxact changing rd_node to current value */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SubTransactionId rd_firstRelfilenodeSubid;	&lt;span style="color:#75715e"&gt;/* highest subxact changing rd_node to any value */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SubTransactionId rd_droppedSubid;	&lt;span style="color:#75715e"&gt;/* dropped with another Subid set */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Form_pg_class rd_rel;		&lt;span style="color:#75715e"&gt;/* pointer to the relation&amp;#39;s pg_class tuple */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TupleDesc	rd_att;			&lt;span style="color:#75715e"&gt;/* tuple descriptor */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			rd_id;			&lt;span style="color:#75715e"&gt;/* relation&amp;#39;s oid */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	LockInfoData rd_lockInfo;	&lt;span style="color:#75715e"&gt;/* lock info on the relation */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	RuleLock &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_rules;		&lt;span style="color:#75715e"&gt;/* rewrite rules */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext rd_rulescxt;	&lt;span style="color:#75715e"&gt;/* private memory cxt for rd_rules */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TriggerDesc &lt;span style="color:#f92672"&gt;*&lt;/span&gt;trigdesc;		&lt;span style="color:#75715e"&gt;/* trigger info, NULL if none */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* foreign key info */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	List	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_fkeylist;	&lt;span style="color:#75715e"&gt;/* list of ForeignKeyCacheInfo (see below) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		rd_fkeyvalid;	&lt;span style="color:#75715e"&gt;/* true if list has been computed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* partition info */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PartitionKey rd_partkey;	&lt;span style="color:#75715e"&gt;/* partition key, or NULL */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext rd_partkeycxt;	&lt;span style="color:#75715e"&gt;/* private context for rd_partkey, if any */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	List	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_indexlist;	&lt;span style="color:#75715e"&gt;/* list of all index OIDs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			rd_pkindex;		&lt;span style="color:#75715e"&gt;/* primary key oid */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			rd_replidindex; &lt;span style="color:#75715e"&gt;/* replica identity index oid */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	List	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_statlist;	&lt;span style="color:#75715e"&gt;/* list of extended stats OIDs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PublicationDesc &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_pubdesc;	&lt;span style="color:#75715e"&gt;/* publication descriptor, or NULL */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	bytea	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_options;		&lt;span style="color:#75715e"&gt;/* parsed pg_class.reloptions */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Form_pg_index rd_index;		&lt;span style="color:#75715e"&gt;/* pg_index tuple describing this index */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; HeapTupleData &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_indextuple;	&lt;span style="color:#75715e"&gt;/* the complete pg_index tuple */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext rd_indexcxt;	&lt;span style="color:#75715e"&gt;/* index cxt */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_amcache;		&lt;span style="color:#75715e"&gt;/* available for use by index/table AM */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; FdwRoutine &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_fdwroutine;	&lt;span style="color:#75715e"&gt;/* cached function pointers, or NULL */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} RelationData;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;RelationData&lt;/code&gt; holds a large amount of relation metadata: the OID, the &lt;code&gt;pg_class&lt;/code&gt; tuple, partition information, subtransaction bookkeeping, row security policies, extended statistics, index metadata, access method (AM) data, and more.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;relcache ROUTINES&lt;/strong&gt;
The ROUTINES source code is located at &lt;code&gt;src/backend/utils/cache/relcache.c&lt;/code&gt;.
There are mainly 5 stages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;RelationCacheInitialize&lt;/code&gt; - Initialize relcache, initially empty&lt;/li&gt;
&lt;li&gt;&lt;code&gt;RelationCacheInitializePhase2&lt;/code&gt; - Initialize shared catalogs&lt;/li&gt;
&lt;li&gt;&lt;code&gt;RelationCacheInitializePhase3&lt;/code&gt; - Complete relcache initialization&lt;/li&gt;
&lt;li&gt;&lt;code&gt;RelationIdGetRelation&lt;/code&gt; - Get relation descriptor by relation id&lt;/li&gt;
&lt;li&gt;&lt;code&gt;RelationClose&lt;/code&gt; - Close a relation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These 5 stages are the main logical steps in the life of a relcache entry, not the lifecycle of relcache itself. The first three stages are all relcache initialization: they set up relcache and load some system tables and their indexes. The last two stages obtain a relation descriptor and close a relation. After a relation is closed, the relcache itself still exists.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Stage 1&lt;/em&gt;: &lt;code&gt;RelationCacheInitialize&lt;/code&gt;
&lt;code&gt;RelationCacheInitialize&lt;/code&gt; initializes relcache:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Define initial size 400
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define INITRELCACHESIZE		400
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;RelationCacheInitialize&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	HASHCTL		ctl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			allocsize;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * make sure cache memory context exists
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Check if cache mctx exists, create one if not
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;CacheMemoryContext)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;CreateCacheMemoryContext&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Create hash table indexed by OID for relcache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ctl.keysize &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(Oid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ctl.entrysize &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(RelIdCacheEnt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	RelationIdCache &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;hash_create&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Relcache by OID&amp;#34;&lt;/span&gt;, INITRELCACHESIZE,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;ctl, HASH_ELEM &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HASH_BLOBS);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Initialize relation mapper
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;RelationMapInitialize&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;RelationCacheInitialize&lt;/code&gt; does not build any relation entries; it only sets up the cache memory context, the OID-keyed hash table, the relation mapper, and so on.&lt;/p&gt;
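&lt;p&gt;To make the result of stage 1 concrete, here is a minimal sketch of an empty OID-keyed cache. It is an illustration only: every name starting with &lt;code&gt;toy_&lt;/code&gt; or &lt;code&gt;Toy&lt;/code&gt; is hypothetical, the table is a fixed-size array with linear probing rather than the dynahash table created by &lt;code&gt;hash_create&lt;/code&gt;, and a plain string stands in for the relation descriptor.&lt;/p&gt;

```c
/* Toy OID-keyed cache. All names are hypothetical, not PostgreSQL code. */
#define TOY_CACHE_SLOTS 8		/* fixed capacity; the real table can grow */
#define TOY_INVALID_OID 0u		/* marks an empty slot */

typedef unsigned int ToyOid;

typedef struct ToyCacheEnt
{
	ToyOid		reloid;			/* hash key */
	const char *relname;		/* stand-in for the relation descriptor */
} ToyCacheEnt;

static ToyCacheEnt toy_cache[TOY_CACHE_SLOTS];

/* Slot visited on the given probe step (linear probing). */
static int
toy_slot_for(ToyOid reloid, int probe)
{
	return (int) ((reloid + (ToyOid) probe) % TOY_CACHE_SLOTS);
}

/* Analogue of stage 1: set up an empty table, build no entries. */
void
toy_cache_initialize(void)
{
	int			i;

	for (i = 0; i != TOY_CACHE_SLOTS; i++)
	{
		toy_cache[i].reloid = TOY_INVALID_OID;
		toy_cache[i].relname = 0;
	}
}

/* Insert or overwrite an entry; returns 1 on success, 0 on failure. */
int
toy_cache_insert(ToyOid reloid, const char *relname)
{
	int			probe;

	if (reloid == TOY_INVALID_OID)
		return 0;				/* the empty-slot marker is not a valid key */

	for (probe = 0; probe != TOY_CACHE_SLOTS; probe++)
	{
		int			slot = toy_slot_for(reloid, probe);

		if (toy_cache[slot].reloid == TOY_INVALID_OID ||
			toy_cache[slot].reloid == reloid)
		{
			toy_cache[slot].reloid = reloid;
			toy_cache[slot].relname = relname;
			return 1;
		}
	}
	return 0;					/* table is full */
}

/* Look up by OID; returns the name, or a null pointer on a miss. */
const char *
toy_cache_lookup(ToyOid reloid)
{
	int			probe;

	for (probe = 0; probe != TOY_CACHE_SLOTS; probe++)
	{
		int			slot = toy_slot_for(reloid, probe);

		if (toy_cache[slot].reloid == TOY_INVALID_OID)
			return 0;			/* reached an empty slot: not cached */
		if (toy_cache[slot].reloid == reloid)
			return toy_cache[slot].relname;
	}
	return 0;
}
```

&lt;p&gt;Right after &lt;code&gt;toy_cache_initialize()&lt;/code&gt; every lookup misses, which mirrors the state after stage 1: the cache exists, but no relation has been loaded into it yet.&lt;/p&gt;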
&lt;p&gt;&lt;em&gt;Stage 2&lt;/em&gt;: &lt;code&gt;RelationCacheInitializePhase2&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;RelationCacheInitializePhase2&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext oldcxt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Initialize relation mapper
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;RelationMapInitializePhase2&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If in bootstrap mode, shared catalogs don&amp;#39;t exist yet, so do nothing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsBootstrapProcessingMode&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Switch to CacheMemoryContext
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	oldcxt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MemoryContextSwitchTo&lt;/span&gt;(CacheMemoryContext);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Try to load shared relcache file
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;load_relcache_init_file&lt;/span&gt;(true)) &lt;span style="color:#75715e"&gt;// If init file not loaded
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_database&amp;#34;&lt;/span&gt;, DatabaseRelation_Rowtype_Id, true,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_database, Desc_pg_database);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_authid&amp;#34;&lt;/span&gt;, AuthIdRelation_Rowtype_Id, true,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_authid, Desc_pg_authid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_auth_members&amp;#34;&lt;/span&gt;, AuthMemRelation_Rowtype_Id, true,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_auth_members, Desc_pg_auth_members);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_shseclabel&amp;#34;&lt;/span&gt;, SharedSecLabelRelation_Rowtype_Id, true,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_shseclabel, Desc_pg_shseclabel);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_subscription&amp;#34;&lt;/span&gt;, SubscriptionRelation_Rowtype_Id, true,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_subscription, Desc_pg_subscription);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define NUM_CRITICAL_SHARED_RELS	5	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* fix if you change list above */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;MemoryContextSwitchTo&lt;/span&gt;(oldcxt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
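&lt;p&gt;There are two relcache init files: a shared one and a local (per-database) one. &lt;code&gt;load_relcache_init_file()&lt;/code&gt; tries to load one of them into relcache; here it is called with &lt;code&gt;true&lt;/code&gt;, so it loads only the shared file. If loading fails, &lt;code&gt;formrdesc()&lt;/code&gt; builds descriptors for the 5 critical shared system tables: &lt;code&gt;pg_database&lt;/code&gt;, &lt;code&gt;pg_authid&lt;/code&gt;, &lt;code&gt;pg_auth_members&lt;/code&gt;, &lt;code&gt;pg_shseclabel&lt;/code&gt;, and &lt;code&gt;pg_subscription&lt;/code&gt;.&lt;/p&gt;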
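&lt;p&gt;The stage 2 control flow (return early in bootstrap mode, try the shared init file, otherwise build the descriptors by hand) can be sketched as a small function. This is a simplified model, not the PostgreSQL code: &lt;code&gt;toy_initialize_phase2&lt;/code&gt;, &lt;code&gt;toy_formrdesc&lt;/code&gt;, and the two integer flags are hypothetical stand-ins, and only the catalog names come from the listing.&lt;/p&gt;

```c
#define TOY_NUM_CRITICAL_SHARED_RELS 5

/* Catalog names taken from the stage 2 listing. */
static const char *const toy_critical_shared_rels[TOY_NUM_CRITICAL_SHARED_RELS] = {
	"pg_database",
	"pg_authid",
	"pg_auth_members",
	"pg_shseclabel",
	"pg_subscription"
};

/* Names of the descriptors built by hand during the last fallback. */
static const char *toy_built[TOY_NUM_CRITICAL_SHARED_RELS];

/* Hypothetical stand-in for formrdesc(): it only records the catalog name. */
static void
toy_formrdesc(int slot, const char *relname)
{
	toy_built[slot] = relname;
}

/*
 * Sketch of the stage 2 decision logic.  init_file_loaded models the case
 * where load_relcache_init_file(true) returned true.  The return value is
 * the number of descriptors that had to be built by hand.
 */
int
toy_initialize_phase2(int bootstrap_mode, int init_file_loaded)
{
	int			i;

	/* In bootstrap mode the shared catalogs do not exist yet. */
	if (bootstrap_mode)
		return 0;

	/* A usable init file already supplied the descriptors. */
	if (init_file_loaded)
		return 0;

	/* Fallback: build each critical shared descriptor by hand. */
	for (i = 0; i != TOY_NUM_CRITICAL_SHARED_RELS; i++)
		toy_formrdesc(i, toy_critical_shared_rels[i]);

	return TOY_NUM_CRITICAL_SHARED_RELS;
}
```

&lt;p&gt;Calling &lt;code&gt;toy_initialize_phase2(0, 0)&lt;/code&gt; takes the fallback path and builds all five descriptors, while either flag being set builds none.&lt;/p&gt;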
&lt;p&gt;&lt;em&gt;Stage 3&lt;/em&gt;: &lt;code&gt;RelationCacheInitializePhase3&lt;/code&gt;
&lt;code&gt;RelationCacheInitializePhase3&lt;/code&gt; is the third initialization stage and does the most work:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;RelationCacheInitializePhase3&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	HASH_SEQ_STATUS status;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	RelIdCacheEnt &lt;span style="color:#f92672"&gt;*&lt;/span&gt;idhentry;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext oldcxt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		needNewCacheFile &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;!&lt;/span&gt;criticalSharedRelcachesBuilt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;RelationMapInitializePhase3&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Switch to CacheMemoryContext
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	oldcxt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MemoryContextSwitchTo&lt;/span&gt;(CacheMemoryContext);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Like stage 2, load more system table descriptors
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsBootstrapProcessingMode&lt;/span&gt;() &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;load_relcache_init_file&lt;/span&gt;(false))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		needNewCacheFile &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_class&amp;#34;&lt;/span&gt;, RelationRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_class, Desc_pg_class);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_attribute&amp;#34;&lt;/span&gt;, AttributeRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_attribute, Desc_pg_attribute);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_proc&amp;#34;&lt;/span&gt;, ProcedureRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_proc, Desc_pg_proc);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_type&amp;#34;&lt;/span&gt;, TypeRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_type, Desc_pg_type);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define NUM_CRITICAL_LOCAL_RELS 4	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* fix if you change list above */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;MemoryContextSwitchTo&lt;/span&gt;(oldcxt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If we haven&amp;#39;t obtained critical system indexes yet, do it now
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Because catcache and/or opclass cache depend on critical system indexes in relcache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;criticalRelcachesBuilt) &lt;span style="color:#75715e"&gt;// If critical indexes not loaded
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;load_critical_index&lt;/span&gt;(ClassOidIndexId,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							RelationRelationId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;load_critical_index&lt;/span&gt;(TriggerRelidNameIndexId,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							TriggerRelationId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define NUM_CRITICAL_LOCAL_INDEXES	7	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* fix if you change list above */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		criticalRelcachesBuilt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true; &lt;span style="color:#75715e"&gt;// Mark: critical system table indexes obtained
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Continue processing shared critical system table indexes.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// These shared critical system tables are needed in certain situations (autovacuum, client authentication, etc.)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;criticalSharedRelcachesBuilt)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;load_critical_index&lt;/span&gt;(DatabaseNameIndexId,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							DatabaseRelationId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;load_critical_index&lt;/span&gt;(SharedSecLabelObjectIndexId,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							SharedSecLabelRelationId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define NUM_CRITICAL_SHARED_INDEXES 6	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* fix if you change list above */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		criticalSharedRelcachesBuilt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true; &lt;span style="color:#75715e"&gt;// Mark: shared critical system table indexes obtained
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Scan every relcache entry and fix anything that may be wrong
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// in the results of formrdesc or the init file.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Entries faked up by formrdesc are replaced with the real pg_class rows.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// The init file does not record rules, triggers, or row security policies,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// so entries that have them get that information rebuilt below
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; ((idhentry &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (RelIdCacheEnt &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#a6e22e"&gt;hash_seq_search&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;status)) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		Relation	relation &lt;span style="color:#f92672"&gt;=&lt;/span&gt; idhentry&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;reldesc;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		restart &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Ensure relations in use are not flushed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;RelationIncrementReferenceCount&lt;/span&gt;(relation);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If the entry was faked up by formrdesc (relowner is still InvalidOid), read the real tuple from pg_class
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relowner &lt;span style="color:#f92672"&gt;==&lt;/span&gt; InvalidOid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;memcpy&lt;/span&gt;((&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel, (&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) relp, CLASS_TUPLE_SIZE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// Update rd_options
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_options)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;pfree&lt;/span&gt;(relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_options);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;RelationParseRelOptions&lt;/span&gt;(relation, htup);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ReleaseSysCache&lt;/span&gt;(htup);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			restart &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Fix data not in the init file
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// For example, relhasrules, relhastriggers may be outdated or incorrect
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relhasrules &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rules &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;RelationBuildRuleLock&lt;/span&gt;(relation);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rules &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relhasrules &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			restart &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relhastriggers &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;trigdesc &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;RelationBuildTriggers&lt;/span&gt;(relation);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;trigdesc &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relhastriggers &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			restart &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Reload row security policies, since init file doesn&amp;#39;t contain them
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relrowsecurity &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rsdesc &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;RelationBuildRowSecurity&lt;/span&gt;(relation);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rsdesc &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			restart &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If tableam needs reloading
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_tableam &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			(&lt;span style="color:#a6e22e"&gt;RELKIND_HAS_TABLE_AM&lt;/span&gt;(relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relkind) &lt;span style="color:#f92672"&gt;||&lt;/span&gt; relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relkind &lt;span style="color:#f92672"&gt;==&lt;/span&gt; RELKIND_SEQUENCE))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;RelationInitTableAccessMethod&lt;/span&gt;(relation);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_tableam &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			restart &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Decrement reference count
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;RelationDecrementReferenceCount&lt;/span&gt;(relation);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Finally, write out new init files if needed, so the descriptors rebuilt above are not wasted
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (needNewCacheFile)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;InitCatalogCachePhase2&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* now write the files */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;write_relcache_init_file&lt;/span&gt;(true); &lt;span style="color:#75715e"&gt;// Write the shared init file (under global/)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;write_relcache_init_file&lt;/span&gt;(false); &lt;span style="color:#75715e"&gt;// Write the local init file
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Compared to Stage 2, which loads the 5 critical shared system tables, &lt;code&gt;RelationCacheInitializePhase3()&lt;/code&gt; loads the critical local ones (&lt;code&gt;pg_class&lt;/code&gt;, &lt;code&gt;pg_attribute&lt;/code&gt;, &lt;code&gt;pg_proc&lt;/code&gt;, and &lt;code&gt;pg_type&lt;/code&gt;) and then the critical indexes of both groups. These descriptors are built by hand only when they were not loaded from a usable init file. Once the entries have been rebuilt and corrected, the refreshed descriptors are written back to the init files.
The source of &lt;code&gt;write_relcache_init_file&lt;/code&gt; shows what the &lt;code&gt;true&lt;/code&gt; and &lt;code&gt;false&lt;/code&gt; arguments mean:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;write_relcache_init_file&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; shared)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (shared)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;snprintf&lt;/span&gt;(tempfilename, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(tempfilename), &lt;span style="color:#e6db74"&gt;&amp;#34;global/%s.%d&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 RELCACHE_INIT_FILENAME, MyProcPid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;snprintf&lt;/span&gt;(finalfilename, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(finalfilename), &lt;span style="color:#e6db74"&gt;&amp;#34;global/%s&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 RELCACHE_INIT_FILENAME);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;snprintf&lt;/span&gt;(tempfilename, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(tempfilename), &lt;span style="color:#e6db74"&gt;&amp;#34;%s/%s.%d&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 DatabasePath, RELCACHE_INIT_FILENAME, MyProcPid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;snprintf&lt;/span&gt;(finalfilename, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(finalfilename), &lt;span style="color:#e6db74"&gt;&amp;#34;%s/%s&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 DatabasePath, RELCACHE_INIT_FILENAME);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;true&lt;/code&gt; writes the shared (global) init file.
&lt;code&gt;false&lt;/code&gt; writes the local (per-database) init file.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;RELCACHE_INIT_FILENAME&lt;/code&gt; is a macro, defined as:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define RELCACHE_INIT_FILENAME &amp;#34;pg_internal.init&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
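&lt;p&gt;So the init files are written to the following paths. As the &lt;code&gt;tempfilename&lt;/code&gt; format strings show, in both cases there is also a temporary file whose name ends in the backend PID:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;shared: &lt;code&gt;global/pg_internal.init&lt;/code&gt; (temporary file: &lt;code&gt;global/pg_internal.init.myPID&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;local: &lt;code&gt;DatabasePath/pg_internal.init&lt;/code&gt; (temporary file: &lt;code&gt;DatabasePath/pg_internal.init.myPID&lt;/code&gt;)&lt;/li&gt;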
&lt;/ul&gt;
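&lt;p&gt;The naming scheme above can be reproduced in a minimal, standalone C sketch. It is not PostgreSQL code: &lt;code&gt;build_init_file_paths()&lt;/code&gt; and &lt;code&gt;is_temp_name_of()&lt;/code&gt; are hypothetical helpers, and the &lt;code&gt;db_path&lt;/code&gt; and &lt;code&gt;pid&lt;/code&gt; parameters are stand-ins for &lt;code&gt;DatabasePath&lt;/code&gt; and &lt;code&gt;MyProcPid&lt;/code&gt;. Only the format strings are taken from the excerpt.&lt;/p&gt;

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define RELCACHE_INIT_FILENAME "pg_internal.init"

/*
 * Build the temporary and final init-file paths using the same format
 * strings as the excerpt. db_path and pid are hypothetical stand-ins for
 * DatabasePath and MyProcPid; db_path is ignored for the shared file.
 */
static void
build_init_file_paths(bool shared, const char *db_path, int pid,
                      char *tempname, size_t tempsize,
                      char *finalname, size_t finalsize)
{
    assert(shared || db_path != NULL);

    if (shared)
    {
        snprintf(tempname, tempsize, "global/%s.%d",
                 RELCACHE_INIT_FILENAME, pid);
        snprintf(finalname, finalsize, "global/%s",
                 RELCACHE_INIT_FILENAME);
    }
    else
    {
        snprintf(tempname, tempsize, "%s/%s.%d",
                 db_path, RELCACHE_INIT_FILENAME, pid);
        snprintf(finalname, finalsize, "%s/%s",
                 db_path, RELCACHE_INIT_FILENAME);
    }
}

/* True if tempname is finalname followed by a ".<suffix>" part. */
static bool
is_temp_name_of(const char *tempname, const char *finalname)
{
    size_t len = strlen(finalname);

    return strncmp(tempname, finalname, len) == 0 && tempname[len] == '.';
}
```

&lt;p&gt;For example, calling &lt;code&gt;build_init_file_paths()&lt;/code&gt; with &lt;code&gt;shared = false&lt;/code&gt;, &lt;code&gt;db_path = "base/16398"&lt;/code&gt; and &lt;code&gt;pid = 4242&lt;/code&gt; yields &lt;code&gt;base/16398/pg_internal.init.4242&lt;/code&gt; and &lt;code&gt;base/16398/pg_internal.init&lt;/code&gt;.&lt;/p&gt;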
&lt;p&gt;Let&amp;rsquo;s look at real init file paths:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ find ./ -name &amp;#39;*init*&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./global/pg_internal.init &lt;span style="color:#75715e"&gt;#shared&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./base/1/pg_internal.init &lt;span style="color:#75715e"&gt;#local&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./base/13577/pg_internal.init &lt;span style="color:#75715e"&gt;#local&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./base/13578/pg_internal.init	&lt;span style="color:#75715e"&gt;#local&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./base/16398/pg_internal.init	&lt;span style="color:#75715e"&gt;#local&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./base/16811/pg_internal.init	&lt;span style="color:#75715e"&gt;#local&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./base/17674/pg_internal.init	&lt;span style="color:#75715e"&gt;#local&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Call-flow diagram of the three initialization stages:
&lt;img src="https://lastdba.com/img/csdn/f743c6c69083.png" alt="Call flow of the three relcache initialization stages" /&gt;
(&lt;a href="https://blog.japinli.top/2022/07/postgres-relcache-and-syscache/" target="_blank" rel="noreferrer"&gt;https://blog.japinli.top/2022/07/postgres-relcache-and-syscache/&lt;/a&gt;)&lt;/p&gt;
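&lt;p&gt;&lt;em&gt;Stage 4&lt;/em&gt;: &lt;code&gt;RelationIdGetRelation&lt;/code&gt;
Looks up a reldesc by OID and builds a new one if it is not cached. The caller must already hold at least an AccessShareLock on the relation. The function itself increments the rel&amp;rsquo;s reference count; the caller is responsible for decrementing it later, usually via &lt;code&gt;RelationClose&lt;/code&gt;.&lt;/p&gt;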
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Relation
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;RelationIdGetRelation&lt;/span&gt;(Oid relationId)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Relation	rd;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Ensure we&amp;#39;re in a transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;IsTransactionState&lt;/span&gt;());
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// First, look up the reldesc in the cache by OID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;RelationIdCacheLookup&lt;/span&gt;(relationId, rd);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;RelationIsValid&lt;/span&gt;(rd))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Return NULL for dropped relations
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (rd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_droppedSubid &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; InvalidSubTransactionId)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#f92672"&gt;!&lt;/span&gt;rd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_isvalid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; NULL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;RelationIncrementReferenceCount&lt;/span&gt;(rd);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;rd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_isvalid) &lt;span style="color:#75715e"&gt;// If cached rel is invalid, revalidate it
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (rd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relkind &lt;span style="color:#f92672"&gt;==&lt;/span&gt; RELKIND_INDEX &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				rd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relkind &lt;span style="color:#f92672"&gt;==&lt;/span&gt; RELKIND_PARTITIONED_INDEX) &lt;span style="color:#75715e"&gt;// Load index info directly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;RelationReloadIndexInfo&lt;/span&gt;(rd);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#75715e"&gt;// For non-index relations, rebuild the reldesc (second argument is the rebuild flag)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;RelationClearRelation&lt;/span&gt;(rd, true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; rd;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// No reldesc found, create a new one
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	rd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;RelationBuildDesc&lt;/span&gt;(relationId, true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;RelationIsValid&lt;/span&gt;(rd))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;RelationIncrementReferenceCount&lt;/span&gt;(rd);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; rd;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
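&lt;p&gt;&lt;code&gt;RelationIdGetRelation&lt;/code&gt; is relatively simple. On a cache hit it returns the cached reldesc, first revalidating it if it has been marked invalid (reloading index info for indexes, rebuilding the reldesc for everything else). On a miss it builds a new reldesc with &lt;code&gt;RelationBuildDesc&lt;/code&gt;. In both cases the reference count is incremented before returning.&lt;/p&gt;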
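&lt;p&gt;The lookup-or-build pattern with reference counting can be sketched in a few lines of standalone C. This is a toy model under simplifying assumptions, not the real implementation: &lt;code&gt;ToyRelation&lt;/code&gt;, &lt;code&gt;toy_get_relation()&lt;/code&gt; and the fixed-size array that replaces the hash table are all hypothetical, and &amp;ldquo;rebuilding&amp;rdquo; is reduced to bumping a counter.&lt;/p&gt;

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CACHE_SIZE 8

/* Hypothetical, heavily simplified stand-in for a relcache entry. */
typedef struct ToyRelation
{
    unsigned int oid;
    int          refcnt;       /* references held by this session */
    bool         in_use;       /* slot occupied? */
    bool         is_valid;     /* false after an invalidation */
    int          build_count;  /* times the entry was built or rebuilt */
} ToyRelation;

/* A linear array replaces the real hash table to keep the sketch short. */
static ToyRelation toy_cache[TOY_CACHE_SIZE];

static ToyRelation *
toy_cache_lookup(unsigned int oid)
{
    for (int i = 0; i < TOY_CACHE_SIZE; i++)
        if (toy_cache[i].in_use && toy_cache[i].oid == oid)
            return &toy_cache[i];
    return NULL;
}

static ToyRelation *
toy_build_desc(unsigned int oid)
{
    for (int i = 0; i < TOY_CACHE_SIZE; i++)
    {
        if (!toy_cache[i].in_use)
        {
            toy_cache[i] = (ToyRelation) {
                .oid = oid, .in_use = true, .is_valid = true, .build_count = 1
            };
            return &toy_cache[i];
        }
    }
    return NULL;                /* cache full: the toy model gives up */
}

/* Look up by OID, revalidate or build as needed, then pin the entry. */
static ToyRelation *
toy_get_relation(unsigned int oid)
{
    ToyRelation *rd = toy_cache_lookup(oid);

    if (rd == NULL)
        rd = toy_build_desc(oid);
    else if (!rd->is_valid)
    {
        rd->is_valid = true;    /* stands in for reloading/rebuilding */
        rd->build_count++;
    }

    if (rd != NULL)
        rd->refcnt++;
    return rd;
}

/* The caller's half of the contract: drop the pin when done. */
static void
toy_close_relation(ToyRelation *rd)
{
    assert(rd != NULL && rd->refcnt > 0);
    rd->refcnt--;
}
```

&lt;p&gt;Calling &lt;code&gt;toy_get_relation()&lt;/code&gt; twice with the same OID returns the same entry with a reference count of 2 and a single build, which mirrors why every successful lookup needs a matching close.&lt;/p&gt;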
&lt;p&gt;&lt;em&gt;Stage 5&lt;/em&gt;: &lt;code&gt;RelationClose&lt;/code&gt;
The code for &lt;code&gt;RelationClose&lt;/code&gt; is also quite simple:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;RelationClose&lt;/span&gt;(Relation relation)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// No lock operations needed, simply decrement refcount
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;RelationDecrementReferenceCount&lt;/span&gt;(relation);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If this session no longer has the relation open, stale partition descriptors can be deleted
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;RelationHasReferenceCountZero&lt;/span&gt;(relation))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_pdcxt &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_pdcxt&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;firstchild &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;MemoryContextDeleteChildren&lt;/span&gt;(relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_pdcxt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_pddcxt &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_pddcxt&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;firstchild &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;MemoryContextDeleteChildren&lt;/span&gt;(relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_pddcxt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#ifdef RELCACHE_FORCE_RELEASE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;RelationHasReferenceCountZero&lt;/span&gt;(relation) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_createSubid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; InvalidSubTransactionId &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_firstRelfilenodeSubid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; InvalidSubTransactionId)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;RelationClearRelation&lt;/span&gt;(relation, false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#endif
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;RelationClose&lt;/code&gt; closes access to a relation. In the common case it only decrements the relation&amp;rsquo;s &lt;code&gt;refcount&lt;/code&gt;, which tracks how many references the current session holds. There are two exceptions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When &lt;code&gt;refcount&lt;/code&gt; reaches 0, &lt;code&gt;MemoryContextDeleteChildren()&lt;/code&gt; is called on &lt;code&gt;rd_pdcxt&lt;/code&gt; and &lt;code&gt;rd_pddcxt&lt;/code&gt; if they have child contexts. This deletes the child memory contexts that hold partition descriptors which are no longer needed, and it does release memory.&lt;/li&gt;
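&lt;li&gt;When &lt;code&gt;refcount&lt;/code&gt; is 0, the macro &lt;code&gt;RELCACHE_FORCE_RELEASE&lt;/code&gt; is defined, and the relation was not created or given a new relfilenode in the current transaction, &lt;code&gt;RelationClearRelation(relation, false)&lt;/code&gt; removes the entry from the relcache hash table. In the PostgreSQL source, this non-rebuild path also destroys the reldesc, so the entry&amp;rsquo;s memory is released as well. The &lt;code&gt;RELCACHE_FORCE_RELEASE&lt;/code&gt; macro is not defined in a normal build; it appears to be a debugging option that has to be enabled explicitly at compile time.&lt;/li&gt;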
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;relcache does have memory-release logic, but the trigger conditions are strict, and only part of the relcache memory is freed.&lt;/em&gt;&lt;/p&gt;
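&lt;p&gt;The &amp;ldquo;partial release&amp;rdquo; behaviour can be illustrated with a small standalone C sketch. Everything here is hypothetical: &lt;code&gt;ToyRel&lt;/code&gt; is not a real PostgreSQL struct, and plain &lt;code&gt;malloc&lt;/code&gt;/&lt;code&gt;free&lt;/code&gt; stand in for memory contexts. The point is only that closing the last reference frees the stale auxiliary data while the cached entry itself stays allocated.&lt;/p&gt;

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cached relation entry. */
typedef struct ToyRel
{
    int   refcnt;       /* references held by this session */
    char *descriptor;   /* main cached data, kept while the entry is cached */
    char *stale_data;   /* stand-in for stale partition descriptors */
} ToyRel;

/* Copy a string onto the heap; returns NULL if allocation fails. */
static char *
toy_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char  *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Drop the whole entry, as a forced cache release would. */
static void
toy_rel_destroy(ToyRel *rel)
{
    if (rel == NULL)
        return;
    free(rel->descriptor);
    free(rel->stale_data);
    free(rel);
}

static ToyRel *
toy_rel_create(void)
{
    ToyRel *rel = calloc(1, sizeof(ToyRel));

    if (rel == NULL)
        return NULL;
    rel->descriptor = toy_strdup("reldesc");
    rel->stale_data = toy_strdup("stale partition descriptor");
    if (rel->descriptor == NULL || rel->stale_data == NULL)
    {
        toy_rel_destroy(rel);
        return NULL;
    }
    return rel;
}

static void
toy_rel_open(ToyRel *rel)
{
    rel->refcnt++;
}

/* Decrement the refcount; at zero, free only the stale auxiliary data. */
static void
toy_rel_close(ToyRel *rel)
{
    assert(rel->refcnt > 0);
    rel->refcnt--;

    if (rel->refcnt == 0 && rel->stale_data != NULL)
    {
        free(rel->stale_data);
        rel->stale_data = NULL;
    }
}
```

&lt;p&gt;After the last &lt;code&gt;toy_rel_close()&lt;/code&gt;, &lt;code&gt;stale_data&lt;/code&gt; is gone but &lt;code&gt;descriptor&lt;/code&gt; is still allocated; only &lt;code&gt;toy_rel_destroy()&lt;/code&gt; frees the entry as a whole.&lt;/p&gt;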

&lt;h3 class="relative group"&gt;syscache/catcache
 &lt;div id="syscachecatcache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#syscachecatcache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;CatCache caches tuples from system tables. SysCache is a thin layer built on top of CatCache that exposes a key-value (KV) lookup interface. Together, CatCache and SysCache reorganize system-table data in memory as KV entries.
syscache/catcache is more complex. Here I&amp;rsquo;ll briefly extract some easily interpretable content, mainly to understand the cached content and loading mechanism of syscache. For deeper source code analysis, refer to &lt;a href="https://blog.csdn.net/weixin_45644897/article/details/121254012" target="_blank" rel="noreferrer"&gt;PostgreSQL Source Analysis — Storage Management — Memory Management (3)&lt;/a&gt; and &lt;a href="https://blog.japinli.top/2022/07/postgres-relcache-and-syscache/" target="_blank" rel="noreferrer"&gt;PostgreSQL RelCache and SysCache Caches&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;catcache struct&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; catcache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			id;				&lt;span style="color:#75715e"&gt;// cache id, defined in syscache.h
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			cc_nbuckets;	&lt;span style="color:#75715e"&gt;// number of hash buckets for this cache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TupleDesc	cc_tupdesc;		&lt;span style="color:#75715e"&gt;// tuple descriptor, copied from reldesc
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;cc_relname;		&lt;span style="color:#75715e"&gt;// system table name corresponding to the tuple
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			cc_reloid;		&lt;span style="color:#75715e"&gt;// system table OID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			cc_indexoid;	&lt;span style="color:#75715e"&gt;// index OID for cache key
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		cc_relisshared; &lt;span style="color:#75715e"&gt;// is the table shared across databases?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Statistics used by catcache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#ifdef CATCACHE_STATS
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;long&lt;/span&gt;		cc_searches;	&lt;span style="color:#75715e"&gt;// number of queries against this catcache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;long&lt;/span&gt;		cc_hits;		&lt;span style="color:#75715e"&gt;// hit count
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;long&lt;/span&gt;		cc_neg_hits;	&lt;span style="color:#75715e"&gt;// negative entry hit count
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#endif
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} CatCache;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;catcache entry&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; catctup
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			ct_magic;		&lt;span style="color:#75715e"&gt;// identifies this catctup entry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CT_MAGIC 0x57261502
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	uint32		hash_value;		&lt;span style="color:#75715e"&gt;// hash key value for this tuple
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Dead tuples won&amp;#39;t be returned, but will be removed from catcache when refcount reaches zero
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			refcount;		&lt;span style="color:#75715e"&gt;// tuple refcount, indicates whether it&amp;#39;s being accessed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		dead;			&lt;span style="color:#75715e"&gt;// dead tuple, but not yet cleaned up
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		negative;		&lt;span style="color:#75715e"&gt;// is this a negative cache entry?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	HeapTupleData tuple;		&lt;span style="color:#75715e"&gt;// tuple header structure
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	CatCache &lt;span style="color:#f92672"&gt;*&lt;/span&gt;my_cache;		&lt;span style="color:#75715e"&gt;// link to the catcache this tuple belongs to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} CatCTup;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;SearchCatCacheMiss() Function&lt;/strong&gt;
&lt;code&gt;SearchCatCacheMiss()&lt;/code&gt; handles the catcache miss path: when a lookup does not find the tuple in the cache, it reads the tuple from the underlying system table and adds it to the cache.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; pg_noinline HeapTuple
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SearchCatCacheMiss&lt;/span&gt;(CatCache &lt;span style="color:#f92672"&gt;*&lt;/span&gt;cache,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; nkeys,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 uint32 hashValue,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Index hashIndex,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Datum v1,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Datum v2,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Datum v3,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Datum v4)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ScanKeyData cur_skey[CATCACHE_MAXKEYS];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Relation	relation;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SysScanDesc scandesc;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	HeapTuple	ntp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	CatCTup &lt;span style="color:#f92672"&gt;*&lt;/span&gt;ct;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Datum		arguments[CATCACHE_MAXKEYS];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Tuple not found in cache, so try to find it directly from the table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If found, add it to cache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If not found, add a negative cache entry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	relation &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;table_open&lt;/span&gt;(cache&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_reloid, AccessShareLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	scandesc &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;systable_beginscan&lt;/span&gt;(relation,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 cache&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_indexoid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 &lt;span style="color:#a6e22e"&gt;IndexScanOK&lt;/span&gt;(cache, cur_skey),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 NULL,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 nkeys,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 cur_skey);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ct &lt;span style="color:#f92672"&gt;=&lt;/span&gt; NULL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// When tuple is valid, create an entry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HeapTupleIsValid&lt;/span&gt;(ntp &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;systable_getnext&lt;/span&gt;(scandesc)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		ct &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;CatalogCacheCreateEntry&lt;/span&gt;(cache, ntp, arguments,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									 hashValue, hashIndex,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									 false); &lt;span style="color:#75715e"&gt;// Create an entry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Immediately increment refcount
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ResourceOwnerEnlargeCatCacheRefs&lt;/span&gt;(CurrentResourceOwner);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		ct&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;refcount&lt;span style="color:#f92672"&gt;++&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ResourceOwnerRememberCatCacheRef&lt;/span&gt;(CurrentResourceOwner, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;ct&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tuple);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;					&lt;span style="color:#75715e"&gt;/* assume only one match */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;systable_endscan&lt;/span&gt;(scandesc);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;table_close&lt;/span&gt;(relation, AccessShareLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	// If no tuple found, create a negative cache entry (a dummy tuple)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	// The dummy tuple has key columns, all others are null
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	// In bootstrap mode, the invalidation mechanism is not active and entries
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	// cannot be cleaned up if a tuple is actually created later
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	// So during this phase, negative entries are not created
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (ct &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL) &lt;span style="color:#75715e"&gt;// If no tuple found, enter the following logic
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsBootstrapProcessingMode&lt;/span&gt;()) &lt;span style="color:#75715e"&gt;// Return NULL directly if in bootstrap mode
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; NULL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		ct &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;CatalogCacheCreateEntry&lt;/span&gt;(cache, NULL, arguments,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									 hashValue, hashIndex,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									 true); &lt;span style="color:#75715e"&gt;// Create entry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;CACHE_elog&lt;/span&gt;(DEBUG2, &lt;span style="color:#e6db74"&gt;&amp;#34;SearchCatCache(%s): Contains %d/%d tuples&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 cache&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_relname, cache&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_ntup, CacheHdr&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;ch_ntup);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;CACHE_elog&lt;/span&gt;(DEBUG2, &lt;span style="color:#e6db74"&gt;&amp;#34;SearchCatCache(%s): put neg entry in bucket %d&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 cache&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_relname, hashIndex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Negative entries are not returned to caller, refcount remains 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; NULL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;ct&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tuple;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The dummy tuple (&lt;em&gt;negative cache entry&lt;/em&gt;) is a neat trick: by caching the fact that a tuple does not exist, catcache can answer the next lookup for the same key without scanning the data dictionary again.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Cache Invalidation Messages
 &lt;div id="cache-validation-messages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cache-validation-messages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When a tuple is updated or deleted, the old version becomes invisible once the transaction ends under the visibility rules, so every cache holding it must be told to invalidate its copy and reload it on the next read. Likewise, when a new tuple is inserted, negative cache entries that match it may need to be flushed. DDL is the most common trigger: it can make catalog tuples stale, at which point cache invalidation messages must be sent to each backend&amp;rsquo;s private caches to clean up the affected entries.
This invalidation mechanism manages private caches such as syscache and relcache. When a transaction completes, its invalidation events are broadcast to other backends through the shared invalidation (SI) message queue. Because an idle backend does not read sinval events on its own, it must be signaled explicitly so that it can &amp;ldquo;catch up.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The source code is split into two parts: &lt;code&gt;sinval&lt;/code&gt; and &lt;code&gt;inval&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Invalidation interface: &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;f=src/include/utils/inval.h;hb=HEAD" target="_blank" rel="noreferrer"&gt;src/include/utils/inval.h&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Invalidation dispatch: &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;f=src/backend/utils/cache/inval.c;hb=HEAD" target="_blank" rel="noreferrer"&gt;src/backend/utils/cache/inval.c&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Invalidation message sharing interface: &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;f=src/include/storage/sinval.h;hb=HEAD" target="_blank" rel="noreferrer"&gt;src/include/storage/sinval.h&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Invalidation message sharing dispatch: &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;f=src/backend/storage/ipc/sinval.c;hb=HEAD" target="_blank" rel="noreferrer"&gt;src/backend/storage/ipc/sinval.c&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Invalidation message sharing data structures interface: &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;f=src/include/storage/sinvaladt.h;hb=HEAD" target="_blank" rel="noreferrer"&gt;src/include/storage/sinvaladt.h&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Invalidation message sharing data structures: &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;f=src/backend/storage/ipc/sinvaladt.c;hb=HEAD" target="_blank" rel="noreferrer"&gt;src/backend/storage/ipc/sinvaladt.c&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The shared-invalidation message structure is defined in &lt;code&gt;src/include/storage/sinval.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;union&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	int8		id;				&lt;span style="color:#75715e"&gt;/* type field --- must be first */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SharedInvalCatcacheMsg cc;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SharedInvalCatalogMsg cat;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SharedInvalRelcacheMsg rc;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SharedInvalSmgrMsg sm;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SharedInvalRelmapMsg rm;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SharedInvalSnapshotMsg sn;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} SharedInvalidationMessage;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Shared-invalidation messages include the following types:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Invalidate a specific catcache entry&lt;/li&gt;
&lt;li&gt;Invalidate all catcache entries for a particular system catalog&lt;/li&gt;
&lt;li&gt;Invalidate a specific relcache entry&lt;/li&gt;
&lt;li&gt;Invalidate ALL relcache entries&lt;/li&gt;
&lt;li&gt;Invalidate the smgr cache entry for a particular physical relation&lt;/li&gt;
&lt;li&gt;Invalidate a mapped-relation mapping&lt;/li&gt;
&lt;li&gt;Invalidate saved snapshots that scanned a relation&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;&lt;p&gt;Messages are located in the shared memory queue until all other processes read them. Normally, receiving processes only read messages at specific times, so if a receiving process is idle (not processing any user requests) or busy doing other things such that they don&amp;rsquo;t have time to read these messages, the messages may remain in shared memory indefinitely. In unfortunate situations, if this shared memory space is no longer available for processes to store new messages, that process will have to take on the cleanup task. (In practice, this cleanup is done proactively, so space rarely runs out.) To discard old messages, it must be ensured that all other processes have read them. If some processes cannot do so for the above reasons, it must explicitly signal the lagging processes to catch up. Once the lagging processes have caught up, these messages can be freely discarded.
When processing a message, it first checks whether the catalog tuple specified in the message is currently in the cache (the message also specifies the syscache identifier). If so, it is removed from the cache&amp;rsquo;s hash table. The next time that tuple is requested, it will be re-read from the underlying catalog table and added to the hash table, so subsequent accesses will read the new value. If a process has already locked a particular database object preventing concurrent processes from modifying it, it can continue using the cached tuple until the lock is released.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 class="relative group"&gt;xxCache Issues Summary
 &lt;div id="xxcache-issues-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#xxcache-issues-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;There are many types of xxCache; the more notable ones are plancache, relcache, and syscache. They live in private memory, with one copy in each backend process. None of them has an LRU mechanism to evict cold entries; entries are removed only when invalidation messages report that snapshots or metadata are no longer needed, for example after an object is dropped.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;relcache is the cache most likely to occupy significant memory. It holds relation metadata; during initialization it reads the *.init files to speed up loading, and further entries are loaded on demand when other metadata is read.&lt;/li&gt;
&lt;li&gt;catcache caches tuples from the data dictionary. syscache is one layer above catcache; together they implement the data dictionary cache. On a catcache miss, the tuple is read from the data dictionary. If the tuple does not exist, a negative entry is created so that the next lookup does not access the data dictionary again.&lt;/li&gt;
&lt;li&gt;Cache invalidation messages tell caches that cached tuples and snapshot information have become stale. They invalidate the corresponding relcache and catcache entries; the entries are removed from the cache&amp;rsquo;s hash table, which releases memory.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Since the cache memory release mechanisms are very limited, when there is a lot of metadata (many tables, partition tables), relcache and catcache can consume a lot of memory — and this can happen for every backend.
&lt;em&gt;Possible solutions&lt;/em&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Global cache. Like Oracle&amp;rsquo;s dictionary cache, cache in one place with shared access. For example, &lt;a href="https://www.alibabacloud.com/help/en/polardb/polardb-for-postgresql/global-relcache-1" target="_blank" rel="noreferrer"&gt;PolarDB&amp;rsquo;s Global RelCache&lt;/a&gt; has already implemented this functionality.&lt;/li&gt;
&lt;li&gt;LRU. An LRU mechanism suitable for caches is needed to separate hot and cold ends, cleaning excessively old cache entries from the hash table. This might require cache limit parameters to restrict cache size — ideally one per cache&amp;hellip;&lt;/li&gt;
&lt;li&gt;Threading mode. Memory is shared and accessed by all threads — a natural advantage.&lt;/li&gt;
&lt;li&gt;Periodically disconnect long connections. All of the above are just wishful thinking.&lt;/li&gt;
&lt;li&gt;Don&amp;rsquo;t create too many tables or partitions (note that in PostgreSQL, partitions are also tables).&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;Memory Contexts
 &lt;div id="memory-contexts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-contexts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PostgreSQL manages memory through the memory context mechanism. I previously did a &lt;a href="https://blog.csdn.net/qq_40687433/article/details/134796339?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;translation about memory contexts&lt;/a&gt;, roughly summarized as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;C language requires explicit memory deallocation. To reduce the risk of memory leaks, PostgreSQL implemented memory contexts to manage private memory.&lt;/li&gt;
&lt;li&gt;Memory contexts do not require freeing memory after each use; instead, memory is released by deleting a particular context.&lt;/li&gt;
&lt;li&gt;Memory contexts form a hierarchical structure — releasing a parent context recursively deletes all child contexts.&lt;/li&gt;
&lt;li&gt;Aside from debugging, observing memory context usage is quite difficult. Starting from PG14, the &lt;code&gt;pg_backend_memory_contexts&lt;/code&gt; view can observe the current memory context usage of the current session.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Timing of memory context creation during SQL operations:



&lt;img src="https://lastdba.com/img/csdn/b269e3547cbf.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://www.pgcon.org/2019/schedule/attachments/514_introduction-memory-contexts.pdf" target="_blank" rel="noreferrer"&gt;https://www.pgcon.org/2019/schedule/attachments/514_introduction-memory-contexts.pdf&lt;/a&gt;)&lt;/p&gt;

&lt;h3 class="relative group"&gt;Source Code Analysis
 &lt;div id="source-code-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#source-code-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;In PostgreSQL, memory is allocated, freed, and reset through memory contexts, so backend code generally does not call the C library functions &lt;code&gt;malloc()&lt;/code&gt;, &lt;code&gt;realloc()&lt;/code&gt;, and &lt;code&gt;free()&lt;/code&gt; directly. Instead, it uses &lt;code&gt;palloc()&lt;/code&gt;, &lt;code&gt;repalloc()&lt;/code&gt;, and &lt;code&gt;pfree()&lt;/code&gt; for allocation, reallocation, and deallocation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;C Library Memory Functions&lt;/strong&gt;
&lt;a href="https://www.geeksforgeeks.org/dynamic-memory-allocation-in-c-using-malloc-calloc-free-and-realloc/" target="_blank" rel="noreferrer"&gt;C library dynamic memory allocation functions&lt;/a&gt; include:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;malloc(): Allocates a single block of the requested size; the contents are uninitialized.&lt;/li&gt;
&lt;li&gt;calloc(): Allocates memory for an array of elements and initializes it to zero.&lt;/li&gt;
&lt;li&gt;free(): Releases memory. Memory obtained from malloc(), calloc(), or realloc() is not released automatically; it must be passed to free().&lt;/li&gt;
&lt;li&gt;realloc(): Resizes a previously allocated block, possibly moving it to a new address.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There is also a C library function &lt;a href="https://www.geeksforgeeks.org/memset-c-example/" target="_blank" rel="noreferrer"&gt;memset()&lt;/a&gt;, used to fill a memory block with a specific value.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL-Defined Memory Functions&lt;/strong&gt;
The functions that PostgreSQL source code actually uses for allocation and deallocation are &lt;code&gt;palloc()&lt;/code&gt;, &lt;code&gt;palloc0()&lt;/code&gt;, &lt;code&gt;repalloc()&lt;/code&gt;, and &lt;code&gt;pfree()&lt;/code&gt;. Most calls do not reach the C library allocator; only in certain cases do they call the C library memory functions. This adds a layer between PostgreSQL and OS memory operations, with PostgreSQL handling small allocations on its own.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;palloc()&lt;/strong&gt;:
&lt;code&gt;palloc()&lt;/code&gt; allocates from &lt;code&gt;CurrentMemoryContext&lt;/code&gt; by calling the &lt;code&gt;alloc&lt;/code&gt; function pointer in that context&amp;rsquo;s &lt;code&gt;methods&lt;/code&gt; field. Its body duplicates &lt;code&gt;MemoryContextAlloc&lt;/code&gt; rather than calling it, to avoid extra overhead. For allocation-set contexts, &lt;code&gt;alloc&lt;/code&gt; points to &lt;code&gt;AllocSetAlloc&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;palloc&lt;/span&gt;(Size size)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* duplicates MemoryContextAlloc to avoid increased overhead */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;ret;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext context &lt;span style="color:#f92672"&gt;=&lt;/span&gt; CurrentMemoryContext;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ret &lt;span style="color:#f92672"&gt;=&lt;/span&gt; context&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;methods&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;alloc&lt;/span&gt;(context, size);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;....
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; ret;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;palloc0()&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;palloc0&lt;/span&gt;(Size size)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ret &lt;span style="color:#f92672"&gt;=&lt;/span&gt; context&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;methods&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;alloc&lt;/span&gt;(context, size);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;MemSetAligned&lt;/span&gt;(ret, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, size);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; ret;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;MemSetAligned&lt;/code&gt; is a macro that fills memory; in the excerpt below it calls the C library &lt;code&gt;memset&lt;/code&gt;. &lt;code&gt;palloc0&lt;/code&gt; passes &lt;code&gt;0&lt;/code&gt; as the value.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define MemSetAligned(start, val, len)\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;...\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	memset(_start, _val, _len); \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;...	&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Compared to &lt;code&gt;palloc&lt;/code&gt;, &lt;code&gt;palloc0&lt;/code&gt; not only calls &lt;code&gt;alloc(context, size)&lt;/code&gt; but also zeroes out the memory content.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;repalloc()&lt;/strong&gt;:
&lt;code&gt;repalloc()&lt;/code&gt; calls the &lt;code&gt;realloc&lt;/code&gt; function pointer of the memory context that owns the chunk. For allocation-set contexts, that pointer is &lt;code&gt;AllocSetRealloc&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * repalloc
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		Adjust the size of a previously allocated chunk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;repalloc&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;pointer, Size size)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext context &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetMemoryChunkContext&lt;/span&gt;(pointer);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ret &lt;span style="color:#f92672"&gt;=&lt;/span&gt; context&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;methods&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;realloc&lt;/span&gt;(context, pointer, size);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; ret;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;pfree()&lt;/strong&gt;:
&lt;code&gt;pfree()&lt;/code&gt; calls the &lt;code&gt;free_p&lt;/code&gt; function pointer in the methods field of the memory context that owns the chunk, which releases the chunk&amp;rsquo;s space. For allocation-set contexts, &lt;code&gt;free_p&lt;/code&gt; points to &lt;code&gt;AllocSetFree&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * pfree
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		Release an allocated chunk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;pfree&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;pointer)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext context &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetMemoryChunkContext&lt;/span&gt;(pointer);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	context&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;methods&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;free_p&lt;/span&gt;(context, pointer);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;VALGRIND_MEMPOOL_FREE&lt;/span&gt;(context, pointer);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;AllocSetAlloc Memory Allocation&lt;/strong&gt;
Following the &lt;code&gt;alloc&lt;/code&gt; method leads to the &lt;code&gt;AllocSetAlloc&lt;/code&gt; function. &lt;code&gt;AllocSetAlloc&lt;/code&gt; looks rather complex, but it becomes easier to understand when read in segments:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;AllocSetAlloc&lt;/span&gt;(MemoryContext context, Size size)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	AllocSet	set &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (AllocSet) context;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	AllocBlock	block;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	AllocChunk	chunk;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			fidx;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Size		chunk_size;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Size		blksize;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If requested memory exceeds the max chunk size, allocate an entire memory block
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (size &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; set&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;allocChunkLimit)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		block &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (AllocBlock) &lt;span style="color:#a6e22e"&gt;malloc&lt;/span&gt;(blksize);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Otherwise the request fits within allocChunkLimit: check the matching free list for a reusable chunk
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	fidx &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;AllocSetFreeIndex&lt;/span&gt;(size);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	chunk &lt;span style="color:#f92672"&gt;=&lt;/span&gt; set&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;freelist[fidx];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (chunk &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL) &lt;span style="color:#75715e"&gt;// There are chunks available in the free list
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(chunk&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;size &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; size);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		set&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;freelist[fidx] &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (AllocChunk) chunk&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;aset;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		chunk&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;aset &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) set;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;AllocChunkGetPointer&lt;/span&gt;(chunk);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If there&amp;#39;s space, try to place the chunk in the allocation block; if not, create a new block
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; ((block &lt;span style="color:#f92672"&gt;=&lt;/span&gt; set&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;blocks) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		Size		availspace &lt;span style="color:#f92672"&gt;=&lt;/span&gt; block&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;endptr &lt;span style="color:#f92672"&gt;-&lt;/span&gt; block&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;freeptr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (availspace &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; (chunk_size &lt;span style="color:#f92672"&gt;+&lt;/span&gt; ALLOC_CHUNKHDRSZ))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			block &lt;span style="color:#f92672"&gt;=&lt;/span&gt; NULL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// No space, create a new block
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (block &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		Size		required_size;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Requested block size is a power of 2, not exceeding maxBlockSize
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		required_size &lt;span style="color:#f92672"&gt;=&lt;/span&gt; chunk_size &lt;span style="color:#f92672"&gt;+&lt;/span&gt; ALLOC_BLOCKHDRSZ &lt;span style="color:#f92672"&gt;+&lt;/span&gt; ALLOC_CHUNKHDRSZ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; (blksize &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; required_size)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			blksize &lt;span style="color:#f92672"&gt;&amp;lt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Use malloc to allocate the block, size is a power of 2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		block &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (AllocBlock) &lt;span style="color:#a6e22e"&gt;malloc&lt;/span&gt;(blksize);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8731cbdd1398.png" alt="Alt text" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://smartkeyerror.com/PostgreSQL-MemoryContext" target="_blank" rel="noreferrer"&gt;https://smartkeyerror.com/PostgreSQL-MemoryContext&lt;/a&gt;)&lt;/p&gt;
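&lt;p&gt;&lt;code&gt;palloc() =&amp;gt; AllocSetAlloc()&lt;/code&gt; calls &lt;code&gt;malloc()&lt;/code&gt; to request memory from the OS in only two cases: when the requested size exceeds the chunk size limit (&lt;code&gt;allocChunkLimit&lt;/code&gt;), or when the freelist has no suitable free chunk and the current block does not have enough free space left. In all other cases, it reuses a free chunk from the freelist or carves a new chunk out of the free space in the current block.&lt;/p&gt;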
&lt;p&gt;&lt;code&gt;pfree()&lt;/code&gt; is similar (not demonstrated here):
&lt;code&gt;pfree() =&amp;gt; AllocSetFree()&lt;/code&gt; releases a specified chunk in a memory context. If the chunk is an oversized one that occupies a dedicated block (its size exceeded &lt;code&gt;allocChunkLimit&lt;/code&gt; when it was allocated), &lt;code&gt;free()&lt;/code&gt; is called directly to release that block. Otherwise, the chunk is pushed onto the freelist so that a later allocation can reuse it.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Viewing Memory Context Size
 &lt;div id="viewing-memory-context-size" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#viewing-memory-context-size" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;PG14+: query the &lt;code&gt;pg_backend_memory_contexts&lt;/code&gt; view to inspect the memory contexts of the current session&amp;rsquo;s backend directly from SQL.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_backend_memory_contexts &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; used_bytes &lt;span style="color:#66d9ef"&gt;DESC&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LIMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ident &lt;span style="color:#f92672"&gt;|&lt;/span&gt; parent &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;level&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; total_bytes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; total_nblocks &lt;span style="color:#f92672"&gt;|&lt;/span&gt; free_bytes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; free_chunks &lt;span style="color:#f92672"&gt;|&lt;/span&gt; used_bytes 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------+-------+------------------+-------+-------------+---------------+------------+-------------+------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; CacheMemoryContext &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TopMemoryContext &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1048576&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;508216&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;540360&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Timezones &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TopMemoryContext &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;104120&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2616&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;101504&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TopMemoryContext &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97680&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12904&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;84776&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ExecutorState &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; PortalContext &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;49208&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4424&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;44784&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; WAL record construction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TopMemoryContext &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;49768&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6360&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;43408&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
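&lt;p&gt;To get one figure for the whole session instead of a per-context listing, the view can be aggregated. The query below is a minimal sketch that assumes only the &lt;code&gt;total_bytes&lt;/code&gt;, &lt;code&gt;used_bytes&lt;/code&gt; and &lt;code&gt;free_bytes&lt;/code&gt; columns shown in the output above; the column aliases are illustrative:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Sum up all memory contexts of the current backend
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT count(*) AS contexts,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       pg_size_pretty(sum(total_bytes)) AS total,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       pg_size_pretty(sum(used_bytes)) AS used,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       pg_size_pretty(sum(free_bytes)) AS free
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_backend_memory_contexts;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The view only describes the backend that runs the query, so the result measures the current connection rather than the whole instance.&lt;/p&gt;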
&lt;ol start="2"&gt;
&lt;li&gt;PG14+: the &lt;code&gt;pg_log_backend_memory_contexts(pid)&lt;/code&gt; function writes the memory contexts of the backend with the given PID to the server log, in a format similar to the output of &lt;code&gt;MemoryContextStats(TopMemoryContext)&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; pg_log_backend_memory_contexts(&lt;span style="color:#ae81ff"&gt;9293&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
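&lt;p&gt;The PID does not have to be typed by hand; it can be taken from &lt;code&gt;pg_stat_activity&lt;/code&gt;. The following is a sketch, assuming the goal is to dump every other client backend. By default only superusers may call the function, and every context is written as its own log line, so on a busy server this can produce a large amount of log output:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Ask every other client backend to log its memory contexts
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT pid, pg_log_backend_memory_contexts(pid) AS signalled
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_stat_activity
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE backend_type = &amp;#39;client backend&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  AND pid != pg_backend_pid();&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The boolean result only reports whether the request was sent to the target process; the statistics themselves appear in the server log.&lt;/p&gt;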
&lt;ol start="3"&gt;
&lt;li&gt;All versions: call &lt;code&gt;MemoryContextStats(TopMemoryContext)&lt;/code&gt; from gdb&lt;/li&gt;
&lt;/ol&gt;
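&lt;p&gt;Attach gdb to the backend process and call &lt;code&gt;MemoryContextStats(TopMemoryContext)&lt;/code&gt;. The statistics are written to the stderr of that backend, which normally ends up in the server log:&lt;/p&gt;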
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;gdb 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;gdb&lt;span style="color:#f92672"&gt;)&lt;/span&gt; attach &lt;span style="color:#ae81ff"&gt;9293&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;gdb&lt;span style="color:#f92672"&gt;)&lt;/span&gt; p MemoryContextStats&lt;span style="color:#f92672"&gt;(&lt;/span&gt;TopMemoryContext&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$2 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; void&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Log output:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TopMemoryContext: &lt;span style="color:#ae81ff"&gt;97680&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;16856&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;80824&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TableSpace cache: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;2088&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;6104&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RowDescriptionContext: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;6888&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1304&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; MessageContext: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;6888&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1304&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Operator class cache: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;552&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;7640&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Relcache by OID: &lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;3504&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;12880&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; CacheMemoryContext: &lt;span style="color:#ae81ff"&gt;524288&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;90840&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;433448&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; index info: &lt;span style="color:#ae81ff"&gt;2048&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;904&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1144&lt;/span&gt; used: pg_statistic_ext_relid_index
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; index info: &lt;span style="color:#ae81ff"&gt;2048&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;824&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1224&lt;/span&gt; used: pg_database_oid_index
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; index info: &lt;span style="color:#ae81ff"&gt;2048&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;824&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1224&lt;/span&gt; used: pg_authid_rolname_index
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; WAL record construction: &lt;span style="color:#ae81ff"&gt;49768&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;6360&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;43408&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PrivateRefCount: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;2616&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;5576&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; MdSmgr: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;7592&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;600&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; LOCALLOCK hash: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;552&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;7640&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Timezones: &lt;span style="color:#ae81ff"&gt;104120&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;2616&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;101504&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ErrorContext: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;7928&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt; used&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/cac547b38cb3.png" alt="Image" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;src/backend/utils/mmgr/mcxt.c&lt;/p&gt;
&lt;p&gt;src/backend/utils/mmgr/README&lt;/p&gt;
&lt;p&gt;&lt;a href="https://momjian.us/main/writings/pgsql/inside_shmem.pdf" target="_blank" rel="noreferrer"&gt;https://momjian.us/main/writings/pgsql/inside_shmem.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql02.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql02.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/runtime-config-resource.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/runtime-config-resource.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/kernel-resources.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/16/kernel-resources.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/weixin_45644897/article/details/121340327" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/weixin_45644897/article/details/121340327&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://help.aliyun.com/zh/polardb/polardb-for-postgresql/global-cache" target="_blank" rel="noreferrer"&gt;https://help.aliyun.com/zh/polardb/polardb-for-postgresql/global-cache&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cnblogs.com/feishujun/p/PostgreSQLSourceAnalysis_cache02.html" target="_blank" rel="noreferrer"&gt;https://www.cnblogs.com/feishujun/p/PostgreSQLSourceAnalysis_cache02.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.japinli.top/2022/07/postgres-relcache-and-syscache/" target="_blank" rel="noreferrer"&gt;https://blog.japinli.top/2022/07/postgres-relcache-and-syscache/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://amitlan.com/2019/06/14/caches-inval.html" target="_blank" rel="noreferrer"&gt;https://amitlan.com/2019/06/14/caches-inval.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cybertec-postgresql.com/en/memory-context-for-postgresql-memory-management/" target="_blank" rel="noreferrer"&gt;https://www.cybertec-postgresql.com/en/memory-context-for-postgresql-memory-management/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.geeksforgeeks.org/dynamic-memory-allocation-in-c-using-malloc-calloc-free-and-realloc/" target="_blank" rel="noreferrer"&gt;https://www.geeksforgeeks.org/dynamic-memory-allocation-in-c-using-malloc-calloc-free-and-realloc/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cnblogs.com/feishujun/p/PostgreSQLSourceAnalysis_mmgr01.html" target="_blank" rel="noreferrer"&gt;https://www.cnblogs.com/feishujun/p/PostgreSQLSourceAnalysis_mmgr01.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cnblogs.com/feishujun/p/PostgreSQLSourceAnalysis_mmgr02.html" target="_blank" rel="noreferrer"&gt;https://www.cnblogs.com/feishujun/p/PostgreSQLSourceAnalysis_mmgr02.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://smartkeyerror.com/PostgreSQL-MemoryContext" target="_blank" rel="noreferrer"&gt;https://smartkeyerror.com/PostgreSQL-MemoryContext&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://jnidzwetzki.github.io/2022/05/28/postgres-memory-context.html" target="_blank" rel="noreferrer"&gt;https://jnidzwetzki.github.io/2022/05/28/postgres-memory-context.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.pgcon.org/2019/schedule/attachments/514_introduction-memory-contexts.pdf" target="_blank" rel="noreferrer"&gt;https://www.pgcon.org/2019/schedule/attachments/514_introduction-memory-contexts.pdf&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>A Deep Dive into PostgreSQL Transactions</title><link>https://lastdba.com/en/2024/08/12/a-deep-dive-into-postgresql-transactions/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/a-deep-dive-into-postgresql-transactions/</guid><description>&lt;p&gt;&lt;strong&gt;PostgreSQL Transactions&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To guarantee ACID properties, an RDBMS must implement concurrency control. PostgreSQL, like Oracle and MySQL (InnoDB), uses MVCC (Multi-Version Concurrency Control) for concurrency control. MVCC works by continuously generating new versions of objects as data changes while allowing queries to access a bounded range of older versions. It captures a snapshot of data at a given point in time and selects one version to read.&lt;/p&gt;
&lt;p&gt;Oracle and MySQL both use undo segments to record old versions of objects. PostgreSQL has no undo. Instead, during DML operations it writes historical data directly into the original table (UPDATE creates a new row, DELETE marks the row) and records additional columns — xmin and xmax — in the table to store transaction IDs. By comparing transaction IDs and other metadata, PostgreSQL implements its MVCC mechanism.&lt;/p&gt;</description><content:encoded>&lt;p&gt;&lt;strong&gt;PostgreSQL Transactions&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To guarantee ACID properties, an RDBMS must implement concurrency control. PostgreSQL, like Oracle and MySQL (InnoDB), uses MVCC (Multi-Version Concurrency Control) for concurrency control. MVCC works by continuously generating new versions of objects as data changes while allowing queries to access a bounded range of older versions. It captures a snapshot of data at a given point in time and selects one version to read.&lt;/p&gt;
&lt;p&gt;Oracle and MySQL both use undo segments to record old versions of objects. PostgreSQL has no undo. Instead, during DML operations it writes historical data directly into the original table (UPDATE creates a new row, DELETE marks the row) and records additional columns — xmin and xmax — in the table to store transaction IDs. By comparing transaction IDs and other metadata, PostgreSQL implements its MVCC mechanism.&lt;/p&gt;
&lt;p&gt;Among relational databases, PostgreSQL&amp;rsquo;s transaction mechanism is truly distinctive. Understanding it is key to grasping how PostgreSQL operates under the hood.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Transaction Isolation Levels
 &lt;div id="transaction-isolation-levels" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-isolation-levels" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Most relational databases support multiple transaction isolation levels. Under different isolation levels, concurrent transaction behavior varies.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Setting the Transaction Isolation Level
 &lt;div id="setting-the-transaction-isolation-level" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#setting-the-transaction-isolation-level" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL supports four isolation levels (though only three are actually effective):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SERIALIZABLE&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;REPEATABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;READ&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;READ&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COMMITTED&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;READ&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;UNCOMMITTED&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Isolation level parameters&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;default_transaction_isolation&lt;/code&gt;: sets the default isolation level for new transactions.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;transaction_isolation&lt;/code&gt;: shows the isolation level of the current transaction.&lt;/p&gt;
&lt;p&gt;The default isolation level is &lt;code&gt;read committed&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Changing the global default isolation level&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Change the &lt;code&gt;default_transaction_isolation&lt;/code&gt; parameter with &lt;code&gt;ALTER SYSTEM&lt;/code&gt;, then reload the configuration:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;postgres=# alter system set default_transaction_isolation to &amp;#39;serializable&amp;#39;;
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)
postgres=# show transaction_isolation;
 transaction_isolation 
-----------------------
 serializable&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After the change, every new transaction starts at the level given by &lt;code&gt;default_transaction_isolation&lt;/code&gt;, unless the session or the transaction itself overrides it.&lt;/p&gt;
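&lt;p&gt;The default does not have to be changed for the whole cluster. As a hedged sketch, the same parameter can also be scoped to a single database or role; &lt;code&gt;lzldb&lt;/code&gt; is the database used in this article&amp;rsquo;s examples, while &lt;code&gt;app_user&lt;/code&gt; is a hypothetical role name introduced only for illustration:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Default for new sessions that connect to the lzldb database
ALTER DATABASE lzldb SET default_transaction_isolation TO &amp;#39;repeatable read&amp;#39;;

-- Default for new sessions of one role; app_user is a hypothetical role name
ALTER ROLE app_user SET default_transaction_isolation TO &amp;#39;serializable&amp;#39;;

-- Undo the cluster-wide ALTER SYSTEM setting and reload the configuration
ALTER SYSTEM RESET default_transaction_isolation;
SELECT pg_reload_conf();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Database-level and role-level settings take effect for sessions started after the change, so existing connections need to reconnect to pick them up.&lt;/p&gt;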
&lt;p&gt;&lt;strong&gt;Setting the session isolation level&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Note: &lt;code&gt;transaction_isolation&lt;/code&gt; reports the isolation level of the current transaction. It cannot be changed with &lt;code&gt;ALTER SYSTEM&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;lzldb=# alter system set transaction_isolation to &amp;#39;REPEATABLE READ&amp;#39;;
ERROR: parameter &amp;#34;transaction_isolation&amp;#34; cannot be changed&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Use &lt;code&gt;SET SESSION CHARACTERISTICS AS TRANSACTION&lt;/code&gt; to change the default isolation level for the rest of the session:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;lzldb=# SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET
lzldb=# show transaction_isolation ;
-[ RECORD 1 ]---------+----------------
transaction_isolation | repeatable read&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Setting the transaction-level isolation level&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL allows specifying the isolation level for an individual transaction. You can set it when starting the transaction:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;lzldb=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
lzldb=# start TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Or use &lt;code&gt;set transaction&lt;/code&gt; after starting a transaction, before its first query or data-modification statement:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;lzldb=# begin;
BEGIN
lzldb=*# set transaction ISOLATION LEVEL REPEATABLE READ;
SET&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
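&lt;p&gt;The ordering matters: PostgreSQL only accepts a change of isolation level while the transaction has not yet run a query. A minimal sketch of that rule follows; the query against &lt;code&gt;pg_class&lt;/code&gt; is just an arbitrary first statement chosen for illustration:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sql" data-lang="sql"&gt;BEGIN;
-- Accepted: no query has run yet in this transaction
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Arbitrary first query; from here on the isolation level is fixed
SELECT count(*) FROM pg_class;
SHOW transaction_isolation;
-- Switching to a different level at this point is rejected with an error
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- The failed statement leaves the transaction aborted, so roll it back
ROLLBACK;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;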

&lt;h3 class="relative group"&gt;ANSI-92 Transaction Isolation Levels
 &lt;div id="ansi-92-transaction-isolation-levels" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ansi-92-transaction-isolation-levels" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The &lt;em&gt;ANSI SQL-92&lt;/em&gt; standard defines four isolation levels:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Serializable&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Transactions behave as if they had executed one after another in some serial order, without interfering with each other. This rules out every data inconsistency that concurrent execution could introduce.&lt;/p&gt;
&lt;p&gt;Early implementations used exclusive locks to control concurrent transactions. The resulting queuing dramatically reduced system concurrency. After ANSI-92, other ways of implementing serializability emerged, greatly improving both concurrency and performance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Repeatable Read&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Once a transaction has read some data, reading it again within the same transaction returns the same result, even if other transactions modify that data and commit in the meantime. Repeatable Read is MySQL&amp;rsquo;s default isolation level.&lt;/p&gt;
&lt;p&gt;Note: in ANSI SQL, Repeatable Read can experience phantom reads, but PostgreSQL&amp;rsquo;s Repeatable Read does not.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Read Committed&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A transaction can read data committed by other transactions. If a transaction reads a piece of data multiple times and that data happens to be modified and committed by another transaction in between, the current transaction will see different values for the same data. This is the default isolation level for both Oracle and PostgreSQL.&lt;/p&gt;
&lt;p&gt;At this isolation level, both &amp;ldquo;non-repeatable read&amp;rdquo; and &amp;ldquo;phantom read&amp;rdquo; scenarios can occur.&lt;/p&gt;
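&lt;p&gt;A non-repeatable read is easy to reproduce with two psql sessions. The sketch below marks the session boundaries in comments; &lt;code&gt;accounts&lt;/code&gt; is a hypothetical table introduced only for this illustration:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Setup (hypothetical table, not part of the original examples)
CREATE TABLE accounts (id int PRIMARY KEY, balance int);
INSERT INTO accounts VALUES (1, 100);

-- Session A
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT balance FROM accounts WHERE id = 1;   -- sees 100

-- Session B (autocommit)
UPDATE accounts SET balance = 200 WHERE id = 1;

-- Session A, still inside the same transaction
SELECT balance FROM accounts WHERE id = 1;   -- sees 200: a non-repeatable read
COMMIT;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If Session A starts with &lt;code&gt;BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ&lt;/code&gt; instead, its second &lt;code&gt;SELECT&lt;/code&gt; still returns 100, because the whole transaction reads from one snapshot.&lt;/p&gt;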
&lt;p&gt;&lt;strong&gt;Read Uncommitted&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A transaction can read data that has been modified but not yet committed by other transactions. Since uncommitted data can still be rolled back, reading such data leads to &amp;ldquo;dirty reads.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;At this isolation level, &amp;ldquo;dirty read&amp;rdquo; scenarios can occur.&lt;/p&gt;
&lt;p&gt;PostgreSQL does not have a Read Uncommitted isolation level. Setting Read Uncommitted is treated as Read Committed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Standard concurrency phenomena and isolation level matrix&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Isolation Level&lt;/th&gt;
 &lt;th&gt;Dirty Read&lt;/th&gt;
 &lt;th&gt;Non-repeatable Read&lt;/th&gt;
 &lt;th&gt;Phantom Read&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Read Uncommitted&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read Committed&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Repeatable Read&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Serializable&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL concurrency phenomena and isolation level matrix&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Isolation Level&lt;/th&gt;
 &lt;th&gt;Dirty Read&lt;/th&gt;
 &lt;th&gt;Non-repeatable Read&lt;/th&gt;
 &lt;th&gt;Phantom Read&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Read Uncommitted&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read Committed&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Repeatable Read&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Serializable&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 class="relative group"&gt;A Brief History of Transaction Isolation Levels
 &lt;div id="a-brief-history-of-transaction-isolation-levels" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#a-brief-history-of-transaction-isolation-levels" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The isolation levels and anomaly phenomena defined by ANSI SQL-92 have had a profound impact on the database industry. Even today, over 30 years later, most engineers&amp;rsquo; understanding of transaction isolation levels still revolves around them, and many real-world database isolation level implementations still follow them. However, the post-ANSI-92 era has seen much discussion and even criticism regarding isolation levels. Here is a summary of the key historical developments:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;1992&lt;/strong&gt;: The database industry was in a chaotic state regarding transactions, so ANSI defined the SQL-92 standard, with its widely known four isolation levels and three anomaly phenomena (dirty read, non-repeatable read, and phantom read).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;1995&lt;/strong&gt;: Berenson et al., in a Microsoft Research technical report, introduced Snapshot Isolation and criticized ANSI SQL-92, noting that the standard was vaguely defined and left many isolation levels and anomalies undefined. See &lt;a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf" target="_blank" rel="noreferrer"&gt;&lt;em&gt;A Critique of ANSI SQL Isolation Levels&lt;/em&gt;&lt;/a&gt;. From this point on, there were more than four isolation levels and additional anomaly phenomena, including write skew.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;1999&lt;/strong&gt;: Due to the proliferation of lock-based isolation levels, &lt;a href="http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-786.pdf" target="_blank" rel="noreferrer"&gt;Atul Adya&amp;rsquo;s paper&lt;/a&gt; organized these phenomena and mapped the various isolation levels back to ANSI SQL-92 based on anomaly phenomena and functionality.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;2005&lt;/strong&gt;: Because several databases offered a level called serializable that was actually Snapshot Isolation, Alan Fekete et al. published &lt;a href="https://pdfs.semanticscholar.org/d658/2728e30011adfe27b329c35203dfb8d1e7a8.pdf" target="_blank" rel="noreferrer"&gt;&lt;em&gt;Making Snapshot Isolation Serializable&lt;/em&gt;&lt;/a&gt;, which showed how to achieve serializability on top of Snapshot Isolation by eliminating its anomalies.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;2008&lt;/strong&gt;: Michael Cahill, Uwe Röhm, and Alan Fekete built on that work and proposed a database-level implementation called &lt;a href="https://cs.nyu.edu/courses/fall09/G22.2434-001/p729-cahill.pdf" target="_blank" rel="noreferrer"&gt;Serializable Snapshot Isolation (SSI)&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;2012&lt;/strong&gt;: The &lt;a href="https://drkp.net/papers/ssi-vldb12.pdf" target="_blank" rel="noreferrer"&gt;PostgreSQL SSI implementation paper&lt;/a&gt; was published. PostgreSQL 9.1, released in 2011, was the first production database release to implement SSI.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Isolation levels and anomaly phenomena from the 1995 &lt;em&gt;Critique of ANSI SQL Isolation Levels&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b45dce972611.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Isolation Levels Supported by Various Databases
 &lt;div id="isolation-levels-supported-by-various-databases" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#isolation-levels-supported-by-various-databases" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Many databases claim &amp;ldquo;full ACID&amp;rdquo; compliance, but without serializability, ACID (especially consistency) cannot be fully realized. In practice, most databases that make the claim do not fully deliver it, the veteran Oracle included.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/588a66bd74bb.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Serializable
 &lt;div id="serializable" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#serializable" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;There are many misconceptions about serializability.&lt;/p&gt;
&lt;p&gt;The meaning of serializable: if each transaction is itself correct (satisfying certain integrity conditions), then any schedule that executes those transactions serially is also correct (the transactions still satisfy their conditions). &amp;ldquo;Serial&amp;rdquo; means transactions do not overlap in time and cannot interfere with each other, so they are fully isolated. A concurrent schedule is serializable when its effect is equivalent to that of some serial schedule.&lt;/p&gt;
&lt;p&gt;In the 1970s, serializability was achieved through Strong Strict Two-Phase Locking (SS2PL), where reads and writes block each other until the transaction ends. SS2PL sacrifices concurrency but eliminates anomaly phenomena.&lt;/p&gt;
&lt;p&gt;Beyond SS2PL, there are other ways to achieve serializability, such as Serializable Snapshot Isolation (SSI).&lt;/p&gt;
&lt;p&gt;To guarantee no anomalies, serializability sacrifices some concurrency (how much depends on the implementation), but it can truly guarantee data consistency (the &amp;ldquo;C&amp;rdquo; in ACID). In other words, databases that do not implement serializability do not fully support ACID.&lt;/p&gt;
&lt;p&gt;Serializability has been mathematically proven achievable, but the real database world is somewhat &amp;ldquo;abnormal.&amp;rdquo; In practice, serializability is the highest transaction isolation level and the one strongly recommended by academics and experts. However, the vast majority of databases run at Read Committed or Snapshot Isolation.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Why Do Weaker Isolation Levels Cause Academic Problems but Few Real-World Disasters?
 &lt;div id="why-do-weaker-isolation-levels-cause-academic-problems-but-few-real-world-disasters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-do-weaker-isolation-levels-cause-academic-problems-but-few-real-world-disasters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Anomalies in non-serializable isolation levels generally require high concurrency. Low-concurrency databases rarely encounter problems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When anomalies do occur, some applications may not detect them or may not consider them important.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An anomaly may leave data inconsistent, yet the application simply reports an error and falls into its exception-handling logic.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The cost is too high. Implementing serializable isolation is expensive for the database itself, and applications also need to adapt. Simply understanding the underlying theory is no easy task.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Higher isolation levels lose some performance. Extensive rework may not be worth it; applications must choose between &amp;ldquo;high concurrency&amp;rdquo; and &amp;ldquo;freedom from anomalies.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Business logic is built around mechanisms, not rules. Applications have somewhat adapted to the anomalies of weaker isolation levels, especially Read Committed or Snapshot Isolation.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 class="relative group"&gt;Snapshot Isolation
 &lt;div id="snapshot-isolation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#snapshot-isolation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;ANSI SQL-92 did not define Snapshot Isolation (SI). This isolation level emerged as the database industry evolved.&lt;/p&gt;
&lt;p&gt;Quoting the Wikipedia definition: a transaction executing under Snapshot Isolation operates on a snapshot of the database taken at the start of the transaction. When the transaction ends, it will only commit successfully if the values it updated have not been externally changed since the snapshot was taken. Write conflicts thus cause transaction aborts.&lt;/p&gt;
&lt;p&gt;As the name implies, Snapshot Isolation uses snapshots. It exists in databases that use MVCC, where the multi-version concurrency mechanism supports concurrent transaction execution.&lt;/p&gt;
&lt;p&gt;The ANSI SQL-92 standard was defined in terms of lock-based implementations, which is why it has no notion of Snapshot Isolation. The concept only emerged with the 1995 &lt;em&gt;Critique&lt;/em&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Serializable Snapshot Isolation
 &lt;div id="serializable-snapshot-isolation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#serializable-snapshot-isolation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Due to the widespread adoption of Snapshot Isolation and the academic goal that databases should achieve serializability, Serializable Snapshot Isolation (SSI) was born. As the name suggests, it achieves serializability on top of Snapshot Isolation.&lt;/p&gt;
&lt;p&gt;Although the ANSI SQL-92 standard never defined Snapshot Isolation, many databases actually use it. Snapshot Isolation has anomalies of its own, write skew among them, and SSI was created to eliminate them.&lt;/p&gt;
&lt;p&gt;Mainstream databases implement concurrency control via S2PL or MVCC. Under S2PL, reads and writes from different transactions block each other, so write skew cannot occur. Under MVCC, reads and writes do not block each other; only write-write conflicts do. With concurrent read-then-write patterns, this permits write skew. Since PostgreSQL 9.1, the Serializable level has been implemented as SSI layered on top of Snapshot Isolation (PostgreSQL still reads from snapshots even at the Serializable level), which eliminates write skew and other serialization anomalies.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Write Skew
 &lt;div id="write-skew" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#write-skew" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When certain conflicts form a cycle, serialization anomalies occur. One of the easier ones to understand is &lt;strong&gt;write skew&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Write skew only arises from read-write dependencies (not write-write or write-read) between concurrent transactions. A dependency cycle forms when each of two concurrent transactions reads data that the other one then modifies, so neither can be placed entirely before the other in a serial order.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2e661194aa05.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;There are many real-world cases of write skew. Let&amp;rsquo;s understand it through the classic &lt;strong&gt;black-and-white ball problem&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;A bag contains 10 balls: 5 white and 5 black. Two transactions, P and Q, are running. P changes all black balls to white; Q changes all white balls to black. There are two possible serial executions: P then Q, or Q then P. Either way, all 10 balls end up the same color: 10 black after P then Q, or 10 white after Q then P. However, Snapshot Isolation allows another outcome:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Transaction P picks up 5 black balls&lt;/li&gt;
&lt;li&gt;Transaction Q picks up 5 white balls&lt;/li&gt;
&lt;li&gt;Transaction P changes all the balls in hand to white and puts them back&lt;/li&gt;
&lt;li&gt;Transaction Q changes all the balls in hand to black and puts them back&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now the bag still has 5 black and 5 white balls — an outcome impossible in any serial execution. Yet this is valid under Snapshot Isolation: each transaction maintains a consistent view of the database, and its write set does not overlap with any concurrent transaction&amp;rsquo;s write set. Hence, the black and white balls are swapped.&lt;/p&gt;
&lt;p&gt;The black-and-white ball problem illustrates: the result under Snapshot Isolation is inconsistent with the result under serial execution. Write skew occurs under Snapshot Isolation, and the data outcome does not match expectations.&lt;/p&gt;

&lt;h3 class="relative group"&gt;SSI in PostgreSQL
 &lt;div id="ssi-in-postgresql" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ssi-in-postgresql" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL was the first production database to implement SSI. Here is the black-and-white ball example, using the code from the PostgreSQL wiki&amp;rsquo;s SSI page:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; dots
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id int &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; color text &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; );
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; dots
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; x(id) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id, &lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; id &lt;span style="color:#f92672"&gt;%&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;black&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;white&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; x;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th style="text-align: left"&gt;set default_transaction_isolation = &amp;lsquo;serializable&amp;rsquo;;&lt;/th&gt;
 &lt;th style="text-align: left"&gt;set default_transaction_isolation = &amp;lsquo;serializable&amp;rsquo;;&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;begin; &lt;br /&gt;update dots set color = &amp;lsquo;black&amp;rsquo; where color = &amp;lsquo;white&amp;rsquo;;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;begin; &lt;br /&gt; update dots set color = &amp;lsquo;white&amp;rsquo; where color = &amp;lsquo;black&amp;rsquo;;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;commit;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;commit;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;&lt;em&gt;(PostgreSQL SSI: first committer succeeds, second throws an error)&lt;/em&gt;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;ERROR: could not serialize access due to read/write dependencies among transactions DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt. HINT: The transaction might succeed if retried.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;(At Read Committed and Repeatable Read, no error is thrown; the black and white balls simply swap colors. Test results omitted.)&lt;/p&gt;
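&lt;p&gt;The Repeatable Read case can be reproduced with the same &lt;code&gt;dots&lt;/code&gt; table. A sketch of the two-session interleaving, with session boundaries marked in comments:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Session 1
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
UPDATE dots SET color = &amp;#39;black&amp;#39; WHERE color = &amp;#39;white&amp;#39;;

-- Session 2: its snapshot does not see Session 1&amp;#39;s uncommitted rows
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
UPDATE dots SET color = &amp;#39;white&amp;#39; WHERE color = &amp;#39;black&amp;#39;;

-- Session 1
COMMIT;

-- Session 2
COMMIT;   -- succeeds: the two write sets do not overlap

-- Either session: odd ids are now white and even ids are black
SELECT id, color FROM dots ORDER BY id;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Both commits succeed and the table still holds 5 black and 5 white rows, only with the colors swapped, which no serial order of the two updates could produce.&lt;/p&gt;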
&lt;p&gt;Strict Two-Phase Locking (S2PL) can also achieve serializability, but S2PL requires heavy read-write locks held until transaction commit. S2PL severely impacts concurrency performance, and users generally won&amp;rsquo;t accept reads and writes blocking each other, so PostgreSQL does not use S2PL.&lt;/p&gt;
&lt;p&gt;SSI is an alternative approach to serializability. It still uses Snapshot Isolation but additionally checks for anomaly phenomena. The two approaches also handle anomalies differently: when one occurs, S2PL blocks transactions, while SSI aborts a transaction to break the cycle.&lt;/p&gt;
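&lt;p&gt;SSI&amp;rsquo;s extra bookkeeping can be observed while a serializable transaction is open. As a hedged illustration that goes slightly beyond the text above: PostgreSQL records what serializable transactions have read as predicate locks, which show up in the &lt;code&gt;pg_locks&lt;/code&gt; view with mode &lt;code&gt;SIReadLock&lt;/code&gt;. The sketch reuses the &lt;code&gt;dots&lt;/code&gt; table created earlier:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sql" data-lang="sql"&gt;BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM dots WHERE color = &amp;#39;black&amp;#39;;

-- Predicate locks taken by serializable reads; they record reads and block nothing
SELECT locktype, relation::regclass, page, tuple, mode
FROM pg_locks
WHERE mode = &amp;#39;SIReadLock&amp;#39;;

ROLLBACK;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;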
&lt;p&gt;One reason people avoid serializability is that it supposedly reduces database performance. This is understandable: SSI performs extra &amp;ldquo;anomaly checks&amp;rdquo; that weaker isolation levels skip, so some overhead is expected. However, with advances in SSI implementation theory and PostgreSQL&amp;rsquo;s optimizations for read-only transactions, SSI&amp;rsquo;s performance is now close to that of SI.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7d32ee35fdba.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Serializability greatly simplifies applications&amp;rsquo; consistency concerns. PostgreSQL has shipped an optimized SSI implementation since version 9.1. Let&amp;rsquo;s hope applications will one day truly adopt the serializable isolation level.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Transaction Isolation Level References
 &lt;div id="transaction-isolation-level-references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-isolation-level-references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/SSI" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/SSI&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Serializability" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Serializability&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Snapshot_isolation" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Snapshot_isolation&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://justinjaffray.com/what-does-write-skew-look-like/" target="_blank" rel="noreferrer"&gt;https://justinjaffray.com/what-does-write-skew-look-like/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.bailis.org/blog/when-is-acid-acid-rarely/" target="_blank" rel="noreferrer"&gt;http://www.bailis.org/blog/when-is-acid-acid-rarely/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf" target="_blank" rel="noreferrer"&gt;https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf&lt;/a&gt;: 1995 paper that introduced Snapshot Isolation and critiqued SQL-92&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/2009/Papers/p492-fekete.pdf" target="_blank" rel="noreferrer"&gt;https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/2009/Papers/p492-fekete.pdf&lt;/a&gt;: &lt;em&gt;Making Snapshot Isolation Serializable&lt;/em&gt; (Fekete et al., 2005)&lt;/p&gt;
&lt;p&gt;&lt;a href="https://drkp.net/papers/ssi-vldb12.pdf" target="_blank" rel="noreferrer"&gt;https://drkp.net/papers/ssi-vldb12.pdf&lt;/a&gt;: PostgreSQL SSI implementation&lt;/p&gt;
&lt;p&gt;&lt;a href="https://ristret.com/s/f643zk/history_transaction_histories" target="_blank" rel="noreferrer"&gt;https://ristret.com/s/f643zk/history_transaction_histories&lt;/a&gt;: History of transaction isolation levels&lt;/p&gt;

&lt;h2 class="relative group"&gt;Transaction Processing
 &lt;div id="transaction-processing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-processing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Transaction Blocks
 &lt;div id="transaction-blocks" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-blocks" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Transactions can be implicit or explicit. An implicit transaction is a standalone SQL statement that auto-commits upon completion. An explicit transaction requires an explicit declaration; multiple SQL statements grouped together form a transaction block.&lt;/p&gt;
&lt;p&gt;Transaction blocks begin with &lt;code&gt;begin&lt;/code&gt;, &lt;code&gt;begin transaction&lt;/code&gt;, or &lt;code&gt;start transaction&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;They end with &lt;code&gt;COMMIT&lt;/code&gt; (synonym: &lt;code&gt;END&lt;/code&gt;) or with &lt;code&gt;ROLLBACK&lt;/code&gt; (synonym: &lt;code&gt;ABORT&lt;/code&gt;).&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;END&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
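&lt;p&gt;If a statement fails inside a transaction block, atomicity rules out a partial commit: the block can only be rolled back, and even an explicit &lt;code&gt;COMMIT&lt;/code&gt; is answered with &lt;code&gt;ROLLBACK&lt;/code&gt;:&lt;/p&gt;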
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: relation &lt;span style="color:#e6db74"&gt;&amp;#34;lzl2&amp;#34;&lt;/span&gt; does &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; exist
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LINE &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;^&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=!#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Transaction Processing Functions
 &lt;div id="transaction-processing-functions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-processing-functions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Transaction processing functions are organized into three layers: top-level transaction functions, middle-level transaction functions, and bottom-level transaction functions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Top-level transaction functions&lt;/strong&gt; handle transaction block commands like &lt;code&gt;BEGIN&lt;/code&gt;, &lt;code&gt;COMMIT&lt;/code&gt;, &lt;code&gt;ROLLBACK&lt;/code&gt;, &lt;code&gt;SAVEPOINT&lt;/code&gt;, etc.:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Function&lt;/th&gt;
 &lt;th&gt;Purpose&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;BeginTransactionBlock&lt;/td&gt;
 &lt;td&gt;Start a transaction block&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;EndTransactionBlock&lt;/td&gt;
 &lt;td&gt;End a transaction block&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;UserAbortTransactionBlock&lt;/td&gt;
 &lt;td&gt;User-initiated transaction abort&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;DefineSavepoint&lt;/td&gt;
 &lt;td&gt;Create a savepoint&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;RollbackToSavepoint&lt;/td&gt;
 &lt;td&gt;Roll back to a savepoint&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;ReleaseSavepoint&lt;/td&gt;
 &lt;td&gt;Release a savepoint&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Middle-level transaction functions&lt;/strong&gt;: every SQL statement calls middle-level functions before and after execution, including after detecting an exception:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Function&lt;/th&gt;
 &lt;th&gt;Purpose&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;StartTransactionCommand&lt;/td&gt;
 &lt;td&gt;Start a transaction command&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CommitTransactionCommand&lt;/td&gt;
 &lt;td&gt;Finish a transaction command (not necessarily a transaction commit)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;AbortCurrentTransaction&lt;/td&gt;
 &lt;td&gt;Abort the current transaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Bottom-level transaction functions&lt;/strong&gt;: the actual transaction processing functions, responsible for maintaining transaction state, allocating and reclaiming transaction resources, etc.:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Function&lt;/th&gt;
 &lt;th&gt;Purpose&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;StartTransaction&lt;/td&gt;
 &lt;td&gt;Start a transaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CommitTransaction&lt;/td&gt;
 &lt;td&gt;Commit a transaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;AbortTransaction&lt;/td&gt;
 &lt;td&gt;Rollback/abort a transaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CleanupTransaction&lt;/td&gt;
 &lt;td&gt;Clean up a transaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;StartSubTransaction&lt;/td&gt;
 &lt;td&gt;Start a subtransaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CommitSubTransaction&lt;/td&gt;
 &lt;td&gt;Commit a subtransaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;AbortSubTransaction&lt;/td&gt;
 &lt;td&gt;Rollback/abort a subtransaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CleanupSubTransaction&lt;/td&gt;
 &lt;td&gt;Clean up a subtransaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The layers are easy to tell apart by name: &lt;code&gt;*Block&lt;/code&gt; functions operate on transaction blocks, &lt;code&gt;*Command&lt;/code&gt; functions bracket individual commands, and &lt;code&gt;*Transaction&lt;/code&gt; functions do the actual transaction processing. Two groups break the naming pattern. The savepoint functions belong to the top level although their names lack the &lt;code&gt;Block&lt;/code&gt; suffix, which makes sense because a subtransaction can be rolled back inside a transaction block. &lt;code&gt;AbortCurrentTransaction&lt;/code&gt; belongs to the middle level although its name lacks the &lt;code&gt;Command&lt;/code&gt; suffix.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Transaction Block States
 &lt;div id="transaction-block-states" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-block-states" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Top-level and middle-level functions jointly control the transaction block state; bottom-level functions control the transaction state.&lt;/p&gt;
&lt;p&gt;Both the transaction block states and the transaction states are defined in &lt;code&gt;src/backend/access/transam/xact.c&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;enum&lt;/span&gt; TBlockState
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* states not in a transaction block */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_DEFAULT, &lt;span style="color:#75715e"&gt;/* idle; no transaction is running */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_STARTED, &lt;span style="color:#75715e"&gt;/* running a single-statement transaction; entered from TBLOCK_DEFAULT; short-lived */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* transaction block states */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_BEGIN, &lt;span style="color:#75715e"&gt;/* BEGIN received; the transaction block is starting */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_INPROGRESS, &lt;span style="color:#75715e"&gt;/* active transaction; after BEGIN, the block stays in this state until transaction ends */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_IMPLICIT_INPROGRESS, &lt;span style="color:#75715e"&gt;/* active transaction with an implicit BEGIN */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_PARALLEL_INPROGRESS, &lt;span style="color:#75715e"&gt;/* active transaction inside a parallel worker */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_END, &lt;span style="color:#75715e"&gt;/* received COMMIT command */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_ABORT, &lt;span style="color:#75715e"&gt;/* transaction failed, waiting for ROLLBACK */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_ABORT_END, &lt;span style="color:#75715e"&gt;/* transaction failed, received ROLLBACK */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_ABORT_PENDING, &lt;span style="color:#75715e"&gt;/* active transaction, received ROLLBACK */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_PREPARE, &lt;span style="color:#75715e"&gt;/* active transaction, received PREPARE (explicit 2PC) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* subtransaction states (still transaction-block level) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBBEGIN, &lt;span style="color:#75715e"&gt;/* start a subtransaction */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBINPROGRESS, &lt;span style="color:#75715e"&gt;/* active subtransaction */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBRELEASE, &lt;span style="color:#75715e"&gt;/* received RELEASE (release savepoint) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBCOMMIT, &lt;span style="color:#75715e"&gt;/* parent transaction COMMIT while subtransaction is still running (SUBINPROGRESS) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBABORT, &lt;span style="color:#75715e"&gt;/* failed subtransaction, waiting for rollback command */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBABORT_END, &lt;span style="color:#75715e"&gt;/* failed subtransaction, received rollback command */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBABORT_PENDING, &lt;span style="color:#75715e"&gt;/* active subtransaction, received rollback command */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBRESTART, &lt;span style="color:#75715e"&gt;/* active subtransaction, received ROLLBACK TO command */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBABORT_RESTART &lt;span style="color:#75715e"&gt;/* failed subtransaction, received ROLLBACK TO command */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} TBlockState;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Most states are self-explanatory. One point deserves a note: a user-issued rollback and an internal abort do similar work afterwards, since both must clean up transaction resources and leave the current transaction. Yet PostgreSQL treats them as two behaviors with two states, &lt;code&gt;TBLOCK_ABORT&lt;/code&gt; and &lt;code&gt;TBLOCK_ABORT_END&lt;/code&gt; (and likewise for subtransactions). Why?&lt;/p&gt;
&lt;p&gt;&lt;code&gt;src/backend/access/transam/README&lt;/code&gt; offers a detailed explanation:&lt;/p&gt;
&lt;blockquote&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Scenario 1&lt;/th&gt;
 &lt;th&gt;Scenario 2&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;1) User types &lt;code&gt;BEGIN&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1) User types &lt;code&gt;BEGIN&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2) User executes some commands&lt;/td&gt;
 &lt;td&gt;2) User executes some commands&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3) User doesn&amp;rsquo;t like what she sees, types &lt;code&gt;ABORT&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;3) The transaction system aborts for some reason (syntax error, etc.)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In Scenario 1, we want to abort the transaction and return to the default state.&lt;/p&gt;
&lt;p&gt;In Scenario 2, more commands may follow that are still part of the current transaction block. We must ignore these commands until we see &lt;code&gt;COMMIT&lt;/code&gt; or &lt;code&gt;ROLLBACK&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;AbortCurrentTransaction&lt;/code&gt; handles internal transaction aborts; &lt;code&gt;UserAbortTransactionBlock&lt;/code&gt; handles user-initiated aborts. Both rely on &lt;code&gt;AbortTransaction&lt;/code&gt; to do all the real work. The only difference is what state we enter after &lt;code&gt;AbortTransaction&lt;/code&gt; finishes:&lt;/p&gt;
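&lt;ul&gt;
&lt;li&gt;&lt;code&gt;AbortCurrentTransaction&lt;/code&gt; leaves us in &lt;code&gt;TBLOCK_ABORT&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;UserAbortTransactionBlock&lt;/code&gt; leaves us in &lt;code&gt;TBLOCK_ABORT_END&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;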
&lt;p&gt;Bottom-level transaction abort processing has two phases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;As soon as we realize the transaction has failed, &lt;code&gt;AbortTransaction&lt;/code&gt; is executed. This should release all shared resources (locks, etc.) so that other backends are not delayed unnecessarily.&lt;/li&gt;
&lt;li&gt;When we finally see the user&amp;rsquo;s &lt;code&gt;COMMIT&lt;/code&gt; or &lt;code&gt;ROLLBACK&lt;/code&gt;, &lt;code&gt;CleanupTransaction&lt;/code&gt; is executed; this function cleans up resources and gets us completely out of the transaction. In particular, we cannot destroy &lt;code&gt;TopTransactionContext&lt;/code&gt; before this point.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
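&lt;p&gt;The two abort phases can be sketched as a toy model. This is a minimal illustration, not PostgreSQL code: the &lt;code&gt;ToyXact&lt;/code&gt; type, the &lt;code&gt;TOY_*&lt;/code&gt; states and the &lt;code&gt;toy_*&lt;/code&gt; functions are hypothetical names, and the intermediate states (&lt;code&gt;TBLOCK_ABORT_END&lt;/code&gt;, &lt;code&gt;TBLOCK_ABORT_PENDING&lt;/code&gt;) are collapsed on the assumption that they only mark the short hop back to the idle state.&lt;/p&gt;

```c
/* Toy model of the two abort scenarios. All names are illustrative. */
typedef enum ToyAbortState
{
    TOY_IDLE,     /* stands in for TBLOCK_DEFAULT */
    TOY_LIVE,     /* stands in for TBLOCK_INPROGRESS */
    TOY_FAILED    /* stands in for TBLOCK_ABORT */
} ToyAbortState;

typedef struct ToyXact
{
    ToyAbortState state;
    int locks_released;   /* 1 once phase 1 (AbortTransaction) has run */
    int context_freed;    /* 1 once phase 2 (CleanupTransaction) has run */
} ToyXact;

/* Scenario 2: the system notices a failure. Only phase 1 runs now. */
ToyXact toy_internal_abort(ToyXact x)
{
    if (x.state == TOY_LIVE)
    {
        x.locks_released = 1;   /* give up shared resources right away */
        x.state = TOY_FAILED;   /* stay in the block until the user ends it */
    }
    return x;
}

/* An ordinary statement is executed only while the block is healthy. */
int toy_accepts_statement(ToyXact x)
{
    return x.state == TOY_LIVE;
}

/* The user ends the block with ROLLBACK. */
ToyXact toy_user_rollback(ToyXact x)
{
    if (x.state == TOY_LIVE)
    {
        /* Scenario 1: both phases run back to back. */
        x.locks_released = 1;
        x.context_freed = 1;
        x.state = TOY_IDLE;
    }
    else if (x.state == TOY_FAILED)
    {
        /* Scenario 2: phase 1 already ran, only cleanup is left. */
        x.context_freed = 1;
        x.state = TOY_IDLE;
    }
    return x;
}
```

&lt;p&gt;In this sketch a failed block rejects every statement until &lt;code&gt;toy_user_rollback&lt;/code&gt; is called, which mirrors the psql session earlier in this chapter where &lt;code&gt;commit&lt;/code&gt; after an error is answered with &lt;code&gt;ROLLBACK&lt;/code&gt;.&lt;/p&gt;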
&lt;h3 class="relative group"&gt;Transaction States
 &lt;div id="transaction-states" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-states" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Transaction states are straightforward (note: these are different from transaction block states):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;enum&lt;/span&gt; TransState
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TRANS_DEFAULT, &lt;span style="color:#75715e"&gt;/* idle */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TRANS_START, &lt;span style="color:#75715e"&gt;/* transaction is starting */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TRANS_INPROGRESS, &lt;span style="color:#75715e"&gt;/* transaction in progress */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TRANS_COMMIT, &lt;span style="color:#75715e"&gt;/* commit in progress */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TRANS_ABORT, &lt;span style="color:#75715e"&gt;/* abort in progress */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TRANS_PREPARE &lt;span style="color:#75715e"&gt;/* prepare in progress (2PC) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} TransState;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
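&lt;p&gt;How the bottom-level functions might move through these states can be sketched as follows. This is a simplified model under stated assumptions: that &lt;code&gt;StartTransaction&lt;/code&gt; passes through &lt;code&gt;TRANS_START&lt;/code&gt; on its way to &lt;code&gt;TRANS_INPROGRESS&lt;/code&gt;, that &lt;code&gt;CommitTransaction&lt;/code&gt; passes through &lt;code&gt;TRANS_COMMIT&lt;/code&gt; back to &lt;code&gt;TRANS_DEFAULT&lt;/code&gt;, and that &lt;code&gt;AbortTransaction&lt;/code&gt; stops in &lt;code&gt;TRANS_ABORT&lt;/code&gt; until &lt;code&gt;CleanupTransaction&lt;/code&gt; runs. The &lt;code&gt;ToyTransState&lt;/code&gt; type and the &lt;code&gt;toy_*&lt;/code&gt; functions are hypothetical, and 2PC is left out.&lt;/p&gt;

```c
/* Toy model of the low-level transaction state. The Toy and toy_ names are
 * illustrative and do not exist in PostgreSQL. */
typedef enum ToyTransState
{
    TOY_TRANS_DEFAULT,
    TOY_TRANS_START,
    TOY_TRANS_INPROGRESS,
    TOY_TRANS_COMMIT,
    TOY_TRANS_ABORT
} ToyTransState;

/* StartTransaction: idle, then starting, then in progress. */
ToyTransState toy_start_transaction(ToyTransState s)
{
    if (s != TOY_TRANS_DEFAULT)
        return s;                 /* toy choice: ignore an out-of-place call */
    s = TOY_TRANS_START;          /* resources are being set up */
    s = TOY_TRANS_INPROGRESS;     /* ready to run statements */
    return s;
}

/* CommitTransaction: in progress, then committing, then idle again. */
ToyTransState toy_commit_transaction(ToyTransState s)
{
    if (s != TOY_TRANS_INPROGRESS)
        return s;
    s = TOY_TRANS_COMMIT;         /* commit work is under way */
    s = TOY_TRANS_DEFAULT;        /* everything has been released */
    return s;
}

/* AbortTransaction: stops in the abort state until cleanup runs. */
ToyTransState toy_abort_transaction(ToyTransState s)
{
    return (s == TOY_TRANS_INPROGRESS) ? TOY_TRANS_ABORT : s;
}

/* CleanupTransaction: only meaningful after an abort. */
ToyTransState toy_cleanup_transaction(ToyTransState s)
{
    return (s == TOY_TRANS_ABORT) ? TOY_TRANS_DEFAULT : s;
}
```

&lt;p&gt;The asymmetry is the point of the sketch: a commit returns to the idle state in one call, while an abort needs a second call before the state is idle again.&lt;/p&gt;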

&lt;h3 class="relative group"&gt;Transaction State Flow
 &lt;div id="transaction-state-flow" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-state-flow" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Each command in a transaction block calls transaction functions, which in turn transition the transaction and transaction block states.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s use the simplest transaction block as an example (from the README):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; foo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;)&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; foo &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (...)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Command call relationships:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;     /  StartTransactionCommand;        -- middle-level: start transaction command
    /       StartTransaction;           -- bottom-level: actually start the transaction
1) &amp;lt;    ProcessUtility;                 -- handles the BEGIN command
    \       BeginTransactionBlock;      -- top-level: start transaction block
     \  CommitTransactionCommand;       -- middle-level: complete command

    /   StartTransactionCommand;        -- middle-level: start transaction command
2) /    PortalRunSelect;                -- execute the SELECT statement
   \    CommitTransactionCommand;       -- middle-level: complete command
    \       CommandCounterIncrement;    -- advance the command counter

    /   StartTransactionCommand;        -- middle-level: start transaction command
3) /    ProcessQuery;                   -- execute the INSERT statement
   \    CommitTransactionCommand;       -- middle-level: complete command
    \       CommandCounterIncrement;    -- advance the command counter

     /  StartTransactionCommand;        -- middle-level: start transaction command
    /   ProcessUtility;                 -- handles the COMMIT command
4) &amp;lt;        EndTransactionBlock;        -- top-level: end transaction block
    \   CommitTransactionCommand;       -- middle-level: complete command
     \      CommitTransaction;          -- bottom-level: actually commit the transaction
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Every command in a transaction block begins with the middle-level &lt;code&gt;StartTransactionCommand&lt;/code&gt; and ends with &lt;code&gt;CommitTransactionCommand&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Between these two middle-level functions is where the actual command processing occurs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The transaction block state for 2) SELECT and 3) INSERT is &lt;code&gt;TBLOCK_INPROGRESS&lt;/code&gt;. The state transitions for &lt;code&gt;BEGIN&lt;/code&gt; and &lt;code&gt;COMMIT&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://lastdba.com/img/csdn/b8f307da3f3f.png" alt="Transaction block state transitions for BEGIN and COMMIT" /&gt;&lt;/p&gt;
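&lt;p&gt;The same walk through the four commands can be written as a small state machine. This is a hedged sketch of the happy path only: the &lt;code&gt;ToyBlockState&lt;/code&gt; type and the &lt;code&gt;toy_*&lt;/code&gt; functions are hypothetical stand-ins for the top-level and middle-level functions, and the transitions are assumptions based on the call relationships and state comments above.&lt;/p&gt;

```c
/* Toy model of the block-state transitions for the BEGIN ... COMMIT example.
 * Only the states on the happy path are modelled; every name here is
 * illustrative and is not a PostgreSQL identifier. */
typedef enum ToyBlockState
{
    TOY_TBLOCK_DEFAULT,      /* idle */
    TOY_TBLOCK_STARTED,      /* single-statement transaction running */
    TOY_TBLOCK_BEGIN,        /* BEGIN seen, block not yet entered */
    TOY_TBLOCK_INPROGRESS,   /* inside the transaction block */
    TOY_TBLOCK_END           /* COMMIT seen, commit still pending */
} ToyBlockState;

/* Middle level, before each statement: start a transaction only when idle. */
ToyBlockState toy_start_command(ToyBlockState s)
{
    return (s == TOY_TBLOCK_DEFAULT) ? TOY_TBLOCK_STARTED : s;
}

/* Top level, run while the BEGIN utility command is processed. */
ToyBlockState toy_begin_block(ToyBlockState s)
{
    return (s == TOY_TBLOCK_STARTED) ? TOY_TBLOCK_BEGIN : s;
}

/* Top level, run while the COMMIT utility command is processed. */
ToyBlockState toy_end_block(ToyBlockState s)
{
    return (s == TOY_TBLOCK_INPROGRESS) ? TOY_TBLOCK_END : s;
}

/* Middle level, after each statement. */
ToyBlockState toy_commit_command(ToyBlockState s)
{
    switch (s)
    {
        case TOY_TBLOCK_STARTED:
            return TOY_TBLOCK_DEFAULT;      /* implicit transaction commits */
        case TOY_TBLOCK_BEGIN:
            return TOY_TBLOCK_INPROGRESS;   /* the block is now open */
        case TOY_TBLOCK_INPROGRESS:
            return TOY_TBLOCK_INPROGRESS;   /* only the command counter advances */
        case TOY_TBLOCK_END:
            return TOY_TBLOCK_DEFAULT;      /* the real commit happens here */
        default:
            return s;
    }
}
```

&lt;p&gt;Feeding the four commands through these functions in order gives default, started, begin, in progress (for &lt;code&gt;BEGIN&lt;/code&gt;), in progress twice more (for &lt;code&gt;SELECT&lt;/code&gt; and &lt;code&gt;INSERT&lt;/code&gt;), then end and back to default (for &lt;code&gt;COMMIT&lt;/code&gt;).&lt;/p&gt;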

&lt;h3 class="relative group"&gt;Transaction Function References
 &lt;div id="transaction-function-references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-function-references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;PostgreSQL Internals&lt;/em&gt; (book)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;src/backend/access/transam/README&lt;/code&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Transaction ID
 &lt;div id="transaction-id" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-id" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Every transaction in PostgreSQL is assigned a transaction ID. Transaction IDs come in two forms: virtual transaction IDs and persistent transaction IDs. Understanding transaction IDs is crucial for grasping transactions, data visibility, transaction ID wraparound, and more.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Virtual Transaction ID
 &lt;div id="virtual-transaction-id" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#virtual-transaction-id" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Persistent transaction IDs are a scarce resource, so a read-only transaction is not assigned one; a simple SELECT, for instance, consumes none. The transaction must still be identifiable, for example when it takes shared locks, so it receives a non-persistent identifier instead: the virtual transaction ID (VXID).&lt;/p&gt;
&lt;p&gt;VXID consists of two parts: a backend ID and a backend-local counter.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/include/storage/lock.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;BackendId backendId; &lt;span style="color:#75715e"&gt;/* backendId from PGPROC */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LocalTransactionId localTransactionId; &lt;span style="color:#75715e"&gt;/* lxid from PGPROC */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} VirtualTransactionId;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;(PGPROC is a structure storing process information; we&amp;rsquo;ll cover it later.)&lt;/p&gt;
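&lt;p&gt;The two-part layout can be illustrated with a small model. This is only a sketch: &lt;code&gt;ToyVxid&lt;/code&gt;, &lt;code&gt;toy_next_vxid&lt;/code&gt; and &lt;code&gt;toy_vxid_equal&lt;/code&gt; are hypothetical names, plain integers stand in for &lt;code&gt;BackendId&lt;/code&gt; and &lt;code&gt;LocalTransactionId&lt;/code&gt;, and counter wraparound is ignored.&lt;/p&gt;

```c
/* Toy model of a virtual transaction ID: a backend slot number plus a
 * counter local to that backend. All names are illustrative only. */
typedef struct ToyVxid
{
    int          backend_id;   /* which backend owns the transaction */
    unsigned int local_xid;    /* backend-local counter */
} ToyVxid;

/* Next transaction started by the same backend: only the counter moves. */
ToyVxid toy_next_vxid(ToyVxid prev)
{
    ToyVxid next = prev;

    next.local_xid = prev.local_xid + 1;
    return next;
}

/* Two VXIDs name the same transaction only if both parts match. */
int toy_vxid_equal(ToyVxid a, ToyVxid b)
{
    if (a.backend_id != b.backend_id)
        return 0;
    return a.local_xid == b.local_xid;
}
```

&lt;p&gt;Because the counter is local to a backend, no shared counter has to be touched to hand out a new VXID in this model; uniqueness comes from pairing the counter with the backend ID.&lt;/p&gt;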
&lt;p&gt;VXIDs are visible in &lt;code&gt;pg_locks&lt;/code&gt;. The query against &lt;code&gt;pg_locks&lt;/code&gt; is itself a SQL statement running in a transaction, so it holds a VXID of its own:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; locktype,virtualxid,virtualtransaction,&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualtransaction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+------------+--------------------+-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation		&lt;span style="color:#f92672"&gt;|&lt;/span&gt; 		 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; 	 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; savepoint p1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SAVEPOINT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; locktype,virtualxid,virtualtransaction,&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype 	&lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualtransaction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+------------+--------------------+-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation 	&lt;span style="color:#f92672"&gt;|&lt;/span&gt; 		 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; locktype,virtualxid,virtualtransaction,&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype 	&lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualtransaction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+------------+--------------------+-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation 	&lt;span style="color:#f92672"&gt;|&lt;/span&gt; 	 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After &lt;code&gt;\q&lt;/code&gt; (disconnect) and immediately logging back in, the counter continues: &lt;code&gt;4/19&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Opening another window gives &lt;code&gt;backendID+1&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; locktype,virtualxid,virtualtransaction,&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualtransaction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+------------+--------------------+-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From these tests we can observe:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The VXID&amp;rsquo;s backend ID is not the actual process PID; it&amp;rsquo;s simply an incrementing number.&lt;/li&gt;
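&lt;li&gt;Both parts of the VXID count upward in these tests: a second concurrent session received the next backend ID, and each new transaction in a session bumped that session&amp;rsquo;s local transaction counter.&lt;/li&gt;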
&lt;li&gt;Subtransactions do not have their own VXID; they use the parent transaction&amp;rsquo;s VXID.&lt;/li&gt;
&lt;li&gt;VXID also has wraparound, but it&amp;rsquo;s not a serious issue since it isn&amp;rsquo;t persisted — after an instance restart, VXID starts counting from scratch.&lt;/li&gt;
&lt;/ul&gt;
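&lt;p&gt;The observations above can be modeled with a tiny struct. This is a minimal sketch, not PostgreSQL&amp;rsquo;s real data structure: &lt;code&gt;VxidSketch&lt;/code&gt;, &lt;code&gt;vxid_next_transaction()&lt;/code&gt; and &lt;code&gt;vxid_format()&lt;/code&gt; are hypothetical names chosen for illustration.&lt;/p&gt;

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of a virtual transaction ID; not PostgreSQL's real struct. */
typedef struct VxidSketch
{
    int32_t  backend_id;   /* slot number of the session, not the OS PID */
    uint32_t local_xid;    /* per-backend counter, bumped for each new transaction */
} VxidSketch;

/* Start the next transaction in the same backend: only the local counter moves. */
static VxidSketch
vxid_next_transaction(VxidSketch v)
{
    v.local_xid++;
    return v;
}

/* Render the VXID the way pg_locks prints it, for example "4/17". */
static int
vxid_format(VxidSketch v, char *buf, size_t buflen)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "%d/%u", (int) v.backend_id, (unsigned) v.local_xid);
}
```

&lt;p&gt;Starting from &lt;code&gt;{4, 16}&lt;/code&gt;, one call to &lt;code&gt;vxid_next_transaction()&lt;/code&gt; followed by &lt;code&gt;vxid_format()&lt;/code&gt; gives &lt;code&gt;4/17&lt;/code&gt;, matching the step seen after the rollback in the test.&lt;/p&gt;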

&lt;h3 class="relative group"&gt;Persistent Transaction ID
 &lt;div id="persistent-transaction-id" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#persistent-transaction-id" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;32-bit TransactionId
 &lt;div id="32-bit-transactionid" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#32-bit-transactionid" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;When a data-modifying transaction begins, the transaction manager assigns it a unique identifier: &lt;code&gt;TransactionId&lt;/code&gt;. &lt;code&gt;TransactionId&lt;/code&gt; is a 32-bit unsigned integer, so it can take &lt;code&gt;2^32 = 4,294,967,296&lt;/code&gt; distinct values (about 4.3 billion), in the range &lt;code&gt;0 ~ 2^32 - 1&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Three special transaction IDs&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;src/include/access/transam.h&lt;/code&gt; defines several special transaction IDs:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define InvalidTransactionId ((TransactionId) 0)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define BootstrapTransactionId ((TransactionId) 1)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define FrozenTransactionId ((TransactionId) 2)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define FirstNormalTransactionId ((TransactionId) 3)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define MaxTransactionId ((TransactionId) 0xFFFFFFFF)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;0: Invalid TransactionId&lt;/li&gt;
&lt;li&gt;1: Bootstrap Transaction ID, used only during database initialization. Older than all normal transactions.&lt;/li&gt;
&lt;li&gt;2: Frozen Transaction ID. Older than all normal transactions.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdIsNormal(xid) ((xid) &amp;gt;= FirstNormalTransactionId)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A transaction ID &amp;gt;= 3 is a normal transaction ID.&lt;/p&gt;
&lt;p&gt;The maximum transaction ID, &lt;code&gt;MaxTransactionId&lt;/code&gt;, is &lt;code&gt;0xFFFFFFFF = 4,294,967,295 = 2^32 - 1&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So the allocatable range for normal transaction IDs is: &lt;code&gt;3 ~ 2^32 - 1&lt;/code&gt;.&lt;/p&gt;
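&lt;p&gt;To make the range concrete, here is a minimal sketch of stepping through the 32-bit space while skipping the three special values. The macros repeat the definitions quoted above; &lt;code&gt;next_normal_xid()&lt;/code&gt; is a hypothetical helper written for this article, loosely modeled on the idea behind PostgreSQL&amp;rsquo;s &lt;code&gt;TransactionIdAdvance&lt;/code&gt; macro, not copied from the source.&lt;/p&gt;

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define BootstrapTransactionId   ((TransactionId) 1)
#define FrozenTransactionId      ((TransactionId) 2)
#define FirstNormalTransactionId ((TransactionId) 3)
#define MaxTransactionId         ((TransactionId) 0xFFFFFFFF)

#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Return the XID that follows xid on the 32-bit circle, skipping the
 * special values 0, 1 and 2. Hypothetical helper for illustration only.
 */
static TransactionId
next_normal_xid(TransactionId xid)
{
    xid++;                              /* unsigned arithmetic wraps 0xFFFFFFFF to 0 */
    if (!TransactionIdIsNormal(xid))
        xid = FirstNormalTransactionId; /* jump over 0, 1 and 2 */
    assert(TransactionIdIsNormal(xid));
    return xid;
}
```

&lt;p&gt;With this helper, the successor of &lt;code&gt;MaxTransactionId&lt;/code&gt; is 3 rather than 0, which is why the normal range is &lt;code&gt;3 ~ 2^32 - 1&lt;/code&gt;.&lt;/p&gt;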

&lt;h4 class="relative group"&gt;64-bit FullTransactionId
 &lt;div id="64-bit-fulltransactionid" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#64-bit-fulltransactionid" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Transaction IDs increment sequentially. PostgreSQL has used 32-bit transaction IDs for a long time. Before PostgreSQL 7.2, when the 32-bit transaction ID was exhausted, you had to dump and restore the database. A 64-bit transaction ID, on the other hand, is practically inexhaustible. The source defines a 64-bit &lt;code&gt;FullTransactionId&lt;/code&gt; as a struct:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *A 64-bit value containing an epoch and a TransactionId.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *It is wrapped in a struct to prevent implicit conversion to TransactionId.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *Not all values represent valid normal XIDs.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; FullTransactionId
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uint64 value;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} FullTransactionId;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The 64-bit value consists of an &lt;code&gt;epoch&lt;/code&gt; and a 32-bit &lt;code&gt;TransactionId&lt;/code&gt;, converted via these functions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define EpochFromFullTransactionId(x)	((uint32) ((x).value &amp;gt;&amp;gt; 32))
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XidFromFullTransactionId(x)		((uint32) (x).value)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The epoch is &lt;code&gt;FullTransactionId&lt;/code&gt; shifted right 32 bits; the XID (&lt;code&gt;TransactionId&lt;/code&gt;) is &lt;code&gt;FullTransactionId&lt;/code&gt; modulo &lt;code&gt;2^32&lt;/code&gt;. This is like treating the 32-bit &lt;code&gt;TransactionId&lt;/code&gt; as a &amp;ldquo;circle&amp;rdquo; that loops, while the 64-bit &lt;code&gt;FullTransactionId&lt;/code&gt; is a &amp;ldquo;line&amp;rdquo; that keeps growing, nearly inexhaustible.&lt;/p&gt;
&lt;p&gt;A full transaction ID can exceed &lt;code&gt;2^32&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e91011271323.png" alt="image" /&gt;&lt;/p&gt;
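&lt;p&gt;The split between epoch and XID can be tried out in isolation. The sketch below re-declares the struct from above and uses plain functions instead of the macros; the function names are hypothetical and chosen for readability.&lt;/p&gt;

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct FullTransactionId
{
    uint64_t value;
} FullTransactionId;

/* Upper 32 bits: how many times the 32-bit counter has wrapped. */
static uint32_t
epoch_from_full_xid(FullTransactionId x)
{
    return (uint32_t) (x.value >> 32);
}

/* Lower 32 bits: the 32-bit TransactionId, i.e. value modulo 2^32. */
static TransactionId
xid_from_full_xid(FullTransactionId x)
{
    return (TransactionId) x.value;
}

/* Inverse direction: glue an epoch and a 32-bit XID back together. */
static FullTransactionId
full_xid_from_epoch_and_xid(uint32_t epoch, TransactionId xid)
{
    FullTransactionId result;

    result.value = ((uint64_t) epoch << 32) | xid;
    assert(xid_from_full_xid(result) == xid);
    return result;
}
```

&lt;p&gt;For example, epoch 1 with XID 100 gives the 64-bit value &lt;code&gt;2^32 + 100 = 4,294,967,396&lt;/code&gt;: the circle has gone around once and is 100 steps into the second lap.&lt;/p&gt;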

&lt;h3 class="relative group"&gt;Transaction ID Assignment
 &lt;div id="transaction-id-assignment" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-id-assignment" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Let&amp;rsquo;s run a few experiments to see how transaction IDs are assigned. We&amp;rsquo;ll use two functions that return transaction IDs:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pg_current_xact_id()&lt;/code&gt;: returns the current transaction ID; if the current transaction has not yet been assigned one, it allocates one. (In pg12 and earlier, use &lt;code&gt;txid_current()&lt;/code&gt;.)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pg_current_xact_id_if_assigned()&lt;/code&gt;: returns the current transaction ID; if the current transaction has not yet been assigned one, returns NULL. (In pg12 and earlier, use &lt;code&gt;txid_current_if_assigned()&lt;/code&gt;.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Transaction IDs are assigned sequentially:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;lzldb=# select pg_current_xact_id();
 pg_current_xact_id 
--------------------
 612
lzldb=# select pg_current_xact_id();
 pg_current_xact_id 
--------------------
 613
lzldb=# select pg_current_xact_id();
 pg_current_xact_id 
--------------------
 614&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;BEGIN does not immediately allocate a transaction ID:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;lzldb=# begin; -- explicitly start a transaction
BEGIN
lzldb=*# select pg_current_xact_id_if_assigned () ; -- BEGIN does not immediately allocate a transaction ID
 pg_current_xact_id_if_assigned 
-------------------------------- 
(1 row)
lzldb=*# select * from lzl1; -- query immediately after BEGIN
 a 
---
(0 rows)
lzldb=*# select pg_current_xact_id_if_assigned () ; -- queries do not allocate transaction IDs
 pg_current_xact_id_if_assigned 
-------------------------------- 
(1 row)
lzldb=*# insert into lzl1 values(1); -- insert data, a data change
INSERT 0 1
lzldb=*# select pg_current_xact_id_if_assigned () ; -- the first non-query statement after BEGIN allocates a transaction ID
 pg_current_xact_id_if_assigned 
--------------------------------
 611
lzldb=*# commit;
COMMIT
lzldb=# select xmin, pg_current_xact_id_if_assigned () from lzl1; -- the INSERT transaction writes to xmin
 xmin | pg_current_xact_id_if_assigned 
------+--------------------------------
 611 | &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Some records in system catalogs were assigned &lt;code&gt;BootstrapTransactionId=1&lt;/code&gt; during database initialization:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;postgres=# select xmin,count(*) from pg_class where xmin=1 group by xmin;
 xmin | count 
------+-------
 1 | 184&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Conclusions from the experiments:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;During database initialization, the special transaction ID 1 is assigned, visible in system catalogs.&lt;/li&gt;
&lt;li&gt;Transaction IDs are assigned incrementally.&lt;/li&gt;
&lt;li&gt;BEGIN does not immediately allocate a transaction ID; the first non-query statement after BEGIN allocates one.&lt;/li&gt;
&lt;li&gt;When a transaction inserts a tuple, the transaction&amp;rsquo;s txid is written into the tuple&amp;rsquo;s xmin.&lt;/li&gt;
&lt;/ul&gt;
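&lt;p&gt;The lazy assignment seen in the experiment can be expressed as a toy model. Everything here is an assumption made for illustration: &lt;code&gt;SessionSketch&lt;/code&gt; and the three functions are hypothetical, and the real cluster-wide counter is shared by all backends rather than owned by one session.&lt;/p&gt;

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical session state; names are illustrative only. */
typedef struct SessionSketch
{
    TransactionId next_xid;       /* next XID to hand out (simplified to one session) */
    TransactionId current_xid;    /* InvalidTransactionId until a write needs one */
    bool          in_transaction;
} SessionSketch;

static void
begin_xact(SessionSketch *s)
{
    s->in_transaction = true;
    s->current_xid = InvalidTransactionId;   /* BEGIN alone allocates nothing */
}

static void
run_select(SessionSketch *s)
{
    assert(s->in_transaction);               /* reads leave current_xid untouched */
}

static TransactionId
run_insert(SessionSketch *s)
{
    assert(s->in_transaction);
    if (s->current_xid == InvalidTransactionId)
        s->current_xid = s->next_xid++;      /* first write takes the next XID */
    return s->current_xid;                   /* stored as the new tuple's xmin */
}
```

&lt;p&gt;Starting with &lt;code&gt;next_xid = 611&lt;/code&gt;, calling &lt;code&gt;begin_xact()&lt;/code&gt; and &lt;code&gt;run_select()&lt;/code&gt; leaves &lt;code&gt;current_xid&lt;/code&gt; unset, and the first &lt;code&gt;run_insert()&lt;/code&gt; returns 611, the same sequence as in the psql session.&lt;/p&gt;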

&lt;h3 class="relative group"&gt;Transaction ID Comparison
 &lt;div id="transaction-id-comparison" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-id-comparison" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL compares the age of transactions by their transaction IDs. &lt;code&gt;src/backend/access/transam/transam.c&lt;/code&gt; defines four comparison functions: &lt;code&gt;&amp;lt;&lt;/code&gt;, &lt;code&gt;&amp;lt;=&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;gt;=&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TransactionIdPrecedes&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TransactionIdPrecedesOrEquals&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TransactionIdFollows&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TransactionIdFollowsOrEquals&lt;/span&gt;()&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;They are similar. Let&amp;rsquo;s examine &lt;code&gt;TransactionIdPrecedes()&lt;/code&gt; as the representative:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdPrecedes&lt;/span&gt;(TransactionId id1, TransactionId id2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * If either ID is a permanent XID then we can just do unsigned
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * comparison. If both are normal, do a modulo-2^32 comparison.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;int32 diff;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(id1) &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(id2))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; (id1 &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; id2);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;diff &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (int32) (id1 &lt;span style="color:#f92672"&gt;-&lt;/span&gt; id2);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; (diff &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Key points from this source code:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;TransactionIdIsNormal()&lt;/code&gt; is a macro defined in the header to check for normal transactions. &lt;code&gt;FirstNormalTransactionId&lt;/code&gt; is the constant 3. So a normal transaction ID is &amp;gt;= 3.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdIsNormal(xid) ((xid) &amp;gt;= FirstNormalTransactionId)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;int32&lt;/code&gt; is a signed integer: the most significant bit is the sign bit (0 for non-negative, 1 for negative). Range: &lt;code&gt;-2^31 ~ 2^31 - 1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Integer overflow: when a value falls outside the storage range (for example, &lt;code&gt;2^31&lt;/code&gt; is one past the int32 maximum), the value wraps around to the other end of the range.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The transaction ID comparison code can be understood in two parts:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Non-normal transaction ID comparison:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(id1) &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(id2))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; (id1 &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; id2);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When &lt;code&gt;id1=2&lt;/code&gt;, &lt;code&gt;id2=100&lt;/code&gt;: &lt;code&gt;return(2&amp;lt;100)&lt;/code&gt;, precedes is true — the normal transaction is newer.&lt;/p&gt;
&lt;p&gt;When &lt;code&gt;id1=100&lt;/code&gt;, &lt;code&gt;id2=2&lt;/code&gt;: &lt;code&gt;return(100&amp;lt;2)&lt;/code&gt;, precedes is false — the normal transaction is newer.&lt;/p&gt;
&lt;p&gt;So, txid 1 and 2 are older than normal transactions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Normal transaction ID comparison:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;diff &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (int32) (id1 &lt;span style="color:#f92672"&gt;-&lt;/span&gt; id2);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; (diff &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
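&lt;p&gt;&lt;code&gt;id1&lt;/code&gt; and &lt;code&gt;id2&lt;/code&gt; are unsigned, so &lt;code&gt;id1 - id2&lt;/code&gt; is computed modulo &lt;code&gt;2^32&lt;/code&gt; and is never negative by itself. Casting the result to the signed &lt;code&gt;int32&lt;/code&gt; reinterprets the top bit as a sign bit, which is what allows &lt;code&gt;diff&lt;/code&gt; to be negative. Now the crucial part:&lt;/p&gt;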
&lt;p&gt;Since int32 ranges from &lt;code&gt;-2^31&lt;/code&gt; to &lt;code&gt;2^31 - 1&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;When &lt;code&gt;id1 = 2^31 + 99&lt;/code&gt;, &lt;code&gt;id2 = 100&lt;/code&gt;: &lt;code&gt;id1 - id2 = 2^31 - 1&lt;/code&gt;. Fine — int32 can hold this. → Larger txid is newer.&lt;/p&gt;
&lt;p&gt;When &lt;code&gt;id1 = 2^31 + 100&lt;/code&gt;, &lt;code&gt;id2 = 100&lt;/code&gt;: &lt;code&gt;id1 - id2 = 2^31&lt;/code&gt;. Problem — exactly exceeds int32 storage. The value becomes &lt;code&gt;2^31 - 2^32 = -2^31 &amp;lt; 0&lt;/code&gt;. → Smaller txid is considered newer.&lt;/p&gt;
&lt;p&gt;When &lt;code&gt;id1 = 100&lt;/code&gt;, &lt;code&gt;id2 = 2^31 + 100&lt;/code&gt;: &lt;code&gt;id1 - id2 = -2^31&lt;/code&gt;. Fine — int32 can hold this. → Larger txid is newer.&lt;/p&gt;
&lt;p&gt;When &lt;code&gt;id1 = 100&lt;/code&gt;, &lt;code&gt;id2 = 2^31 + 101&lt;/code&gt;: &lt;code&gt;id1 - id2 = -2^31 - 1&lt;/code&gt;. Problem — exactly exceeds int32 storage. The value becomes &lt;code&gt;-2^31 - 1 + 2^32 = 2^31 - 1 &amp;gt; 0&lt;/code&gt;. → Smaller txid is considered newer.&lt;/p&gt;
&lt;p&gt;From this analysis, once the difference overflows int32 the comparison flips: the smaller txid is treated as the newer one, so the transaction with the larger txid can no longer see it. PostgreSQL accepts this behavior and builds on it: relative to any given XID, the roughly 4 billion transaction IDs are split into two halves, one in its past (visible) and one in its future (invisible).&lt;/p&gt;
&lt;p&gt;For example, for transaction txid 100, the roughly 2.1 billion transactions in its past are visible, and the roughly 2.1 billion transactions in its future are invisible. Therefore, the maximum difference between the oldest and newest transaction IDs (the database age) in PostgreSQL is &lt;code&gt;|-2^31| = 2^31&lt;/code&gt;, roughly 2.1 billion.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b39c0f44d535.png" alt="image" /&gt;&lt;/p&gt;
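&lt;p&gt;The four boundary cases worked through above can be checked mechanically. The sketch below is a standalone copy of the comparison logic under a hypothetical name, &lt;code&gt;xid_precedes()&lt;/code&gt;, plus a hypothetical &lt;code&gt;check_boundary_cases()&lt;/code&gt; that encodes each case as an assertion. It assumes the usual two&amp;rsquo;s complement conversion from &lt;code&gt;uint32_t&lt;/code&gt; to &lt;code&gt;int32_t&lt;/code&gt;.&lt;/p&gt;

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Standalone copy of the comparison discussed above. */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    int32_t diff;

    /* Special XIDs (0, 1, 2) use plain unsigned comparison. */
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 < id2;

    /* Unsigned subtraction wraps modulo 2^32; the cast exposes the sign. */
    diff = (int32_t) (id1 - id2);
    return diff < 0;
}

/* The four boundary cases from the text, written as checks. */
static void
check_boundary_cases(void)
{
    const TransactionId half = (TransactionId) 1 << 31;   /* 2^31 */

    assert(!xid_precedes(half + 99, 100));   /* diff = 2^31 - 1: larger is newer */
    assert(xid_precedes(half + 100, 100));   /* diff wraps to -2^31: smaller looks newer */
    assert(xid_precedes(100, half + 100));   /* diff = -2^31: larger is newer */
    assert(!xid_precedes(100, half + 101));  /* diff wraps to 2^31 - 1: smaller looks newer */
}
```

&lt;p&gt;Note the second and third checks: at a distance of exactly &lt;code&gt;2^31&lt;/code&gt;, each XID appears to precede the other, which is one more reason the usable distance has to stay below that boundary.&lt;/p&gt;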

&lt;h3 class="relative group"&gt;Transaction ID Wraparound
 &lt;div id="transaction-id-wraparound" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-id-wraparound" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;What is transaction ID wraparound?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Understanding transaction ID wraparound itself is not difficult, but when I first studied it, I found two different definitions:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL official definition:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Because transaction IDs are limited in size (32 bits), a cluster that runs for a long time (more than 4 billion transactions) will suffer transaction ID wraparound: the XID counter wraps around to zero, and suddenly past transactions appear to be in the future — meaning they become invisible. In short, catastrophic data loss. (The data is still there, but you can&amp;rsquo;t access it.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;interdb explanation:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A tuple&amp;rsquo;s t_xmin records the ID of the transaction that inserted the tuple. If the tuple never changes, this t_xmin stays the same. Suppose tuple_1 was created by transaction txid=100, so its t_xmin=100. If the database advances by &lt;code&gt;2^31&lt;/code&gt; transactions, reaching &lt;code&gt;2^31+100&lt;/code&gt;, tuple_1 is still visible. Then another transaction starts, advancing txid to &lt;code&gt;2^31+101&lt;/code&gt;. Now txid=100 is in the &amp;ldquo;future,&amp;rdquo; so tuple_1 becomes invisible. This is severe data loss, and it is what interdb calls transaction ID wraparound.&lt;/p&gt;
&lt;p&gt;Yes, the official documentation and some classic articles define transaction ID wraparound differently. They are indeed describing two different things. I attribute this to a &lt;strong&gt;translation issue&lt;/strong&gt;: both behaviors are &lt;strong&gt;wraparound&lt;/strong&gt; in English semantics. If you reconsider the meaning of &amp;ldquo;wraparound,&amp;rdquo; they are both forms of it.&lt;/p&gt;
&lt;p&gt;However, they differ: one is when transaction IDs (&lt;code&gt;2^32&lt;/code&gt;) are fully exhausted and wrap back to 0; the other is when the &amp;ldquo;oldest transaction ID&amp;rdquo; and &amp;ldquo;newest transaction ID&amp;rdquo; differ by more than &lt;code&gt;2^31&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The official definition of transaction ID wraparound introduces the concept that &amp;ldquo;transaction IDs form a circle.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;The generally understood transaction ID wraparound problem is the &amp;ldquo;circle divided into two halves, one visible, one invisible&amp;rdquo; concept — when the &amp;ldquo;more than half&amp;rdquo; threshold is crossed, that&amp;rsquo;s wraparound.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In practice, the wraparound problem you actually need to worry about is the latter: the difference between the newest and oldest transaction IDs must not exceed 2.1 billion (&lt;code&gt;2^31&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How long does 2.1 billion transactions take?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;2.1 billion transactions sounds like a lot, but it can still be exhausted.&lt;/p&gt;
&lt;p&gt;For example, a PostgreSQL database with 100 TPS (not counting SELECT statements, since simple SELECTs don&amp;rsquo;t allocate transaction IDs) uses 8,640,000 transactions per day. It takes only about 2,147,483,648 / 8,640,000 ≈ 248 days to exhaust 2.1 billion transaction IDs and trigger wraparound. At 1,000 transactions per second, it takes less than one month. So transaction ID wraparound is something you must pay attention to in PostgreSQL.&lt;/p&gt;
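&lt;p&gt;The arithmetic is easy to reproduce for other workloads. A minimal sketch, where &lt;code&gt;days_until_half_xid_space()&lt;/code&gt; is a hypothetical helper and &lt;code&gt;tps&lt;/code&gt; counts only transactions that actually consume an XID:&lt;/p&gt;

```c
#include <assert.h>

/* Days until `tps` XID-consuming transactions per second use up 2^31 XIDs. */
static double
days_until_half_xid_space(double tps)
{
    const double half_space = 2147483648.0;    /* 2^31 */
    const double seconds_per_day = 86400.0;

    assert(tps > 0.0);
    return half_space / (tps * seconds_per_day);
}
```

&lt;p&gt;&lt;code&gt;days_until_half_xid_space(100.0)&lt;/code&gt; comes out at roughly 248.6 days, and &lt;code&gt;days_until_half_xid_space(1000.0)&lt;/code&gt; at roughly 24.9 days, in line with the figures above.&lt;/p&gt;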

&lt;h3 class="relative group"&gt;Transaction ID Freezing
 &lt;div id="transaction-id-freezing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-id-freezing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;To solve the serious data loss problem caused by transaction ID wraparound, PostgreSQL introduced the concept of transaction freezing.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/203cfe4768b1.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;XIDs are reused cyclically and divided into two halves: one visible, one invisible. For a tuple with xid=100, if no operations are performed and transaction IDs keep advancing, the once-visible tuple will eventually become invisible.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7512304ffdf5.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;As mentioned earlier, there is a frozen transaction ID. If the tuple with xid=100 is marked with the frozen transaction ID, it will remain visible. This is the purpose of transaction freezing.&lt;/p&gt;
&lt;p&gt;The frozen transaction ID is &lt;code&gt;FrozenTransactionId = 2&lt;/code&gt;, and it is older than all normal transactions. That means txid=2 is visible to all normal transactions (txid &amp;gt;= 3). When vacuum finds a tuple whose t_xmin is older than &lt;code&gt;current_txid - vacuum_freeze_min_age&lt;/code&gt; (default 50 million), it freezes the tuple. Before version 9.4, this meant rewriting t_xmin to the frozen transaction ID 2. In version 9.4 and later, t_xmin is left unchanged and the &lt;code&gt;HEAP_XMIN_FROZEN&lt;/code&gt; bits in t_infomask mark the tuple as frozen instead.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/352182ad7218.png" alt="image" /&gt;&lt;/p&gt;
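&lt;p&gt;The freeze condition can be sketched with the same circular comparison used for visibility. This is a simplified model under stated assumptions: &lt;code&gt;xmin_is_freezable()&lt;/code&gt; and &lt;code&gt;xid_precedes()&lt;/code&gt; are hypothetical names, &lt;code&gt;freeze_min_age&lt;/code&gt; stands in for &lt;code&gt;vacuum_freeze_min_age&lt;/code&gt;, and the real vacuum code takes more inputs into account when it computes its cutoff.&lt;/p&gt;

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Circular comparison, same logic as TransactionIdPrecedes(). */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 < id2;
    return (int32_t) (id1 - id2) < 0;
}

/*
 * Simplified freeze test: is xmin older than current_xid - freeze_min_age?
 * freeze_min_age stands in for vacuum_freeze_min_age (default 50,000,000).
 */
static bool
xmin_is_freezable(TransactionId xmin, TransactionId current_xid,
                  uint32_t freeze_min_age)
{
    TransactionId cutoff;

    assert(TransactionIdIsNormal(xmin) && TransactionIdIsNormal(current_xid));

    cutoff = current_xid - freeze_min_age;      /* wraps modulo 2^32 */
    if (!TransactionIdIsNormal(cutoff))
        cutoff = FirstNormalTransactionId;      /* never use a special XID as cutoff */

    return xid_precedes(xmin, cutoff);
}
```

&lt;p&gt;With &lt;code&gt;current_xid = 60,000,000&lt;/code&gt; and the default age of 50 million, a tuple with xmin=100 qualifies for freezing while one with xmin=20,000,000 does not. Because the subtraction and the comparison are both circular, the test keeps working after the counter has wrapped.&lt;/p&gt;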
&lt;p&gt;There are many optimization approaches to the transaction ID wraparound problem, but none can avoid transaction freezing. Freezing can involve reading every row of every table and resetting flags, which is a massive I/O and CPU operation. There&amp;rsquo;s no escaping it: in the worst case the database stops assigning new transaction IDs, and therefore rejects writes, until freezing completes. This is known as the &amp;ldquo;freeze bomb.&amp;rdquo; The busier the system and the higher the transaction rate, the more likely it is to trigger. (We&amp;rsquo;ll expand on freeze optimization in a future chapter.)&lt;/p&gt;

&lt;h3 class="relative group"&gt;64-bit Transaction IDs
 &lt;div id="64-bit-transaction-ids" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#64-bit-transaction-ids" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The &lt;strong&gt;ultimate solution&lt;/strong&gt; to transaction ID exhaustion and wraparound is using 64-bit transaction IDs. A 32-bit txid provides &lt;code&gt;2^32&lt;/code&gt; IDs; a 64-bit txid provides &lt;code&gt;2^64&lt;/code&gt;. Even at 10,000 transactions per second (864 million per day), it would take about 58.49 million years to exhaust them. In practice, 64-bit transaction IDs are inexhaustible. No wraparound, no freezing, no &amp;ldquo;freeze bomb&amp;rdquo;&amp;hellip;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why haven&amp;rsquo;t 64-bit transaction IDs been implemented yet?&lt;/strong&gt;&lt;/p&gt;
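&lt;p&gt;The arithmetic behind that estimate is easy to reproduce. The sketch below assumes a constant transaction rate and a 365-day year; the function names are made up for this example.&lt;/p&gt;

```c
#include <assert.h>

#define SECONDS_PER_DAY 86400.0
#define DAYS_PER_YEAR   365.0

/* Number of distinct IDs in an XID space of the given width (2^bits). */
static double
xid_space_size(int bits)
{
	assert(bits > 0 && bits <= 64);

	double ids = 1.0;

	for (int i = 0; i < bits; i++)
		ids *= 2.0;
	return ids;
}

/* Days until every ID has been handed out at a constant rate. */
static double
days_to_exhaust(int bits, double tx_per_second)
{
	assert(tx_per_second > 0.0);
	return xid_space_size(bits) / tx_per_second / SECONDS_PER_DAY;
}

/* Same as above, expressed in 365-day years. */
static double
years_to_exhaust(int bits, double tx_per_second)
{
	return days_to_exhaust(bits, tx_per_second) / DAYS_PER_YEAR;
}
```

&lt;p&gt;At 10,000 transactions per second, &lt;code&gt;years_to_exhaust(64, 10000)&lt;/code&gt; comes out at roughly 58.49 million years, while &lt;code&gt;days_to_exhaust(32, 10000)&lt;/code&gt; is just under five days for the full 32-bit space.&lt;/p&gt;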
&lt;p&gt;Note: 64-bit transaction IDs already exist in PostgreSQL (as &lt;code&gt;FullTransactionId&lt;/code&gt; described earlier). However, because tuple storage is limited, the xmin, xmax, etc. in tuples still use 32-bit XIDs, and transaction ID comparison still relies on 32-bit XIDs. xmin and xmax — the transaction IDs for insert and delete — are stored in each tuple&amp;rsquo;s header (we&amp;rsquo;ll cover tuple structure later), and header space is limited. A 32-bit txid is 4 bytes; a 64-bit txid is 8 bytes. Storing both xmin and xmax as 64-bit would require an extra 8 bytes, which the current header cannot accommodate. The community has discussed two approaches:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Extend the header to store 64-bit transaction IDs directly.&lt;/li&gt;
&lt;li&gt;Keep the header size unchanged. Retain 64-bit transaction IDs in memory, adding an epoch concept to convert between the two.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first approach has been essentially abandoned — compared to other systems, PostgreSQL&amp;rsquo;s tuple header is already large enough.&lt;/p&gt;
&lt;p&gt;The second approach is already partly in place: epochs exist, and a FullTransactionId can be converted to a TransactionId. The open question is the reverse direction, turning the 32-bit TransactionId stored in a tuple back into a FullTransactionId. That still seems to require storing the epoch somewhere, since it is hard to see how it could work otherwise.&lt;/p&gt;
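&lt;p&gt;To make the epoch idea concrete, here is a minimal sketch of both directions. It assumes a 64-bit value whose high 32 bits are the epoch and whose low 32 bits are the on-disk XID, and it assumes the XID being widened is not newer than the reference point. &lt;code&gt;FullXid&lt;/code&gt; and the function names are hypothetical stand-ins, not PostgreSQL&amp;rsquo;s own definitions.&lt;/p&gt;

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t FullXid;	/* hypothetical: epoch in high 32 bits, XID in low 32 */

/* 64-bit to 32-bit is trivial: split the value. */
static uint32_t
full_xid_epoch(FullXid full)
{
	return (uint32_t) (full >> 32);
}

static TransactionId
full_xid_low(FullXid full)
{
	return (TransactionId) full;	/* keeps the low 32 bits */
}

/*
 * 32-bit to 64-bit needs a reference point. Assumption: xid is not newer
 * than next_full and is less than one full epoch behind it.
 */
static FullXid
widen_xid(TransactionId xid, FullXid next_full)
{
	uint32_t epoch = full_xid_epoch(next_full);

	/* An XID numerically above the reference must be from the previous epoch. */
	if (xid > full_xid_low(next_full))
	{
		assert(epoch > 0);
		epoch--;
	}
	return ((FullXid) epoch << 32) | xid;
}
```

&lt;p&gt;The sketch shows why the tuple alone is not enough: without a trustworthy reference such as the next XID to be assigned, the same 32-bit value could belong to any epoch.&lt;/p&gt;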
&lt;p&gt;See community mailing list discussions:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/message-id/CAEYLb_UfC&amp;#43;HZ4RAP7XuoFZr&amp;#43;2_ktQmS9xqcQgE-rNf5UCqEt5A@mail.gmail.com" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/message-id/CAEYLb_UfC+HZ4RAP7XuoFZr+2_ktQmS9xqcQgE-rNf5UCqEt5A@mail.gmail.com&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/message-id/flat/DA1E65A4-7C5A-461D-B211-2AD5F9A6F2FD@gmail.com" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/message-id/flat/DA1E65A4-7C5A-461D-B211-2AD5F9A6F2FD%40gmail.com&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The community proposed 64-bit transaction IDs as a permanent solution to the freeze problem back in 2014, and began discussing practical implementation in 2017. But several PostgreSQL versions later, it has still not shipped. Given how sensitive database data is and how much of the system a transaction ID change touches (one slip could mean data loss or hard-to-find bugs), PostgreSQL is moving cautiously. The community is still considering it, though. Hopefully one day, in some PostgreSQL version, the transaction ID wraparound problem will be completely solved.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Transaction ID References
 &lt;div id="transaction-id-references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-id-references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;The Internals of PostgreSQL&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql05.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql05.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql06.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql06.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.slideshare.net/masahikosawada98/introduction-vauum-freezing-xid-wraparound?from_action=save" target="_blank" rel="noreferrer"&gt;https://www.slideshare.net/masahikosawada98/introduction-vauum-freezing-xid-wraparound?from_action=save&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/427012" target="_blank" rel="noreferrer"&gt;https://www.modb.pro/db/427012&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/377530" target="_blank" rel="noreferrer"&gt;https://www.modb.pro/db/377530&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/routine-vacuuming.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/routine-vacuuming.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/weixin_30916255/article/details/112365965" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/weixin_30916255/article/details/112365965&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/FullTransactionId" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/FullTransactionId&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.bookstack.cn/read/aliyun-rds-core/bd7e1c1955b35f7d.md" target="_blank" rel="noreferrer"&gt;https://www.bookstack.cn/read/aliyun-rds-core/bd7e1c1955b35f7d.md&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/digoal/blog/blob/master/201605/20160520_01.md" target="_blank" rel="noreferrer"&gt;https://github.com/digoal/blog/blob/master/201605/20160520_01.md&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Transaction-Related Tuple Structure
 &lt;div id="transaction-related-tuple-structure" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-related-tuple-structure" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The tuple structure contains much of the information essential to PostgreSQL&amp;rsquo;s MVCC. The following sections cover xmin, xmax, t_ctid, cmin, cmax, combo CID, and tuple ID — their meanings and relationships.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Physical Structure
 &lt;div id="physical-structure" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#physical-structure" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2d7dd2db28e1.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;HeapTupleHeaderData&lt;/code&gt; is the tuple header. Its structure is defined in &lt;code&gt;src/include/access/htup_details.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; HeapTupleFields
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId t_xmin;		&lt;span style="color:#75715e"&gt;/* transaction ID of inserter */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId t_xmax;		&lt;span style="color:#75715e"&gt;/* transaction ID of deleter or locker */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;union&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		CommandId	t_cid;		&lt;span style="color:#75715e"&gt;/* command ID of insert or delete */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		TransactionId t_xvac;	&lt;span style="color:#75715e"&gt;/* VACUUM FULL transaction ID */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}			t_field3;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} HeapTupleFields;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; DatumTupleFields
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} DatumTupleFields;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; HeapTupleHeaderData
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;union&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		HeapTupleFields t_heap;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		DatumTupleFields t_datum;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}			t_choice;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ItemPointerData t_ctid;		&lt;span style="color:#75715e"&gt;/* TID of current tuple or updated tuple */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;};&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Five definitions in &lt;code&gt;HeapTupleHeaderData&lt;/code&gt; are critically important to MVCC. (Here, &amp;ldquo;x&amp;rdquo; = transaction, &amp;ldquo;c&amp;rdquo; = command, &amp;ldquo;t&amp;rdquo; = tuple — helpful for categorization.)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;t_xmin&lt;/code&gt;: the transaction ID that inserted this tuple.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;t_xmax&lt;/code&gt;: the ID of the transaction that deleted, updated, or locked this tuple. If the tuple has never been deleted, updated, or locked, xmax is 0. If the deleting or updating transaction rolled back, xmax still holds that transaction&amp;rsquo;s ID.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;t_xvac&lt;/code&gt;: the transaction ID of a VACUUM FULL that moved the tuple (the old-style implementation used before 9.0). It shares a union with &lt;code&gt;t_cid&lt;/code&gt;, so a tuple carries one or the other, never both.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;t_cid&lt;/code&gt;: the command ID (cid). A transaction can contain multiple SQL statements. Commands within a transaction are numbered starting from 0, incrementing sequentially. CommandId is a uint32 type, supporting up to &lt;code&gt;2^32 - 1&lt;/code&gt; commands. To conserve resources, and because queries don&amp;rsquo;t affect row transaction ordering, queries do not increment cid (similar to how transaction IDs are allocated).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;t_ctid&lt;/code&gt;: stores a pointer to itself or to a newer tuple. TID identifies a tuple within a table — it is the tuple&amp;rsquo;s physical address. If a record is modified multiple times, multiple versions exist. These versions are linked via t_ctid, forming a version chain that can be followed to find the latest version.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;System Columns
 &lt;div id="system-columns" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#system-columns" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Every tuple has 6 system columns (directly queryable): &lt;code&gt;tableoid&lt;/code&gt;, &lt;code&gt;xmin&lt;/code&gt;, &lt;code&gt;xmax&lt;/code&gt;, &lt;code&gt;cmin&lt;/code&gt;, &lt;code&gt;cmax&lt;/code&gt;, &lt;code&gt;ctid&lt;/code&gt;. &lt;code&gt;tableoid&lt;/code&gt; is the table&amp;rsquo;s OID and doesn&amp;rsquo;t change during queries or DML. Here we focus on the remaining 5:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; xmin,xmax,cmin,cmax,ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+------+------+------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;616&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;619&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;cmin&lt;/code&gt;: the command ID that inserted the tuple.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cmax&lt;/code&gt;: the command ID that deleted the tuple.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;xmin&lt;/code&gt;, &lt;code&gt;xmax&lt;/code&gt;, and &lt;code&gt;xvac&lt;/code&gt; are physically stored in &lt;code&gt;struct HeapTupleFields&lt;/code&gt;. But &lt;code&gt;cmin&lt;/code&gt; and &lt;code&gt;cmax&lt;/code&gt; are not separate fields — they are derived from &lt;code&gt;t_cid&lt;/code&gt; in the struct.&lt;/p&gt;
&lt;p&gt;The source for &lt;code&gt;cmin&lt;/code&gt; and &lt;code&gt;cmax&lt;/code&gt; is in &lt;code&gt;src/include/access/htup_details.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* SetCmin is reasonably simple since we never need a combo CID */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HeapTupleHeaderSetCmin(tup, cid) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;do { \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	Assert(!((tup)-&amp;gt;t_infomask &amp;amp; HEAP_MOVED)); \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	(tup)-&amp;gt;t_choice.t_heap.t_field3.t_cid = (cid); \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	(tup)-&amp;gt;t_infomask &amp;amp;= ~HEAP_COMBOCID; \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;} while (0)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* SetCmax must be used after HeapTupleHeaderAdjustCmax; see combocid.c */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HeapTupleHeaderSetCmax(tup, cid, iscombo) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;do { \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	Assert(!((tup)-&amp;gt;t_infomask &amp;amp; HEAP_MOVED)); \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	(tup)-&amp;gt;t_choice.t_heap.t_field3.t_cid = (cid); \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	if (iscombo) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		(tup)-&amp;gt;t_infomask |= HEAP_COMBOCID; \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	else \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		(tup)-&amp;gt;t_infomask &amp;amp;= ~HEAP_COMBOCID; \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;} while (0)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * HeapTupleHeaderGetRawCommandId will give you what&amp;#39;s in the header whether
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * it is useful or not. Most code should use HeapTupleHeaderGetCmin or
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * HeapTupleHeaderGetCmax instead, but note that those Assert that you can
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * get a legitimate result, ie you are in the originating transaction!
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HeapTupleHeaderGetRawCommandId(tup) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;( \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	(tup)-&amp;gt;t_choice.t_heap.t_field3.t_cid \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Combo CID
 &lt;div id="combo-cid" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#combo-cid" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Before 8.3, &lt;code&gt;cmin&lt;/code&gt; and &lt;code&gt;cmax&lt;/code&gt; were separate. Later, considering that it&amp;rsquo;s rare for a single transaction to both insert and delete the same row, and that &lt;code&gt;cmin&lt;/code&gt;/&lt;code&gt;cmax&lt;/code&gt; are not needed after the transaction ends, the two were merged into a &amp;ldquo;combo command ID,&amp;rdquo; or &lt;code&gt;combocid&lt;/code&gt;, to save header space.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;combocid&lt;/code&gt; source: &lt;code&gt;src/backend/utils/time/combocid.c&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Key and entry structures for the hash table */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	CommandId	cmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	CommandId	cmax;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} ComboCidKeyData;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* a combocid maps to a (cmin, cmax) pair */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; CommandId
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GetComboCommandId(CommandId cmin, CommandId cmax)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * The hash table is only created the first time a combo cid is used
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (comboHash &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	HASHCTL		hash_ctl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* generate array and hash table */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	comboCids &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (ComboCidKeyData &lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		MemoryContextAlloc(TopTransactionContext,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(ComboCidKeyData) &lt;span style="color:#f92672"&gt;*&lt;/span&gt; CCID_ARRAY_SIZE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sizeComboCids &lt;span style="color:#f92672"&gt;=&lt;/span&gt; CCID_ARRAY_SIZE;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	usedComboCids &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	memset(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;hash_ctl, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(hash_ctl));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	comboHash &lt;span style="color:#f92672"&gt;=&lt;/span&gt; hash_create(&lt;span style="color:#e6db74"&gt;&amp;#34;Combo CIDs&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							CCID_HASH_SIZE,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;hash_ctl,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							HASH_ELEM &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HASH_BLOBS &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HASH_CONTEXT);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
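&lt;p&gt;Combo CIDs live in backend-local memory: an array of (cmin, cmax) pairs plus a hash table for looking up an existing pair. Both are allocated in &lt;code&gt;TopTransactionContext&lt;/code&gt; the first time a transaction needs a &lt;code&gt;combocid&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So the relationship among these command IDs is: &lt;strong&gt;t_cid holds a plain cmin or cmax, or, when &lt;code&gt;HEAP_COMBOCID&lt;/code&gt; is set, a combocid that maps to a (cmin, cmax) pair&lt;/strong&gt;.&lt;/p&gt;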
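&lt;p&gt;The mapping can be sketched as a small table of pairs indexed by combo CID. This is an illustration under stated assumptions, not the code in &lt;code&gt;combocid.c&lt;/code&gt;: it uses a fixed-size array with linear search instead of a growable array and hash table, and &lt;code&gt;MAX_COMBO_CIDS&lt;/code&gt;, &lt;code&gt;tuple_cmin&lt;/code&gt;, and &lt;code&gt;tuple_cmax&lt;/code&gt; are hypothetical names. Only the &lt;code&gt;HEAP_COMBOCID&lt;/code&gt; bit value is taken from &lt;code&gt;htup_details.h&lt;/code&gt;.&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t CommandId;

#define HEAP_COMBOCID	0x0020	/* t_infomask bit: t_cid is a combo CID */
#define MAX_COMBO_CIDS	100		/* hypothetical fixed capacity for this sketch */

typedef struct
{
	CommandId	cmin;
	CommandId	cmax;
} ComboCidKeyData;

static ComboCidKeyData combo_cids[MAX_COMBO_CIDS];
static size_t used_combo_cids = 0;

/* Return the combo CID for (cmin, cmax), creating an entry if needed. */
static CommandId
get_combo_command_id(CommandId cmin, CommandId cmax)
{
	for (size_t i = 0; i < used_combo_cids; i++)
		if (combo_cids[i].cmin == cmin && combo_cids[i].cmax == cmax)
			return (CommandId) i;

	assert(used_combo_cids < MAX_COMBO_CIDS);
	combo_cids[used_combo_cids].cmin = cmin;
	combo_cids[used_combo_cids].cmax = cmax;
	return (CommandId) used_combo_cids++;
}

/* Resolve cmin from the single stored t_cid field. */
static CommandId
tuple_cmin(CommandId t_cid, uint16_t infomask)
{
	if (infomask & HEAP_COMBOCID)
	{
		assert(t_cid < used_combo_cids);
		return combo_cids[t_cid].cmin;
	}
	return t_cid;
}

/* Resolve cmax from the single stored t_cid field. */
static CommandId
tuple_cmax(CommandId t_cid, uint16_t infomask)
{
	if (infomask & HEAP_COMBOCID)
	{
		assert(t_cid < used_combo_cids);
		return combo_cids[t_cid].cmax;
	}
	return t_cid;
}
```

&lt;p&gt;A tuple inserted by command 1 and deleted by command 4 of the same transaction would store the value returned by &lt;code&gt;get_combo_command_id(1, 4)&lt;/code&gt; in &lt;code&gt;t_cid&lt;/code&gt; with &lt;code&gt;HEAP_COMBOCID&lt;/code&gt; set, and both original command IDs can be recovered from it while the transaction is running.&lt;/p&gt;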

&lt;h3 class="relative group"&gt;Simple Relationships Among Transaction IDs and System Columns
 &lt;div id="simple-relationships-among-transaction-ids-and-system-columns" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#simple-relationships-among-transaction-ids-and-system-columns" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;With all these IDs and source code, things might seem confusing. Here&amp;rsquo;s a diagram to help understand and remember the relationships among transaction IDs, command IDs, and tuple IDs:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/077888610817.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;A First Taste of Transactions
 &lt;div id="a-first-taste-of-transactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#a-first-taste-of-transactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Without any tools or extensions, let&amp;rsquo;s get a feel for how these system columns change during a transaction:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; xmin,xmax,cmin,cmax,ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+------+------+------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;622&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- after the update a new tuple version appears: xmin advances to 623 and ctid moves to (0,2)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; xmin,xmax,cmin,cmax,ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+------+------+------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;623&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- xmax records the rollback transaction ID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- xmin and ctid return to old values; the old tuple barely changes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; xmin,xmax,cmin,cmax,ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+------+------+------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;622&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;623&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- update again; tuple number jumps over 2 directly to 3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; xmin,xmax,cmin,cmax,ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+------+------+------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;624&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Tuple Header and Transactions
 &lt;div id="tuple-header-and-transactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#tuple-header-and-transactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;The pageinspect Extension
 &lt;div id="the-pageinspect-extension" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-pageinspect-extension" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Querying the table only returns the tuple version visible to your snapshot; old versions stay hidden. To see them you need the pageinspect extension, a contrib module bundled with PostgreSQL that displays the detailed contents of data pages. To observe how tuples support transactions, we&amp;rsquo;ll use &lt;code&gt;get_raw_page()&lt;/code&gt; and &lt;code&gt;heap_page_items()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;get_raw_page()&lt;/code&gt;: returns the binary content of a specified block. The &lt;code&gt;fork&lt;/code&gt; parameter accepts &lt;code&gt;main&lt;/code&gt;, &lt;code&gt;fsm&lt;/code&gt;, &lt;code&gt;vm&lt;/code&gt;, or &lt;code&gt;init&lt;/code&gt;. &lt;code&gt;main&lt;/code&gt; is the main data file; &lt;code&gt;fsm&lt;/code&gt; is the free space map; &lt;code&gt;vm&lt;/code&gt; is the visibility map; &lt;code&gt;init&lt;/code&gt; is the initialization fork. Defaults to &lt;code&gt;main&lt;/code&gt; if not specified.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;heap_page_items()&lt;/code&gt;: displays all line pointers on a heap page, including rows invisible under MVCC.&lt;/p&gt;
&lt;p&gt;Generally, &lt;code&gt;get_raw_page()&lt;/code&gt; is passed as a parameter to &lt;code&gt;heap_page_items()&lt;/code&gt; to display tuple headers, pointers, and the data itself.&lt;/p&gt;
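&lt;p&gt;&lt;code&gt;heap_tuple_infomask_flags()&lt;/code&gt;: decodes the integer &lt;code&gt;t_infomask&lt;/code&gt; and &lt;code&gt;t_infomask2&lt;/code&gt; values into flag names, returning two columns: &lt;code&gt;raw_flags&lt;/code&gt; (one name per set bit) and &lt;code&gt;combined_flags&lt;/code&gt; (macros that combine several bits). (Infomask is covered later.)&lt;/p&gt;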
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; extension pageinspect;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; EXTENSION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid,t_ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+--------+-------+--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;633&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;lp (Line Pointer)
 &lt;div id="lp-line-pointer" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lp-line-pointer" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A line pointer (lp) is a &lt;strong&gt;numbered&lt;/strong&gt; slot in the page that records where a tuple is stored within that page. t_ctid looks like a tuple ID, but it is simply the pair (block number, line pointer number). A tuple&amp;rsquo;s t_ctid points either to itself or, once the row has been updated, to the line pointer of the newer version.&lt;/p&gt;
&lt;p&gt;For example, after one UPDATE, a new tuple is added. The new tuple&amp;rsquo;s lp number increments by 1, the old tuple&amp;rsquo;s ctid points to the new tuple&amp;rsquo;s lp, and the new tuple&amp;rsquo;s ctid points to itself:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lp,t_ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lp,t_ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;lp source: &lt;code&gt;src/include/storage/itemid.h&lt;/code&gt;. The &lt;code&gt;ItemIdData&lt;/code&gt; struct stores the tuple&amp;rsquo;s offset, state, and length:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; ItemIdData
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;unsigned&lt;/span&gt;	lp_off:&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;,		&lt;span style="color:#75715e"&gt;/* tuple offset within the page */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				lp_flags:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,		&lt;span style="color:#75715e"&gt;/* lp state */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				lp_len:&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;;		&lt;span style="color:#75715e"&gt;/* tuple length */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} ItemIdData;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; ItemIdData &lt;span style="color:#f92672"&gt;*&lt;/span&gt;ItemId;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* lp_off:15 is a bit-field; lp_off occupies 15 bits of the unsigned. The 3 fields together total 32 bits. So ItemIdData is an int, 4 bytes, 32 bits. */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;lp_flags&lt;/code&gt; defines 4 states:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *lp_flags has these possible states. An UNUSED line pointer is available
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *for immediate re-use, the other states are not.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define LP_UNUSED		0		&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* lp not in use, tuple length lp_len always 0 */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define LP_NORMAL		1		&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* lp in use, tuple length lp_len always &amp;gt; 0 */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define LP_REDIRECT		2		&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* HOT redirect to another lp (should have lp_len=0) */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define LP_DEAD			3		&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* dead lp, vacuumable */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lp,lp_flags,lp_off,lp_len &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_off &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_len 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+----------+--------+--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8160&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Infomask
 &lt;div id="infomask" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#infomask" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The infomask fields are bit flags that describe a tuple&amp;rsquo;s transaction status, locks, and physical state: whether the inserting or deleting transaction committed or aborted, which row lock is held, whether the tuple belongs to a HOT chain, and so on. The tuple header has two such fields, &lt;code&gt;infomask&lt;/code&gt; and &lt;code&gt;infomask2&lt;/code&gt;, each carrying its own set of flags.&lt;/p&gt;

&lt;h4 class="relative group"&gt;infomask and infomask2
 &lt;div id="infomask-and-infomask2" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#infomask-and-infomask2" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Both fields are declared in the tuple header struct in &lt;code&gt;src/include/access/htup_details.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK2 2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	uint16		t_infomask2;	&lt;span style="color:#75715e"&gt;/* number of attributes + various flags */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK 3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	uint16		t_infomask;		&lt;span style="color:#75715e"&gt;/* various flag bits, see below */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;infomask Flag Meanings
 &lt;div id="infomask-flag-meanings" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#infomask-flag-meanings" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * information stored in t_infomask:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_HASNULL			0x0001	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple has null values */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_HASVARWIDTH		0x0002	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple has variable-width attributes, e.g. varchar */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_HASEXTERNAL		0x0004	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple has TOAST storage */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_HASOID_OLD			0x0008	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple has OID */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_KEYSHR_LOCK	0x0010	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple has FOR KEY SHARE lock */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_COMBOCID			0x0020	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* t_cid is a combo CID */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_EXCL_LOCK		0x0040	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple has FOR UPDATE lock */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_LOCK_ONLY		0x0080	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* xmax is only a locker */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;/* xmax is a shared locker */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_SHR_LOCK	(HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_LOCK_MASK	(HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;						 HEAP_XMAX_KEYSHR_LOCK)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMIN_COMMITTED		0x0100	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* inserting transaction committed */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMIN_INVALID		0x0200	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* inserting transaction invalid or aborted */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMIN_FROZEN		(HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_COMMITTED		0x0400	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* deleting transaction committed */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_INVALID		0x0800	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* deleting transaction invalid or aborted */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_IS_MULTI		0x1000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* t_xmax is a MultiXactId */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_UPDATED			0x2000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* this is an updated version of a row */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_MOVED_OFF			0x4000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* moved elsewhere by pre-9.0 VACUUM FULL; kept for binary upgrade compatibility */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_MOVED_IN			0x8000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* moved from elsewhere, opposite of HEAP_MOVED_OFF; kept for compatibility */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_MOVED (HEAP_MOVED_OFF | HEAP_MOVED_IN)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XACT_MASK			0xFFF0	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* visibility-related bits */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;infomask2 Flag Meanings
 &lt;div id="infomask2-flag-meanings" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#infomask2-flag-meanings" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_NATTS_MASK			0x07FF	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* 11 bits for the number of columns (MaxHeapAttributeNumber is 1600) */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* bits 0x1800 are available */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_KEYS_UPDATED		0x2000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple updated or deleted */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_HOT_UPDATED		0x4000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple updated, new tuple is HOT */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_ONLY_TUPLE			0x8000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* HOT tuple */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP2_XACT_MASK			0xE000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* visibility-related bits */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_TUPLE_HAS_MATCH	HEAP_ONLY_TUPLE 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* flag temporarily used in Hash Join, only for Hash table tuples that don&amp;#39;t need visibility info; we can reuse a visibility flag instead of a separate bit */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;infomask Bit Calculation
 &lt;div id="infomask-bit-calculation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#infomask-bit-calculation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Converting hex to binary makes it easier to understand the &lt;strong&gt;bit&lt;/strong&gt; meanings:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- convert hex 1600 to binary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; x&lt;span style="color:#e6db74"&gt;&amp;#39;1600&amp;#39;&lt;/span&gt;::bit(&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bit 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0001011000000000&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;infomask:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000000001&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0001&lt;/span&gt; HEAP_HASNULL			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000000010&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0002&lt;/span&gt; HEAP_HASVARWIDTH		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000000100&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0004&lt;/span&gt; HEAP_HASEXTERNAL		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000001000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0008&lt;/span&gt; HEAP_HASOID_OLD			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000010000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0010&lt;/span&gt; HEAP_XMAX_KEYSHR_LOCK	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000100000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0020&lt;/span&gt; HEAP_COMBOCID
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000001000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0040&lt;/span&gt; HEAP_XMAX_EXCL_LOCK
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000010000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0080&lt;/span&gt; HEAP_XMAX_LOCK_ONLY		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000001010000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0050&lt;/span&gt; HEAP_XMAX_SHR_LOCK bitwise OR: (HEAP_XMAX_EXCL_LOCK &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HEAP_XMAX_KEYSHR_LOCK)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000001010000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0050&lt;/span&gt; HEAP_LOCK_MASK bitwise OR: (HEAP_XMAX_SHR_LOCK &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HEAP_XMAX_EXCL_LOCK &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HEAP_XMAX_KEYSHR_LOCK)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000100000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0100&lt;/span&gt; HEAP_XMIN_COMMITTED		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000001000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0200&lt;/span&gt; HEAP_XMIN_INVALID		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000001100000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0300&lt;/span&gt; HEAP_XMIN_FROZEN bitwise OR: (HEAP_XMIN_COMMITTED&lt;span style="color:#f92672"&gt;|&lt;/span&gt;HEAP_XMIN_INVALID)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;300&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000010000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0400&lt;/span&gt; HEAP_XMAX_COMMITTED		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000100000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0800&lt;/span&gt; HEAP_XMAX_INVALID		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0001000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x1000&lt;/span&gt; HEAP_XMAX_IS_MULTI		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0010000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x2000&lt;/span&gt; HEAP_UPDATED			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0100000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x4000&lt;/span&gt; HEAP_MOVED_OFF			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1000000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x8000&lt;/span&gt; HEAP_MOVED_IN			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1100000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0xC000&lt;/span&gt; HEAP_MOVED bitwise OR: (HEAP_MOVED_OFF &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HEAP_MOVED_IN)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4000&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8000&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;C000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1111111111110000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0xFFF0&lt;/span&gt; HEAP_XACT_MASK&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;infomask2:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000011111111111&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x07FF&lt;/span&gt; HEAP_NATTS_MASK PostgreSQL max columns is &lt;span style="color:#ae81ff"&gt;1600&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000011001000000&lt;/span&gt;, so &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; bits suffice &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; column count
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0001100000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x1800&lt;/span&gt; available bits, apparently unused
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0010000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x2000&lt;/span&gt; HEAP_KEYS_UPDATED 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0100000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x4000&lt;/span&gt; HEAP_HOT_UPDATED 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1000000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x8000&lt;/span&gt; HEAP_ONLY_TUPLE 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1110000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0xE000&lt;/span&gt; HEAP2_XACT_MASK&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;How to Compute Infomask?
 &lt;div id="how-to-compute-infomask" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-to-compute-infomask" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The flag constants are defined in hexadecimal, but pageinspect returns &lt;code&gt;t_infomask&lt;/code&gt; and &lt;code&gt;t_infomask2&lt;/code&gt; as decimal integers. Use &lt;code&gt;to_hex()&lt;/code&gt; to convert them to hexadecimal:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lp,t_ctid,to_hex(t_infomask) infomask,to_hex(t_infomask2) infomask2 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; infomask &lt;span style="color:#f92672"&gt;|&lt;/span&gt; infomask2 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+--------+----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;b00 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;infomask=2b00&lt;/code&gt; is still opaque. Convert it to binary and match each set bit against the flag table: &lt;code&gt;0010101100000000 = HEAP_UPDATED + HEAP_XMAX_INVALID + HEAP_XMIN_FROZEN&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Meaning: the tuple is a row version produced by an UPDATE, xmax is invalid (0), and xmin is frozen (visible to all transactions).&lt;/p&gt;
&lt;p&gt;&lt;code&gt;infomask2=1&lt;/code&gt;: the low 11 bits (mask &lt;code&gt;0x07FF&lt;/code&gt;, values up to 2047, enough for the 1600-column limit) hold the number of attributes in the tuple. A value of 1 means the tuple has a single column.&lt;/p&gt;
&lt;p&gt;Manually computing infomask is tedious. Starting from pg13, pageinspect provides the &lt;code&gt;heap_tuple_infomask_flags&lt;/code&gt; function to decode infomask and infomask2. Individual bits are shown as &lt;code&gt;raw_flags&lt;/code&gt;; combined multi-bit flags are shown as &lt;code&gt;combined_flags&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; t_ctid, raw_flags, combined_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; t_infomask &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; t_infomask2 &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+------------------------------------------------------------------------+--------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_FROZEN&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
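&lt;p&gt;The manual decoding can also be sketched outside the database. The Python below is only an illustration: the bit values are copied from the tables above, while the names &lt;code&gt;decode_infomask&lt;/code&gt;, &lt;code&gt;has_all&lt;/code&gt; and &lt;code&gt;natts&lt;/code&gt; are made up for this example and are not part of pageinspect.&lt;/p&gt;

```python
# Single-bit infomask flags, values copied from the table above.
INFOMASK_BITS = {
    "HEAP_HASNULL": 0x0001,
    "HEAP_HASVARWIDTH": 0x0002,
    "HEAP_HASEXTERNAL": 0x0004,
    "HEAP_HASOID_OLD": 0x0008,
    "HEAP_XMAX_KEYSHR_LOCK": 0x0010,
    "HEAP_COMBOCID": 0x0020,
    "HEAP_XMAX_EXCL_LOCK": 0x0040,
    "HEAP_XMAX_LOCK_ONLY": 0x0080,
    "HEAP_XMIN_COMMITTED": 0x0100,
    "HEAP_XMIN_INVALID": 0x0200,
    "HEAP_XMAX_COMMITTED": 0x0400,
    "HEAP_XMAX_INVALID": 0x0800,
    "HEAP_XMAX_IS_MULTI": 0x1000,
    "HEAP_UPDATED": 0x2000,
    "HEAP_MOVED_OFF": 0x4000,
    "HEAP_MOVED_IN": 0x8000,
}

# Multi-bit combinations from the same table.
COMBINED = {
    "HEAP_XMAX_SHR_LOCK": 0x0050,
    "HEAP_XMIN_FROZEN": 0x0300,
    "HEAP_MOVED": 0xC000,
}

HEAP_NATTS_MASK = 0x07FF  # low 11 bits of infomask2


def has_all(value, mask):
    """True when every bit of mask is already set in value."""
    return (value | mask) == value


def decode_infomask(infomask):
    """Return (raw single-bit flag names, combined flag names)."""
    raw = [name for name, bit in INFOMASK_BITS.items() if has_all(infomask, bit)]
    combined = [name for name, mask in COMBINED.items() if has_all(infomask, mask)]
    return raw, combined


def natts(infomask2):
    # The remainder modulo 2**11 keeps exactly the low 11 bits.
    return infomask2 % (HEAP_NATTS_MASK + 1)


raw, combined = decode_infomask(0x2B00)
print(format(0x2B00, "016b"), raw, combined, natts(0x0001))
```

&lt;p&gt;For &lt;code&gt;0x2B00&lt;/code&gt; this lists the same four raw flags and the same combined flag as the &lt;code&gt;heap_tuple_infomask_flags&lt;/code&gt; output above.&lt;/p&gt;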

&lt;h3 class="relative group"&gt;Commit Log (CLOG)
 &lt;div id="commit-log-clog" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#commit-log-clog" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL uses the commit log (CLOG) to store transaction status. Changes are written to WAL before a transaction completes, which is the "write-ahead" in write-ahead log. If a transaction aborts, its status is recorded in both WAL and CLOG, so that during instance recovery PostgreSQL knows the transaction did not commit.&lt;/p&gt;
&lt;p&gt;When transaction status is needed — for example, when determining visibility — PostgreSQL reads the CLOG.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Transaction status&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/include/access/clog.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_IN_PROGRESS		0x00
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_COMMITTED		0x01
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_ABORTED			0x02
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_SUB_COMMITTED	 0x03&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The CLOG defines four transaction states: &lt;code&gt;IN_PROGRESS&lt;/code&gt;, &lt;code&gt;COMMITTED&lt;/code&gt;, &lt;code&gt;ABORTED&lt;/code&gt;, &lt;code&gt;SUB_COMMITTED&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Transaction status size&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/backend/access/transam/clog.c&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* We need two bits per xact, so four xacts fit in a byte */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_BITS_PER_XACT	2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_XACTS_PER_BYTE 4
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_XACT_BITMASK	((1 &amp;lt;&amp;lt; CLOG_BITS_PER_XACT) - 1)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Transaction status is very small — only 2 bits per transaction. One byte can store 4 transaction states. A standard page can hold &lt;code&gt;8K * 4 = 32,768&lt;/code&gt; transaction states.&lt;/p&gt;
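&lt;p&gt;To make the arithmetic concrete, here is a small Python sketch of how a transaction ID maps to a page, a byte within that page, and a 2-bit slot within that byte. It assumes the default 8KB block size; the function names &lt;code&gt;locate&lt;/code&gt; and &lt;code&gt;status_in_byte&lt;/code&gt; are invented for this illustration and only mirror the macros shown above.&lt;/p&gt;

```python
BLCKSZ = 8192  # assumption: default PostgreSQL block size
CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 4
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE

# Status codes from clog.h, shown earlier in this section.
STATUS_NAMES = {0: "IN_PROGRESS", 1: "COMMITTED", 2: "ABORTED", 3: "SUB_COMMITTED"}


def locate(xid):
    """Return (page number, byte offset in page, slot index in byte)."""
    page = xid // CLOG_XACTS_PER_PAGE
    index_in_page = xid % CLOG_XACTS_PER_PAGE
    byte_offset = index_in_page // CLOG_XACTS_PER_BYTE
    slot = xid % CLOG_XACTS_PER_BYTE
    return page, byte_offset, slot


def status_in_byte(byte_value, slot):
    """Extract the 2-bit status stored in one slot of a CLOG byte."""
    states = 2 ** CLOG_BITS_PER_XACT  # 4 possible values per slot
    # Dividing by 4**slot drops the lower slots; the remainder keeps one slot.
    return STATUS_NAMES[(byte_value // states ** slot) % states]


print(CLOG_XACTS_PER_PAGE, locate(32768), status_in_byte(0b00000110, 1))
```

&lt;p&gt;The first transaction ID that no longer fits on page 0 is 32768, which lands in slot 0 of byte 0 on page 1.&lt;/p&gt;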
&lt;p&gt;&lt;strong&gt;CLOG persistence&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;At shutdown and at every checkpoint, CLOG data is written to the &lt;code&gt;pg_xact&lt;/code&gt; directory, which was named &lt;code&gt;pg_clog&lt;/code&gt; before version 10.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl pg_xact&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; pg pg &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; Mar &lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; 23:33 &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On disk, CLOG files are named 0000, 0001, and so on. Each file holds up to 256KB and is written in 8KB pages, the same size as the in-memory pages that store transaction states, so a file size is always a multiple of 8192 (the listing above shows a file containing a single page). After 32 pages fill the 0000 file, the next page goes into the 0001 file. PostgreSQL reads transaction states from &lt;code&gt;pg_xact&lt;/code&gt; into memory at startup.&lt;/p&gt;
&lt;p&gt;During system operation, not all transaction states need to be retained in CLOG files forever, so VACUUM periodically deletes no-longer-needed CLOG files.&lt;/p&gt;
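&lt;p&gt;The file layout can be sketched the same way. The Python below assumes 8KB pages, 32 pages per file as described above, and four-digit hexadecimal file names (the text only shows 0000 and 0001, so the hexadecimal format is an assumption here); &lt;code&gt;segment_file&lt;/code&gt; is a hypothetical helper, not a PostgreSQL function.&lt;/p&gt;

```python
CLOG_XACTS_PER_PAGE = 8192 * 4  # 32,768 statuses per 8KB page
PAGES_PER_SEGMENT = 32          # 32 pages of 8KB make one 256KB file
XACTS_PER_SEGMENT = CLOG_XACTS_PER_PAGE * PAGES_PER_SEGMENT


def segment_file(xid):
    """Name of the pg_xact file expected to hold the status of xid."""
    page = xid // CLOG_XACTS_PER_PAGE
    segment = page // PAGES_PER_SEGMENT
    # Assumption: file names are the segment number as 4 hex digits.
    return format(segment, "04X")


print(XACTS_PER_SEGMENT, segment_file(0), segment_file(XACTS_PER_SEGMENT))
```

&lt;p&gt;Under these assumptions one file covers 1,048,576 transaction IDs, which is why a lightly used instance shows only the 0000 file for a long time.&lt;/p&gt;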

&lt;h3 class="relative group"&gt;Hint Bits
 &lt;div id="hint-bits" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hint-bits" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;What Are Hint Bits?
 &lt;div id="what-are-hint-bits" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-are-hint-bits" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Hint bits mark whether the transaction that created or deleted a row has committed or aborted. Without hint bits, determining transaction visibility requires looking up &lt;code&gt;pg_xact&lt;/code&gt; (&lt;code&gt;pg_clog&lt;/code&gt; before version 10) or &lt;code&gt;pg_subtrans&lt;/code&gt;, which is relatively expensive. If a tuple has its hint bits set, its state can be determined just by reading the page, with no extra lookup.&lt;/p&gt;
&lt;p&gt;The source code uses &lt;code&gt;SetHintBits()&lt;/code&gt; to set hint bits:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMIN_COMMITTED,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			InvalidTransactionId);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;SetHintBits&lt;/code&gt; touches only four bits in infomask, two describing xmin and two describing xmax (the two xmin bits together form &lt;code&gt;HEAP_XMIN_FROZEN&lt;/code&gt;). Hint bits exist purely to cache transaction state:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMIN_COMMITTED	0x0100	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* inserting or updating transaction committed */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMIN_INVALID		0x0200	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* inserting or updating transaction invalid or aborted */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_COMMITTED		0x0400	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* deleting or updating transaction committed */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_INVALID		0x0800	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* deleting or updating transaction invalid or aborted */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;Queries Can Cause Writes
 &lt;div id="queries-can-cause-writes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#queries-can-cause-writes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;When a DML statement creates or deletes a tuple, it records its transaction ID (for example t_xmin) in the tuple header. When the transaction ends, the header is left untouched. Instead, the next DML statement, SELECT, or VACUUM that scans the tuple triggers &lt;code&gt;SetHintBits&lt;/code&gt; (this happens in &lt;code&gt;HeapTupleSatisfiesMVCC()&lt;/code&gt; when a new snapshot accesses the data; visibility rules are covered later).&lt;/p&gt;
&lt;p&gt;Before &lt;code&gt;SetHintBits&lt;/code&gt; is triggered, PostgreSQL looks up transaction status in the CLOG. After &lt;code&gt;SetHintBits&lt;/code&gt; is triggered, it reads the hint bits in the data page&amp;rsquo;s tuple header.&lt;/p&gt;
&lt;p&gt;For example, an INSERT statement:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; t_ctid, raw_flags, combined_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;-#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;-#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;-#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; t_infomask &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; t_infomask2 &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1; &lt;span style="color:#75715e"&gt;-- just a single query
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;a 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; t_ctid, raw_flags, combined_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; t_infomask &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; t_infomask2 &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+-----------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After one query, t_infomask changed — the tuple header changed.&lt;/p&gt;
&lt;p&gt;Right after the INSERT, infomask contained only &lt;code&gt;HEAP_XMAX_INVALID&lt;/code&gt;, because an INSERT fills in only xmin. Whether the transaction later commits or rolls back, xmax is unused, so &lt;code&gt;HEAP_XMAX_INVALID&lt;/code&gt; can be set at insert time.&lt;/p&gt;
&lt;p&gt;The outcome of the inserting transaction, however, is not known at insert time, and since completing a transaction does not touch the tuple, &lt;code&gt;HEAP_XMIN_COMMITTED&lt;/code&gt; cannot be set at commit either. During visibility checking (&lt;code&gt;heapam_visibility.c&lt;/code&gt;), PostgreSQL looks up the transaction state and records it by calling &lt;code&gt;SetHintBits&lt;/code&gt; on t_infomask. That is how a plain query ended up setting &lt;code&gt;HEAP_XMIN_COMMITTED&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hint bits advantage&lt;/strong&gt;: completing (or failing) data modifications in a transaction produces no writes to the tuple. Commit and rollback are very fast.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hint bits disadvantage&lt;/strong&gt;: if a transaction modifies many rows, the next query that performs visibility checks may need to read transaction states from &lt;code&gt;pg_xact&lt;/code&gt; and dirty many pages while setting hint bits.&lt;/p&gt;
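&lt;p&gt;The deferred behavior can be imitated with a toy model. The Python below is a deliberately simplified sketch, not PostgreSQL code: it ignores snapshots, locking and in-progress transactions, and the class &lt;code&gt;ToyHeap&lt;/code&gt; with its counters is invented for illustration. Only the three flag values come from the tables above.&lt;/p&gt;

```python
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID = 0x0200
HEAP_XMAX_INVALID = 0x0800


class ToyHeap:
    """One tuple plus a fake CLOG; counts CLOG lookups and page dirtying."""

    def __init__(self):
        self.clog = {}  # xid mapped to "committed" or "aborted"
        self.clog_lookups = 0
        self.pages_dirtied = 0
        self.xmin = None
        self.infomask = 0

    def insert(self, xid):
        # INSERT stamps xmin; xmax is unused, so it is marked invalid at once.
        self.xmin = xid
        self.infomask = HEAP_XMAX_INVALID

    def commit(self, xid):
        # Commit only records the outcome in the CLOG; the tuple is untouched.
        self.clog[xid] = "committed"

    def xmin_committed(self):
        """Visibility-style check that sets a hint bit as a side effect."""
        if (self.infomask | HEAP_XMIN_COMMITTED) == self.infomask:
            return True  # hint bit already set, no CLOG lookup needed
        if (self.infomask | HEAP_XMIN_INVALID) == self.infomask:
            return False
        self.clog_lookups += 1
        status = self.clog.get(self.xmin)
        if status == "committed":
            self.infomask |= HEAP_XMIN_COMMITTED
            self.pages_dirtied += 1
            return True
        if status == "aborted":
            self.infomask |= HEAP_XMIN_INVALID
            self.pages_dirtied += 1
        return False


heap = ToyHeap()
heap.insert(100)
heap.commit(100)
before = heap.infomask
first = heap.xmin_committed()   # consults the fake CLOG, sets the hint bit
second = heap.xmin_committed()  # answered from the hint bit alone
print(hex(before), hex(heap.infomask), heap.clog_lookups, heap.pages_dirtied)
```

&lt;p&gt;In this model the commit leaves infomask unchanged, the first read pays for one CLOG lookup and dirties the page, and later reads are answered from the tuple header alone, which mirrors the INSERT and SELECT sequence shown above.&lt;/p&gt;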

&lt;h4 class="relative group"&gt;Do Hint Bits Generate WAL?
 &lt;div id="do-hint-bits-generate-wal" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#do-hint-bits-generate-wal" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;When checksums are enabled or &lt;code&gt;wal_log_hints&lt;/code&gt; is true, if the first operation to make a page dirty after a checkpoint is updating hint bits, a WAL record is generated — specifically, a Full Page Image — to prevent partial writes that would cause checksum mismatches.&lt;/p&gt;
&lt;p&gt;Therefore, with checksums enabled or &lt;code&gt;wal_log_hints&lt;/code&gt; set to true, even a SELECT that sets hint bits may generate WAL, which increases WAL volume to some extent. If you observe a SELECT generating WAL, check whether data checksums or &lt;code&gt;wal_log_hints&lt;/code&gt; is enabled.&lt;/p&gt;
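&lt;p&gt;A rough way to see this effect is to measure how far the WAL position moves across a read-only scan. The sketch below is illustrative only: the table &lt;code&gt;hint_demo&lt;/code&gt; and the psql variable &lt;code&gt;before_lsn&lt;/code&gt; are made-up names, &lt;code&gt;CHECKPOINT&lt;/code&gt; needs sufficient privileges, and background activity on a busy server can add unrelated WAL to the result.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Are hint-bit changes WAL-logged on this cluster?
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;show data_checksums;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;show wal_log_hints;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Load rows (hypothetical table); their hint bits are still unset after the commit
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;create table hint_demo(a int);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;insert into hint_demo select generate_series(1, 100000);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Start a new checkpoint cycle, then remember the current WAL position
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;checkpoint;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select pg_current_wal_lsn() as before_lsn \gset
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- First scan after the commit: visibility checks set hint bits
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select count(*) from hint_demo;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- WAL bytes written since the remembered position
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select pg_wal_lsn_diff(pg_current_wal_lsn(), :&amp;#39;before_lsn&amp;#39;) as wal_bytes;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With checksums or &lt;code&gt;wal_log_hints&lt;/code&gt; enabled, &lt;code&gt;wal_bytes&lt;/code&gt; would be expected to be far larger than with both off, because each page first dirtied after the checkpoint is logged as a full page image.&lt;/p&gt;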

&lt;h4 class="relative group"&gt;Why Are Hint Bits Deferred?
 &lt;div id="why-are-hint-bits-deferred" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-are-hint-bits-deferred" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;In &lt;code&gt;src/backend/access/heap/heapam_visibility.c&lt;/code&gt;, a comment on the &lt;code&gt;HeapTupleSatisfiesMVCC()&lt;/code&gt; visibility function explains why hint bits are deferred (paraphrased here):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * If the inserting/deleting transaction is still running according to our snapshot,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * hint bits on the tuple are not updated, even if it has in fact committed or aborted by now.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Checking the true transaction state would touch high-traffic shared data structures and cause contention,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * and it would not change the result of the visibility check anyway.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Hint bits are set by the first visitor whose snapshot is new enough to see the transaction as finished.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Until then, each HeapTupleSatisfiesMVCC call runs TransactionIdIsCurrentTransactionId
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * in addition to XidInMVCCSnapshot (which it has to run anyway).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Older code tried to set hint bits as soon as possible and checked TransactionIdIsInProgress on every call,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * which cost more cycles and more contention on the PGXACT array.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;*/&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Simply put: setting hint bits eagerly was expensive. Transaction status is recorded in CLOG at commit or abort, and hint bits are filled in lazily by later visibility checks, which reduces contention on shared structures such as the PGXACT array. This deferral is why a later query may update tuple headers.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Tuple DML Operations
 &lt;div id="tuple-dml-operations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#tuple-dml-operations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Now that we&amp;rsquo;ve built up knowledge of tuple headers, system columns, CLOG, and hint bits, let&amp;rsquo;s see how PostgreSQL performs INSERT, UPDATE, and DELETE.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Observing DML Transactions
 &lt;div id="observing-dml-transactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#observing-dml-transactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;We&amp;rsquo;ll observe PostgreSQL&amp;rsquo;s DML transaction behavior by examining tuple header fields: lp, lp_flags, ctid, xmin, xmax, cid (cmin, cmax), infomask, and infomask2.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll use the following query:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0:LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)) item,&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; 
lp;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
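&lt;p&gt;(A side note: some sources like to write &lt;code&gt;SELECT '(0,'||lp||')' AS ctid&lt;/code&gt;. This is misleading because lp and ctid are different things. lp is the tuple&amp;rsquo;s own line pointer number within the page, while t_ctid stores the location of the row&amp;rsquo;s next version. The two coincide for a tuple that has not been updated; after an UPDATE, the old tuple&amp;rsquo;s t_ctid points to the lp of the new version.)&lt;/p&gt;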
&lt;p&gt;For readability, create a view:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;view&lt;/span&gt; vlzl1 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0:LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)) item,&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; 
heap_tuple_infomask_flags(t_infomask, t_infomask2) info &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now the query looks like:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Expanded display &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;--+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;653&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;combined_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;INSERT
 &lt;div id="insert" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#insert" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Truncate the table, then insert two rows in a single transaction:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
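&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;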
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+---------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;664&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;664&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;ctid points to (page 0, lp 1), i.e., to itself.&lt;/li&gt;
&lt;li&gt;lp (line pointer number) increments.&lt;/li&gt;
&lt;li&gt;Both tuples share the same xmin — they were inserted by the same transaction.&lt;/li&gt;
&lt;li&gt;xmax is 0 (invalid transaction ID). Infomask only indicates xmax is invalid: this tuple has not yet &amp;ldquo;experienced&amp;rdquo; a delete transaction.&lt;/li&gt;
&lt;li&gt;cid increments from 0: 0 for the first command, 1 for the second.&lt;/li&gt;
&lt;/ul&gt;
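&lt;p&gt;The same header fields can be cross-checked without pageinspect through the system columns, which are returned only for tuples that pass the visibility check. A minimal sketch, assuming the table&amp;rsquo;s single column is named &lt;code&gt;a&lt;/code&gt; (as the DELETE statement in the next section suggests):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- ctid, xmin, xmax and cmin are system columns; they must be named explicitly
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select ctid, xmin, xmax, cmin, a from lzl1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Unlike the pageinspect view, this goes through a normal scan, so it runs visibility checks and may itself set hint bits on the page.&lt;/p&gt;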

&lt;h3 class="relative group"&gt;DELETE
 &lt;div id="delete" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#delete" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DELETE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+-----------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;664&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;665&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_KEYS_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;664&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The first tuple was deleted. The tuple wasn&amp;rsquo;t physically removed — only a few attributes were marked:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ctid unchanged, still points to itself.&lt;/li&gt;
&lt;li&gt;xmax updated to the delete transaction ID.&lt;/li&gt;
&lt;li&gt;Infomask drops &lt;code&gt;HEAP_XMAX_INVALID&lt;/code&gt; and gains &lt;code&gt;HEAP_KEYS_UPDATED&lt;/code&gt;. That flag does not exclusively mean &amp;ldquo;deleted&amp;rdquo;: it is set when a tuple is deleted or when an update modifies its key columns.&lt;/li&gt;
&lt;li&gt;Although only the first tuple was modified, the second tuple&amp;rsquo;s infomask also gained &lt;code&gt;HEAP_XMIN_COMMITTED&lt;/code&gt;: the DELETE scanned both tuples, and its visibility checks set the hint bit on each.&lt;/li&gt;
&lt;/ul&gt;
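&lt;p&gt;The flag names shown in raw_flags are individual bits of t_infomask and t_infomask2, so they can also be tested directly. The sketch below is illustrative: the bit values in the comments are taken from &lt;code&gt;htup_details.h&lt;/code&gt; as commonly documented and should be checked against the header of your PostgreSQL version, and the output column names are made up.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select lp, t_infomask, t_infomask2,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       (t_infomask &amp;amp; 256) != 0 as xmin_committed,  -- HEAP_XMIN_COMMITTED 0x0100
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       (t_infomask &amp;amp; 1024) != 0 as xmax_committed, -- HEAP_XMAX_COMMITTED 0x0400
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       (t_infomask &amp;amp; 2048) != 0 as xmax_invalid,   -- HEAP_XMAX_INVALID 0x0800
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       (t_infomask2 &amp;amp; 8192) != 0 as keys_updated   -- HEAP_KEYS_UPDATED 0x2000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from heap_page_items(get_raw_page(&amp;#39;lzl1&amp;#39;, 0))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;order by lp;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For the deleted tuple at lp=1, &lt;code&gt;keys_updated&lt;/code&gt; should be true and &lt;code&gt;xmax_invalid&lt;/code&gt; false, matching the raw_flags output above.&lt;/p&gt;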

&lt;h3 class="relative group"&gt;UPDATE
 &lt;div id="update" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#update" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+-------------------------------------------------------------+----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;664&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;665&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_KEYS_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;664&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;666&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_HOT_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;666&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;An UPDATE doesn&amp;rsquo;t modify the tuple in place. Instead, it marks the old version as expired by setting its xmax and inserts a new version:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;lp=2 is the old tuple version. t_xmax is the update transaction ID, and t_ctid now points to the new version at (0,3). Infomask adds &lt;code&gt;HEAP_HOT_UPDATED&lt;/code&gt;, indicating the tuple was HOT-updated: its successor lives in the same page and needed no new index entries.&lt;/li&gt;
&lt;li&gt;lp=3 is the new tuple version. It looks like a freshly inserted tuple, except that its xmin equals the old version&amp;rsquo;s xmax. Infomask carries &lt;code&gt;HEAP_UPDATED&lt;/code&gt;, indicating it was created by an UPDATE, and &lt;code&gt;HEAP_ONLY_TUPLE&lt;/code&gt;, marking it as a heap-only tuple that no index entry points to directly.&lt;/li&gt;
&lt;li&gt;Additionally, the deleted tuple at lp=1, which this UPDATE never modified, gained &lt;code&gt;HEAP_XMAX_COMMITTED&lt;/code&gt;: the UPDATE&amp;rsquo;s scan checked its visibility and recorded that the deleting transaction had committed.&lt;/li&gt;
&lt;/ul&gt;
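&lt;p&gt;For contrast, an update that changes an indexed column cannot be HOT. The sketch below uses a separate, made-up table &lt;code&gt;hot_demo&lt;/code&gt; and index &lt;code&gt;hot_demo_a_idx&lt;/code&gt; so that the state of &lt;code&gt;lzl1&lt;/code&gt; is left untouched; it assumes pageinspect is installed as in the rest of this article.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;create table hot_demo(a int);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;create index hot_demo_a_idx on hot_demo(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;insert into hot_demo values (1);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Changes an indexed column, so a new index entry is required
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;update hot_demo set a = 2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Inspect both tuple versions on page 0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select lp, t_ctid, t_xmin, t_xmax, raw_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from heap_page_items(get_raw_page(&amp;#39;hot_demo&amp;#39;, 0)) item,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;     lateral heap_tuple_infomask_flags(t_infomask, t_infomask2) info
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;order by lp;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Here the old version would be expected to lack &lt;code&gt;HEAP_HOT_UPDATED&lt;/code&gt; and the new version to carry &lt;code&gt;HEAP_UPDATED&lt;/code&gt; without &lt;code&gt;HEAP_ONLY_TUPLE&lt;/code&gt;, while t_ctid of the old version still points to the new one.&lt;/p&gt;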

&lt;h3 class="relative group"&gt;Rollback
 &lt;div id="rollback" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#rollback" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;); &lt;span style="color:#75715e"&gt;-- INSERT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+---------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;679&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- INSERT rolled back
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+---------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;679&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After INSERT and rollback, the tuple header shows no changes.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1 ; &lt;span style="color:#75715e"&gt;-- DELETE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DELETE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+-----------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;684&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_INVALID,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;685&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;686&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_KEYS_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- DELETE rolled back
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+-----------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;684&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_INVALID,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;685&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;686&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_KEYS_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After DELETE and rollback, the tuple header shows no changes.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; ; &lt;span style="color:#75715e"&gt;-- UPDATE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+--------------------------------------------------+---------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;684&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_INVALID,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;685&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;688&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_HOT_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;688&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- UPDATE rolled back
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+--------------------------------------------------+---------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;684&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_INVALID,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;685&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;688&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_HOT_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;688&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After UPDATE and rollback, the tuple header shows no changes.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;When a transaction rolls back, the tuple headers it wrote do not change at all. This is why PostgreSQL&amp;rsquo;s MVCC needs no rollback segments and cannot run out of them: a rollback only changes how tuples are judged for visibility, it does not rewrite data.&lt;/li&gt;
&lt;li&gt;xmax doesn&amp;rsquo;t change after rollback either, so a non-zero xmax doesn&amp;rsquo;t necessarily mean the tuple was deleted: the deleting or updating transaction may have rolled back.&lt;/li&gt;
&lt;li&gt;However, once a later visibility check examines a tuple inserted by a rolled-back transaction, that tuple&amp;rsquo;s infomask gains &lt;code&gt;HEAP_XMIN_INVALID&lt;/code&gt;, even though no data changes. This applies to ordinary tuples and to HOT (heap-only) tuples alike.&lt;/li&gt;
&lt;/ul&gt;
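&lt;p&gt;A minimal sketch of why rollback leaves tuples untouched, under heavy simplification: the transaction outcome lives in a separate status log, and a later reader caches that outcome in the tuple as a hint. Everything here is hypothetical (&lt;code&gt;txlog&lt;/code&gt;, &lt;code&gt;MiniTuple&lt;/code&gt;, &lt;code&gt;tuple_visible&lt;/code&gt;, the &lt;code&gt;TOY_*&lt;/code&gt; bit values), and the check assumes a reader that starts after all the transactions involved have finished, so no snapshot is consulted.&lt;/p&gt;

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified transaction ID: plain integers, wraparound ignored (assumption). */
typedef uint32_t TxId;

typedef enum { TX_IN_PROGRESS = 0, TX_COMMITTED, TX_ABORTED } TxStatus;

#define MAX_XID 1024
/* Toy stand-in for the commit log; zero-initialised means "in progress". */
static TxStatus txlog[MAX_XID];

/* Toy hint-bit values; these are NOT PostgreSQL's real infomask bits. */
enum {
    TOY_XMIN_COMMITTED = 1, TOY_XMIN_INVALID = 2,
    TOY_XMAX_COMMITTED = 4, TOY_XMAX_INVALID = 8
};

typedef struct MiniTuple {
    TxId     xmin;      /* inserting transaction */
    TxId     xmax;      /* deleting/updating transaction, 0 if none */
    unsigned infomask;  /* cached outcomes (hint bits) */
} MiniTuple;

/* Commit and rollback only record an outcome; no tuple is touched. */
static void tx_commit(TxId xid) { assert(xid < MAX_XID); txlog[xid] = TX_COMMITTED; }
static void tx_abort(TxId xid)  { assert(xid < MAX_XID); txlog[xid] = TX_ABORTED; }

/* Visibility check that lazily caches transaction outcomes in the tuple. */
static bool tuple_visible(MiniTuple *tup)
{
    assert(tup->xmin < MAX_XID && tup->xmax < MAX_XID);

    /* Inserter: look up the outcome once, then remember it as a hint. */
    if (!(tup->infomask & (TOY_XMIN_COMMITTED | TOY_XMIN_INVALID))) {
        if (txlog[tup->xmin] == TX_COMMITTED)
            tup->infomask |= TOY_XMIN_COMMITTED;
        else if (txlog[tup->xmin] == TX_ABORTED)
            tup->infomask |= TOY_XMIN_INVALID;
    }
    if (!(tup->infomask & TOY_XMIN_COMMITTED))
        return false;               /* inserter aborted or still running */

    if (tup->xmax == 0)
        return true;                /* never deleted or updated */

    /* Deleter: same lazy lookup; xmax itself is never reset. */
    if (!(tup->infomask & (TOY_XMAX_COMMITTED | TOY_XMAX_INVALID))) {
        if (txlog[tup->xmax] == TX_COMMITTED)
            tup->infomask |= TOY_XMAX_COMMITTED;
        else if (txlog[tup->xmax] == TX_ABORTED)
            tup->infomask |= TOY_XMAX_INVALID;
    }
    return !(tup->infomask & TOY_XMAX_COMMITTED);
}
```

&lt;p&gt;In this model a tuple whose inserter called &lt;code&gt;tx_abort&lt;/code&gt; keeps an empty infomask until the first &lt;code&gt;tuple_visible&lt;/code&gt; call, and a tuple whose deleter aborted stays visible while keeping its non-zero xmax.&lt;/p&gt;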

&lt;h3 class="relative group"&gt;References for Tuple and Transaction
 &lt;div id="references-for-tuple-and-transaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references-for-tuple-and-transaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Books:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;The Internals of PostgreSQL&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL in Action&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL Internals: Deep Dive into Transaction Processing&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL Database Kernel Analysis&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://edu.postgrespro.com/postgresql_internals-14_parts1-2_en.pdf" target="_blank" rel="noreferrer"&gt;https://edu.postgrespro.com/postgresql_internals-14_parts1-2_en.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Official resources:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Concurrency_control" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Concurrency_control&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Hint_Bits" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Hint_Bits&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/10/storage-page-layout.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/10/storage-page-layout.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/pageinspect.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/pageinspect.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Essential PostgreSQL transaction reads (interdb):&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql05.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql05.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql06.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql06.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Source code experts:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/102920988" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/102920988&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/127955762" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/127955762&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/125023923" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/125023923&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL snapshot optimization performance comparison:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462" target="_blank" rel="noreferrer"&gt;https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Other resources:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://brandur.org/postgres-atomicity" target="_blank" rel="noreferrer"&gt;https://brandur.org/postgres-atomicity&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/j-8uRuZDRf4mHIQR_ZKIEg" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/j-8uRuZDRf4mHIQR_ZKIEg&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Snapshots in PostgreSQL
 &lt;div id="snapshots-in-postgresql" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#snapshots-in-postgresql" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A snapshot is a data structure that records the instantaneous state of the database. PostgreSQL&amp;rsquo;s snapshot stores the lower and upper transaction ID bounds (&lt;code&gt;xmin&lt;/code&gt;, the oldest transaction still active, and &lt;code&gt;xmax&lt;/code&gt;, one past the highest completed transaction ID), the list of transactions in progress at snapshot time, the current transaction&amp;rsquo;s command ID, and more.&lt;/p&gt;
&lt;p&gt;Snapshot data is stored in the &lt;code&gt;SnapshotData&lt;/code&gt; struct type. Source: &lt;code&gt;src/include/utils/snapshot.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; SnapshotData
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SnapshotType snapshot_type; &lt;span style="color:#75715e"&gt;/* snapshot type */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TransactionId xmin;			&lt;span style="color:#75715e"&gt;/* txid &amp;lt; xmin are visible to the snapshot */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TransactionId xmax;			&lt;span style="color:#75715e"&gt;/* txid &amp;gt;= xmax are invisible to the snapshot */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* list of active transactions at snapshot time. Only includes txids between xmin and xmax */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TransactionId &lt;span style="color:#f92672"&gt;*&lt;/span&gt;xip;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uint32		xcnt;			&lt;span style="color:#75715e"&gt;/* number of txids stored in xip[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* list of active subtransactions at snapshot time */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TransactionId &lt;span style="color:#f92672"&gt;*&lt;/span&gt;subxip;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;int32		subxcnt;		&lt;span style="color:#75715e"&gt;/* number of txids stored in subxip[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		suboverflowed;	&lt;span style="color:#75715e"&gt;/* whether subtransactions overflowed; overflows occur with many subtransactions */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		takenDuringRecovery;	&lt;span style="color:#75715e"&gt;/* is this a recovery snapshot? */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		copied;			&lt;span style="color:#75715e"&gt;/* whether the snapshot is a copy (RR and serializable copy their snapshots); false if static */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CommandId	curcid;			&lt;span style="color:#75715e"&gt;/* command ID in the transaction; CID &amp;lt; curcid is visible */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TimestampTz whenTaken;		&lt;span style="color:#75715e"&gt;/* timestamp when snapshot was taken */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;XLogRecPtr	lsn;			&lt;span style="color:#75715e"&gt;/* LSN when snapshot was taken */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} SnapshotData;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; SnapshotData &lt;span style="color:#f92672"&gt;*&lt;/span&gt;Snapshot;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The most important snapshot information is &lt;code&gt;xmin&lt;/code&gt;, &lt;code&gt;xmax&lt;/code&gt;, and &lt;code&gt;xip_list&lt;/code&gt;. Use &lt;code&gt;pg_current_snapshot()&lt;/code&gt; (in pg12 and earlier, &lt;code&gt;txid_current_snapshot()&lt;/code&gt;) to display the current transaction&amp;rsquo;s snapshot.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note: snapshot xmin/xmax are different from tuple xmin/xmax — they have different meanings.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_current_snapshot();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_current_snapshot 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;104&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;102&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Field&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;xmin&lt;/td&gt;
 &lt;td&gt;Earliest txid that was still active. All txids older than xmin have either committed (visible) or aborted (dead tuples).&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;xmax&lt;/td&gt;
 &lt;td&gt;One past the highest completed txid: xmax = latestCompletedXid + 1. All txids &amp;gt;= xmax had not completed at snapshot time, so they are invisible to the current snapshot.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;xip_list&lt;/td&gt;
 &lt;td&gt;Stored in array xip[]. Since transactions can start and finish out of order (a later-started transaction may finish earlier), xmin and xmax alone cannot express which transactions were active at snapshot time. xip_list stores the txids between xmin and xmax that were still active. In the output above, 100 and 102 were active, while 101 and 103 had already finished.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b7605604abbc.png" alt="image" /&gt;&lt;/p&gt;
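&lt;p&gt;As a rough illustration of how the three fields combine, the sketch below classifies a txid against a snapshot such as &lt;code&gt;100:104:100,102&lt;/code&gt;. It is a simplified model, not PostgreSQL source: &lt;code&gt;MiniSnapshot&lt;/code&gt; and &lt;code&gt;xid_in_progress()&lt;/code&gt; are hypothetical names, plain integer comparison stands in for the wraparound-aware XID comparison, and subtransactions are ignored.&lt;/p&gt;

```c
/* Simplified model of a snapshot; NOT the real SnapshotData. */
typedef unsigned int TransactionId;

typedef struct MiniSnapshot
{
	TransactionId xmin;			/* every xid below this had finished */
	TransactionId xmax;			/* every xid at or above this is unfinished */
	const TransactionId *xip;	/* in-progress xids in [xmin, xmax) */
	unsigned int xcnt;			/* number of entries in xip[] */
} MiniSnapshot;

/*
 * Returns 1 if xid counts as in progress for this snapshot (its changes are
 * invisible), 0 if it had already finished when the snapshot was taken.
 * Whether a finished xid committed or aborted is decided elsewhere.
 */
int
xid_in_progress(TransactionId xid, const MiniSnapshot *snap)
{
	unsigned int i;

	if (snap->xmin > xid)
		return 0;				/* older than xmin: finished */
	if (xid >= snap->xmax)
		return 1;				/* at or past xmax: not completed yet */
	for (i = 0; i != snap->xcnt; i++)
	{
		if (snap->xip[i] == xid)
			return 1;			/* listed as active at snapshot time */
	}
	return 0;					/* between xmin and xmax, but finished */
}
```

&lt;p&gt;With the snapshot above, txids 99, 101 and 103 are reported as finished, while 100, 102 and 104 are treated as in progress.&lt;/p&gt;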

&lt;h3 class="relative group"&gt;Snapshot Types
 &lt;div id="snapshot-types" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#snapshot-types" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Beyond MVCC snapshots, PostgreSQL defines several other snapshot types in &lt;code&gt;src/include/utils/snapshot.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;enum&lt;/span&gt; SnapshotType
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Tuple is visible if and only if it satisfies MVCC snapshot visibility rules.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * The most important snapshot type — used to implement MVCC.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Tuple visibility is judged based on snapshot xmin, xmax, xip_list, curcid, etc.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * If a command changed data, the current MVCC snapshot won&amp;#39;t see it; a new MVCC snapshot is needed.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SNAPSHOT_MVCC &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Tuple is visible if its transaction committed.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * In-progress transactions are invisible.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Data changes from the current command are visible to the SELF snapshot.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SNAPSHOT_SELF,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Any tuple is visible.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SNAPSHOT_ANY,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Visible if the TOAST tuple is valid. TOAST visibility depends on the main table tuple&amp;#39;s visibility.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SNAPSHOT_TOAST,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Like SELF, but changes made by other in-progress transactions are visible too.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * The DIRTY snapshot also reports which in-progress transaction touched the tuple:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * snapshot xmin is set to the tuple&amp;#39;s xmin if that transaction is still in progress; xmax is handled the same way.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SNAPSHOT_DIRTY,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* HISTORIC_MVCC snapshot follows MVCC rules, used for logical decoding.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SNAPSHOT_HISTORIC_MVCC,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Tuple is visible if it might still be visible to some transaction, i.e. vacuum cannot remove it yet.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SNAPSHOT_NON_VACUUMABLE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} SnapshotType;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Snapshots and Isolation Levels
 &lt;div id="snapshots-and-isolation-levels" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#snapshots-and-isolation-levels" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Different isolation levels acquire snapshots differently:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ab95b43529f1.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Read Committed takes a new snapshot for each SQL statement in the transaction, while Repeatable Read and Serializable take one snapshot at the first statement and reuse it for the entire transaction. The function that acquires snapshots is &lt;code&gt;GetTransactionSnapshot()&lt;/code&gt;.&lt;/p&gt;
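&lt;p&gt;The difference can be sketched with a toy model. Everything here is hypothetical (&lt;code&gt;MiniXact&lt;/code&gt;, &lt;code&gt;snapshot_for_statement()&lt;/code&gt;, and the use of a counter as a stand-in for a real snapshot); it only shows when a new snapshot is taken, not how.&lt;/p&gt;

```c
/* Toy model; these names are hypothetical and do not exist in PostgreSQL. */
typedef struct MiniXact
{
	int			per_xact_snapshot;	/* 1 = one snapshot per transaction,
									 * 0 = one snapshot per statement */
	int			first_snapshot_set; /* has this transaction taken one yet? */
	int			snapshot_id;		/* id of the snapshot currently in use */
} MiniXact;

/* Stand-in for taking a fresh snapshot: each one gets a new id. */
static int	next_snapshot_id = 1;

/* Returns the id of the snapshot the next statement would run with. */
int
snapshot_for_statement(MiniXact *xact)
{
	if (xact->per_xact_snapshot)
	{
		/* Take a snapshot on the first statement only, then reuse it. */
		if (!xact->first_snapshot_set)
		{
			xact->snapshot_id = next_snapshot_id++;
			xact->first_snapshot_set = 1;
		}
		return xact->snapshot_id;
	}

	/* Per-statement mode: every statement gets a fresh snapshot. */
	xact->snapshot_id = next_snapshot_id++;
	return xact->snapshot_id;
}
```

&lt;p&gt;Calling the function twice on a per-statement transaction yields two different ids; calling it twice on a per-transaction one yields the same id both times.&lt;/p&gt;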

&lt;h3 class="relative group"&gt;Process-Level Transaction Structures
 &lt;div id="process-level-transaction-structures" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#process-level-transaction-structures" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When PostgreSQL acquires snapshot data, it needs to scan the transaction state of all backend processes.&lt;/p&gt;
&lt;p&gt;Before understanding the &lt;code&gt;GetSnapshotData()&lt;/code&gt; function, we need to understand several backend process structures: PGPROC, PGXACT, PROC_HDR (PROCGLOBAL), and ProcArray.&lt;/p&gt;
&lt;p&gt;These process-related structures contain process and lock information. Here we only study the transaction-related parts. Source code examples are based on pg13.&lt;/p&gt;

&lt;h4 class="relative group"&gt;PGPROC Struct
 &lt;div id="pgproc-struct" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pgproc-struct" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Source: &lt;code&gt;src/include/storage/proc.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Every backend process stores a PGPROC struct in memory.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Think of this as the backend process&amp;#39;s main structure.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; PGPROC
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LocalTransactionId lxid;	&lt;span style="color:#75715e"&gt;/* local id of top-level transaction currently
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;								 * being executed by this proc, if running;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;								 * else InvalidLocalTransactionId */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; XidCache subxids;	&lt;span style="color:#75715e"&gt;/* cached subtransaction XIDs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* clog group transaction status update */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		clogGroupMember;	&lt;span style="color:#75715e"&gt;/* whether this proc uses clog group commit */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_atomic_uint32 clogGroupNext; &lt;span style="color:#75715e"&gt;/* atomic int, pointing to the next group member proc */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TransactionId clogGroupMemberXid;	&lt;span style="color:#75715e"&gt;/* xid to be committed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;XidStatus	clogGroupMemberXidStatus;	&lt;span style="color:#75715e"&gt;/* status of the xid to be committed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			clogGroupMemberPage;	&lt;span style="color:#75715e"&gt;/* which page the xid to be committed belongs to */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;XLogRecPtr	clogGroupMemberLsn; &lt;span style="color:#75715e"&gt;/* LSN of the commit log for the xid to be committed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;};
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* NOTE: &amp;#34;typedef struct PGPROC PGPROC&amp;#34; appears in storage/lock.h. Not written with the struct itself. */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;PGXACT Struct
 &lt;div id="pgxact-struct" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pgxact-struct" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Before 9.2, PGXACT information was inside PGPROC. Stress testing showed that on multi-CPU systems,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// separating them makes GetSnapshotData faster by reducing the number of cache lines fetched.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; PGXACT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId xid;			&lt;span style="color:#75715e"&gt;/* id of top-level transaction currently being
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;								 * executed by this proc, if running and XID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;								 * is assigned; else InvalidTransactionId */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								&lt;span style="color:#75715e"&gt;// the backend&amp;#39;s own top-level XID (not a snapshot xmax)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId xmin;			&lt;span style="color:#75715e"&gt;/* minimum running xid as of this backend&amp;#39;s transaction start, excluding lazy vacuum;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;								 vacuum must not remove tuples deleted by xid &amp;gt;= xmin */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	uint8		vacuumFlags;	&lt;span style="color:#75715e"&gt;/* vacuum-related flags, see above */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		overflowed; &lt;span style="color:#75715e"&gt;// whether this backend&amp;#39;s subxid cache overflowed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	uint8		nxids;			&lt;span style="color:#75715e"&gt;/* number of subxids cached in PGPROC.subxids */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} PGXACT;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PGXACT holds only a few hot fields: the backend&amp;rsquo;s xid, xmin, and a handful of flags. &lt;strong&gt;PGPROC leans toward basic backend info; some less frequently accessed transaction info remains in PGPROC, but the core per-process transaction info lives in PGXACT.&lt;/strong&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;PROC_HDR (PROCGLOBAL) Struct
 &lt;div id="proc_hdr-procglobal-struct" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#proc_hdr-procglobal-struct" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Every backend process has a PGPROC struct. Under high concurrency, locating each backend&amp;rsquo;s struct individually to read its transaction info would be slow, so an instance-level structure keeps all of them together: PROCGLOBAL.&lt;/p&gt;
&lt;p&gt;In the source, the global variable &lt;code&gt;ProcGlobal&lt;/code&gt; is a pointer to a &lt;code&gt;PROC_HDR&lt;/code&gt; struct. PROC_HDR stores global proc info: the full arrays of PGPROC and PGXACT structs, free procs, etc.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/include/storage/proc.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; PROC_HDR
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* pgproc array (not including dummies for prepared txns) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PGPROC	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;allProcs;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* pgxact array (not including dummies for prepared txns) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PGXACT	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;allPgXact;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Current shared estimate of appropriate spins_per_delay value */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			spins_per_delay;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* The proc of the Startup process, since not in ProcArray */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PGPROC	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;startupProc;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			startupProcPid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Buffer id of the buffer that Startup process waits for pin on, or -1 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			startupBufferPinWaitBufId;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} PROC_HDR;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;ProcArray Struct
 &lt;div id="procarray-struct" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#procarray-struct" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;ProcArray is defined in &lt;code&gt;procarray.c&lt;/code&gt;. It tracks which PGPROC and PGXACT entries belong to backends that are currently active.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/backend/storage/ipc/procarray.c&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; ProcArrayStruct
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			numProcs;		&lt;span style="color:#75715e"&gt;/* number of procs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			maxProcs;		&lt;span style="color:#75715e"&gt;/* size of proc array */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// KnownAssignedXids bookkeeping (used on a standby during recovery)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			maxKnownAssignedXids;	&lt;span style="color:#75715e"&gt;/* allocated size of array */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			numKnownAssignedXids;	&lt;span style="color:#75715e"&gt;/* current # of valid entries */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			tailKnownAssignedXids;	&lt;span style="color:#75715e"&gt;/* index of oldest valid element */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			headKnownAssignedXids;	&lt;span style="color:#75715e"&gt;/* index of newest element, + 1 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;slock_t&lt;/span&gt;		known_assigned_xids_lck;	&lt;span style="color:#75715e"&gt;/* protects head/tail pointers */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Highest subxid that has been removed from KnownAssignedXids array to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * prevent overflow; or InvalidTransactionId if none. We track this for
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * similar reasons to tracking overflowing cached subxids in PGXACT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * entries. Must hold exclusive ProcArrayLock to change this, and shared
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * lock to read it.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId lastOverflowedXid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* oldest xmin of any replication slot */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId replication_slot_xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* oldest catalog xmin of any replication slot */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId replication_slot_catalog_xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* pgprocnos, equivalent to allPgXact[] array indices, used to look up allPgXact[]; this array has PROCARRAY_MAXPROCS entries */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} ProcArrayStruct;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; ProcArrayStruct &lt;span style="color:#f92672"&gt;*&lt;/span&gt;procArray;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
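&lt;p&gt;To make the role of &lt;code&gt;pgprocnos[]&lt;/code&gt; concrete, here is a minimal sketch of the indirection: each entry is an index into the global PGXACT array, so a scan over active backends only touches the small PGXACT entries. &lt;code&gt;MiniPgXact&lt;/code&gt; and &lt;code&gt;collect_running_xids()&lt;/code&gt; are hypothetical names. The real scan is done by &lt;code&gt;GetSnapshotData()&lt;/code&gt; while holding ProcArrayLock and applies further filtering that this sketch leaves out.&lt;/p&gt;

```c
/* Simplified stand-ins for the pg13 types; NOT the real definitions. */
typedef unsigned int TransactionId;
#define InvalidTransactionId 0

typedef struct MiniPgXact
{
	TransactionId xid;			/* top-level xid, or InvalidTransactionId */
} MiniPgXact;

/*
 * Walk the active-backend index list and copy every assigned xid into
 * xip_out[], which must have room for numProcs entries.
 * Returns the number of xids written.
 */
unsigned int
collect_running_xids(const int *pgprocnos, int numProcs,
					 const MiniPgXact *allPgXact,
					 TransactionId *xip_out)
{
	unsigned int count = 0;
	int			i;

	for (i = 0; i != numProcs; i++)
	{
		int			pgprocno = pgprocnos[i];	/* index into allPgXact[] */
		TransactionId xid = allPgXact[pgprocno].xid;

		if (xid == InvalidTransactionId)
			continue;			/* backend has no xid assigned */
		xip_out[count++] = xid;
	}
	return count;
}
```

&lt;p&gt;The output order follows &lt;code&gt;pgprocnos[]&lt;/code&gt;, not xid order, so a caller that needs a sorted list has to sort it afterwards.&lt;/p&gt;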

&lt;h3 class="relative group"&gt;Acquiring a Snapshot
 &lt;div id="acquiring-a-snapshot" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#acquiring-a-snapshot" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;GetTransactionSnapshot()
 &lt;div id="gettransactionsnapshot" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#gettransactionsnapshot" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Snapshots are acquired via &lt;code&gt;GetTransactionSnapshot()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/backend/utils/time/snapmgr.c&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// GetTransactionSnapshot() returns the appropriate snapshot for a new query in a transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Snapshot
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;GetTransactionSnapshot&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#75715e"&gt;// Return historic snapshot if doing logical decoding. We&amp;#39;ll never need a
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#75715e"&gt;// non-historic snapshot after this, so return directly.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HistoricSnapshotActive&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#f92672"&gt;!&lt;/span&gt;FirstSnapshotSet);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; HistoricSnapshot;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* First call in this transaction? If so, enter this branch */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;FirstSnapshotSet)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Ensure the catalog snapshot is fresh.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;InvalidateCatalogSnapshot&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;pairingheap_is_empty&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;RegisteredSnapshots));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(FirstXactSnapshot &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Raise an error if called in parallel mode
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsInParallelMode&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;elog&lt;/span&gt;(ERROR,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#e6db74"&gt;&amp;#34;cannot take query snapshot during a parallel operation&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		 &lt;span style="color:#75715e"&gt;// IsolationUsesXactSnapshot() is true for Repeatable Read and Serializable.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		 &lt;span style="color:#75715e"&gt;// These levels reuse one snapshot for the whole transaction, so it is copied once here.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsolationUsesXactSnapshot&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// First, create the snapshot in CurrentSnapshotData
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// For Serializable, also initialize the data structures SSI needs
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsolationIsSerializable&lt;/span&gt;()) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				CurrentSnapshot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetSerializableTransactionSnapshot&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;CurrentSnapshotData);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				CurrentSnapshot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetSnapshotData&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;CurrentSnapshotData);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Make a saved copy */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* For Repeatable Read or Serializable, this snapshot lasts the entire transaction; copy once */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			CurrentSnapshot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;CopySnapshot&lt;/span&gt;(CurrentSnapshot);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			FirstXactSnapshot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; CurrentSnapshot;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Mark it as &amp;#34;registered&amp;#34; in FirstXactSnapshot */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			FirstXactSnapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;regd_count&lt;span style="color:#f92672"&gt;++&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;pairingheap_add&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;RegisteredSnapshots, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;FirstXactSnapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;ph_node);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// For Read Committed, acquire a snapshot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			CurrentSnapshot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetSnapshotData&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;CurrentSnapshotData);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Set the flag so that later calls in this transaction skip this branch
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		FirstSnapshotSet &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; CurrentSnapshot;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Not the first call in this transaction (a first snapshot already exists).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Repeatable Read or Serializable: return the copy saved on the first call; no new snapshot is taken.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsolationUsesXactSnapshot&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; CurrentSnapshot;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Don&amp;#39;t allow catalog snapshot to be older than xact snapshot. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;InvalidateCatalogSnapshot&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Read Committed: re-acquire snapshot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	CurrentSnapshot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetSnapshotData&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;CurrentSnapshotData);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; CurrentSnapshot;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;IsolationUsesXactSnapshot()&lt;/code&gt; and &lt;code&gt;IsolationIsSerializable()&lt;/code&gt; are macros defined in &lt;code&gt;src/include/access/xact.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XACT_READ_UNCOMMITTED	0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XACT_READ_COMMITTED	1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XACT_REPEATABLE_READ	2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XACT_SERIALIZABLE	3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Only three levels behave differently internally (1, 2, 3); Read Uncommitted is treated as Read Committed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Repeatable Read and Serializable use one snapshot per transaction; the lower levels take a new snapshot for each SQL statement
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define IsolationUsesXactSnapshot() (XactIsoLevel &amp;gt;= XACT_REPEATABLE_READ)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define IsolationIsSerializable() (XactIsoLevel == XACT_SERIALIZABLE)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;IsolationUsesXactSnapshot()&lt;/code&gt; is true for Repeatable Read or Serializable.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;IsolationIsSerializable()&lt;/code&gt; is true for Serializable only.&lt;/p&gt;
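&lt;p&gt;To make the two predicates concrete, here is a minimal standalone sketch. It reuses the four constants from the listing above but, as an assumption made purely for illustration, turns the macros into plain functions that take the level as a parameter instead of reading the global &lt;code&gt;XactIsoLevel&lt;/code&gt;. The &lt;code&gt;Sketch&lt;/code&gt;-prefixed names are hypothetical and do not exist in PostgreSQL.&lt;/p&gt;

```c
/* Standalone sketch of the isolation-level predicates.
 * Assumption: the Sketch* functions are hypothetical stand-ins for the
 * real macros, which read the global XactIsoLevel instead of a parameter. */
#define XACT_READ_UNCOMMITTED 0
#define XACT_READ_COMMITTED   1
#define XACT_REPEATABLE_READ  2
#define XACT_SERIALIZABLE     3

/* Returns 1 when one snapshot is reused for the whole transaction. */
int SketchUsesXactSnapshot(int iso_level)
{
    return iso_level >= XACT_REPEATABLE_READ;
}

/* Returns 1 only for Serializable, the level that also needs SSI state. */
int SketchIsSerializable(int iso_level)
{
    return iso_level == XACT_SERIALIZABLE;
}
```

&lt;p&gt;Under these definitions, levels 0 and 1 both report false for the transaction-snapshot check, levels 2 and 3 report true, and only level 3 reports true for the Serializable check.&lt;/p&gt;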
&lt;p&gt;&lt;strong&gt;&lt;code&gt;GetTransactionSnapshot()&lt;/code&gt; flow chart:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/578be2dea323.png" alt="image" /&gt;
(image from CSDN: &lt;a href="https://blog.csdn.net/Hehuyi_In" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;The main logic of &lt;code&gt;GetTransactionSnapshot()&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;During logical decoding, return the historic snapshot directly.&lt;/li&gt;
&lt;li&gt;For Repeatable Read or Serializable: the first call builds a snapshot and saves a copy; every later call in the transaction returns that saved copy.&lt;/li&gt;
&lt;li&gt;For Read Committed: build a new snapshot on every call.&lt;/li&gt;
&lt;li&gt;For Serializable, the first call also initializes the data structures SSI needs.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GetTransactionSnapshot()&lt;/code&gt; decides which snapshot to return; the snapshot contents are filled in by &lt;code&gt;GetSnapshotData()&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
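&lt;p&gt;The reuse rule summarized above can be sketched as a toy model. This is not PostgreSQL code: a snapshot is reduced to a serial number, and every &lt;code&gt;toy_&lt;/code&gt; and &lt;code&gt;ToyXact&lt;/code&gt; name is a hypothetical stand-in introduced only for this illustration. Logical decoding, SSI setup, and snapshot registration are left out on purpose.&lt;/p&gt;

```c
/* Toy model of snapshot reuse across statements in one transaction.
 * Assumption: all names are hypothetical; a snapshot is just a serial number. */
#define XACT_READ_COMMITTED  1
#define XACT_REPEATABLE_READ 2
#define XACT_SERIALIZABLE    3

typedef struct ToyXact
{
    int iso_level;          /* isolation level of this transaction */
    int first_snapshot_set; /* plays the role of FirstSnapshotSet */
    int current_snapshot;   /* serial number of the snapshot in use */
} ToyXact;

/* Advances each time a fresh snapshot is built. */
int toy_snapshot_serial = 0;

/* Stand-in for GetSnapshotData(): every call yields a newer snapshot. */
int toy_get_snapshot_data(void)
{
    toy_snapshot_serial += 1;
    return toy_snapshot_serial;
}

/* Start a transaction that has not taken any snapshot yet. */
ToyXact toy_begin(int iso_level)
{
    ToyXact xact;

    xact.iso_level = iso_level;
    xact.first_snapshot_set = 0;
    xact.current_snapshot = 0;
    return xact;
}

/* Mirrors the branch structure of GetTransactionSnapshot(). */
int toy_get_transaction_snapshot(ToyXact *xact)
{
    if (!xact->first_snapshot_set)
    {
        /* First call: every isolation level builds a snapshot. */
        xact->current_snapshot = toy_get_snapshot_data();
        xact->first_snapshot_set = 1;
        return xact->current_snapshot;
    }

    /* Later calls: Repeatable Read and Serializable keep the first one. */
    if (xact->iso_level >= XACT_REPEATABLE_READ)
        return xact->current_snapshot;

    /* Read Committed: build a fresh snapshot for each statement. */
    xact->current_snapshot = toy_get_snapshot_data();
    return xact->current_snapshot;
}
```

&lt;p&gt;In this model, two consecutive calls on a Read Committed transaction return increasing serial numbers, while a Repeatable Read transaction keeps returning the number from its first call even if other snapshots are built in between.&lt;/p&gt;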

&lt;h4 class="relative group"&gt;GetSnapshotData()
 &lt;div id="getsnapshotdata" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#getsnapshotdata" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Source: &lt;code&gt;src/backend/storage/ipc/procarray.c&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Snapshot
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;GetSnapshotData&lt;/span&gt;(Snapshot snapshot)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Declare local variables: the procArray pointer, xmin, xmax, globalxmin, counters, and the replication slot xmin values
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ProcArrayStruct &lt;span style="color:#f92672"&gt;*&lt;/span&gt;arrayP &lt;span style="color:#f92672"&gt;=&lt;/span&gt; procArray;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId xmax;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId globalxmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			index;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			count &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			subcount &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		suboverflowed &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId replication_slot_xmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; InvalidTransactionId;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId replication_slot_catalog_xmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; InvalidTransactionId;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(snapshot &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xip &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * First call for this snapshot. Snapshot is same size whether or not
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * we are in recovery, see later comments.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xip &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (TransactionId &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#75715e"&gt;// allocate the array that will hold in-progress top-level xids
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;malloc&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;GetMaxSnapshotXidCount&lt;/span&gt;() &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(TransactionId));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxip &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxip &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (TransactionId &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#75715e"&gt;// allocate the array that will hold in-progress subtransaction xids
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;malloc&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;GetMaxSnapshotSubxidCount&lt;/span&gt;() &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(TransactionId));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Take ProcArrayLock in shared mode before scanning the proc array
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockAcquire&lt;/span&gt;(ProcArrayLock, LW_SHARED);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* xmax = max completed xid + 1 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	xmax &lt;span style="color:#f92672"&gt;=&lt;/span&gt; ShmemVariableCache&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;latestCompletedXid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(xmax));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;TransactionIdAdvance&lt;/span&gt;(xmax); &lt;span style="color:#75715e"&gt;// xmax + 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* xmax value retrieved; xmin needs scanning pgproc, pgxact, procarray */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Set globalxmin and xmin to xmax first; if backends have no transaction info, this is simpler */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	globalxmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xmax; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Recovery snapshots handled separately
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;takenDuringRecovery &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;RecoveryInProgress&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Non-recovery snapshots need transaction info from backends
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;takenDuringRecovery)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;		 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;pgprocnos &lt;span style="color:#f92672"&gt;=&lt;/span&gt; arrayP&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pgprocnos;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			numProcs;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Spin over procArray checking xid, xmin, and subxids. The goal is
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * to gather all active xids, find the lowest xmin, and try to record
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * subxids.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		numProcs &lt;span style="color:#f92672"&gt;=&lt;/span&gt; arrayP&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;numProcs;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (index &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; index &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; numProcs; index&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			pgprocno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; pgprocnos[index]; &lt;span style="color:#75715e"&gt;// slot number of the backend at this position in the proc array
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			PGXACT	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;pgxact &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;allPgXact[pgprocno]; &lt;span style="color:#75715e"&gt;// that backend&amp;#39;s PGXACT entry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			TransactionId xid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Update globalxmin to be the smallest valid xmin */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			xid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;UINT32_ACCESS_ONCE&lt;/span&gt;(pgxact&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xmin);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(xid) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;NormalTransactionIdPrecedes&lt;/span&gt;(xid, globalxmin))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				globalxmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Fetch xid just once - see GetNewTransactionId */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			xid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;UINT32_ACCESS_ONCE&lt;/span&gt;(pgxact&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Save this backend&amp;#39;s xid into the snapshot&amp;#39;s xip array */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* i.e., the loop over all PGXACT entries collects every active xid */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xip[count&lt;span style="color:#f92672"&gt;++&lt;/span&gt;] &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Subtransaction info handling */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;suboverflowed) &lt;span style="color:#75715e"&gt;// no backend scanned so far has an overflowed subxid cache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pgxact&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;overflowed)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					suboverflowed &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true; &lt;span style="color:#75715e"&gt;// this backend&amp;#39;s subxid cache overflowed, so mark the snapshot as suboverflowed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			nxids &lt;span style="color:#f92672"&gt;=&lt;/span&gt; pgxact&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;nxids;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (nxids &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						PGPROC	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;proc &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;allProcs[pgprocno];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;pg_read_barrier&lt;/span&gt;();	&lt;span style="color:#75715e"&gt;/* pairs with GetNewTransactionId */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;memcpy&lt;/span&gt;(snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxip &lt;span style="color:#f92672"&gt;+&lt;/span&gt; subcount,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 (&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) proc&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxids.xids,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 nxids &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(TransactionId));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						subcount &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; nxids;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#75715e"&gt;// the else corresponds to if (!snapshot-&amp;gt;takenDuringRecovery)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Standby path: in hot standby, queries on the replica build the snapshot from the known-assigned xids
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		subcount &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;KnownAssignedXidsGetAndSetXmin&lt;/span&gt;(snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxip, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;xmin,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;												 xmax);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdPrecedesOrEquals&lt;/span&gt;(xmin, procArray&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;lastOverflowedXid))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			suboverflowed &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Copy the replication slot xmin and catalog xmin into local variables
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// A replication slot&amp;#39;s xmin keeps tuples the slot may still need from being reclaimed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Per the upstream comment, copying them here avoids holding ProcArrayLock longer than necessary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	replication_slot_xmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; procArray&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;replication_slot_xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	replication_slot_catalog_xmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; procArray&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;replication_slot_catalog_xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Backend transaction info gathering is done; below is a series of ifs for cleanup and code robustness
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(MyPgXact&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xmin))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		MyPgXact&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; TransactionXmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(ProcArrayLock); &lt;span style="color:#75715e"&gt;// release ProcArrayLock
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdPrecedes&lt;/span&gt;(xmin, globalxmin))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		globalxmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xmin; &lt;span style="color:#75715e"&gt;// globalxmin and process xmin: assign globalxmin to the smaller one
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	RecentGlobalXmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; globalxmin &lt;span style="color:#f92672"&gt;-&lt;/span&gt; vacuum_defer_cleanup_age;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(RecentGlobalXmin))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		RecentGlobalXmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; FirstNormalTransactionId; &lt;span style="color:#75715e"&gt;// edge case: if RecentGlobalXmin &amp;lt;= 2, assign 3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Check whether there&amp;#39;s a replication slot requiring an older xmin. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(replication_slot_xmin) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;NormalTransactionIdPrecedes&lt;/span&gt;(replication_slot_xmin, RecentGlobalXmin))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		RecentGlobalXmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; replication_slot_xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Non-catalog tables can be vacuumed if older than this xid */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	RecentGlobalDataXmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; RecentGlobalXmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Re-check and compare catalog, globalxmin
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(replication_slot_catalog_xmin) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;NormalTransactionIdPrecedes&lt;/span&gt;(replication_slot_catalog_xmin, RecentGlobalXmin))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		RecentGlobalXmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; replication_slot_catalog_xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	RecentXmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Start assigning values to the snapshot struct, returning snapshot data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xmax &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xmax;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xcnt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; count;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxcnt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; subcount;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;suboverflowed &lt;span style="color:#f92672"&gt;=&lt;/span&gt; suboverflowed;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curcid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetCurrentCommandId&lt;/span&gt;(false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If it&amp;#39;s a new snapshot, initialize some snapshot info
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;active_count &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;regd_count &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;copied &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Snapshot-too-old logic below; oddly written here
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (old_snapshot_threshold &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * If not using &amp;#34;snapshot too old&amp;#34; feature, fill related fields with
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * dummy values that don&amp;#39;t require any locking.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// When old_snapshot_threshold &amp;lt; 0 (no &amp;#34;snapshot too old&amp;#34; issue)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// assign simple constant values that won&amp;#39;t require any locks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;lsn &lt;span style="color:#f92672"&gt;=&lt;/span&gt; InvalidXLogRecPtr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;whenTaken &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// When old_snapshot_threshold &amp;gt;= 0, need to handle old snapshot logic
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;lsn &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetXLogInsertRecPtr&lt;/span&gt;(); &lt;span style="color:#75715e"&gt;// get LSN
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;whenTaken &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetSnapshotCurrentTimestamp&lt;/span&gt;(); &lt;span style="color:#75715e"&gt;// get snapshot timestamp
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;MaintainOldSnapshotTimeMapping&lt;/span&gt;(snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;whenTaken, xmin); &lt;span style="color:#75715e"&gt;// update the timestamp-to-xmin mapping used by the snapshot-too-old check
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// GetXLogInsertRecPtr(), GetSnapshotCurrentTimestamp(), MaintainOldSnapshotTimeMapping() 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// all contain SpinLockAcquire and SpinLockRelease
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// MaintainOldSnapshotTimeMapping() also has LWLockAcquire and LWLockRelease
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Since this is called for every snapshot, GetSnapshotData should be very frequent
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// So in pg13 source, setting old_snapshot_threshold to negative avoids many spinlocks and lwlocks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; snapshot;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
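&lt;p&gt;The XID arithmetic near the end of the function can be easier to follow in isolation. The following is a minimal sketch, not PostgreSQL code: the names &lt;code&gt;SketchXid&lt;/code&gt;, &lt;code&gt;sketch_xid_precedes&lt;/code&gt; and &lt;code&gt;sketch_global_xmin&lt;/code&gt; are hypothetical, a 32-bit &lt;code&gt;unsigned int&lt;/code&gt; is assumed, and all locking is left out. It mirrors three steps from the listing above: subtracting &lt;code&gt;vacuum_defer_cleanup_age&lt;/code&gt;, clamping to &lt;code&gt;FirstNormalTransactionId&lt;/code&gt;, and letting a replication slot pull the horizon back.&lt;/p&gt;

```c
/* Hypothetical stand-ins for TransactionId and its special values. */
typedef unsigned int SketchXid;	/* assumed to be 32 bits wide */
#define SKETCH_INVALID_XID       0u
#define SKETCH_FIRST_NORMAL_XID  3u

/* Returns 1 when a is logically older than b, 0 otherwise. */
int
sketch_xid_precedes(SketchXid a, SketchXid b)
{
	/* Special XIDs (0, 1, 2) are compared as plain unsigned integers. */
	if (SKETCH_FIRST_NORMAL_XID > a || SKETCH_FIRST_NORMAL_XID > b)
		return b > a;

	/* Normal XIDs live on a circle, so compare the signed difference. */
	return 0 > (int) (a - b);
}

/* Combines the oldest running xmin with the defer age and a slot xmin. */
SketchXid
sketch_global_xmin(SketchXid globalxmin, SketchXid defer_age,
				   SketchXid slot_xmin)
{
	SketchXid	horizon = globalxmin - defer_age;

	/* Edge case: never return one of the special XIDs. */
	if (SKETCH_FIRST_NORMAL_XID > horizon)
		horizon = SKETCH_FIRST_NORMAL_XID;

	/* A valid slot xmin that is older than the horizon wins. */
	if (slot_xmin != SKETCH_INVALID_XID)
	{
		if (sketch_xid_precedes(slot_xmin, horizon))
			horizon = slot_xmin;
	}

	return horizon;
}
```

&lt;p&gt;With a defer age of 0 and no slot, &lt;code&gt;sketch_global_xmin&lt;/code&gt; simply returns &lt;code&gt;globalxmin&lt;/code&gt;. The circular comparison is why an XID just below the 32-bit limit still counts as older than a small XID issued after wraparound.&lt;/p&gt;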

&lt;h3 class="relative group"&gt;pg14 Snapshot Optimizations
 &lt;div id="pg14-snapshot-optimizations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg14-snapshot-optimizations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;pg14 Optimization Source Analysis
 &lt;div id="pg14-optimization-source-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg14-optimization-source-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;From the pg13 source, we can see that when &lt;code&gt;old_snapshot_threshold &amp;gt;= 0&lt;/code&gt;, every call to &lt;code&gt;GetSnapshotData()&lt;/code&gt; also incurs several &lt;code&gt;SpinLock&lt;/code&gt; and &lt;code&gt;LWLock&lt;/code&gt; operations. Since snapshot acquisition is extremely frequent, that overhead adds up quickly. pg14 moved the &lt;code&gt;old_snapshot_threshold&lt;/code&gt; logic out of the main body of &lt;code&gt;GetSnapshotData()&lt;/code&gt; into a small helper, &lt;code&gt;GetSnapshotDataInitOldSnapshot()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Beyond that change, pg14 made many other optimizations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Removed &lt;code&gt;RecentGlobalXmin&lt;/code&gt; and &lt;code&gt;RecentGlobalDataXmin&lt;/code&gt;, added the &lt;code&gt;GlobalVisTest*&lt;/code&gt; family of functions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Introduced the &lt;strong&gt;boundaries&lt;/strong&gt; concept with two boundaries: &lt;code&gt;definitely_needed&lt;/code&gt; and &lt;code&gt;maybe_needed&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; GlobalVisState
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* XIDs &amp;gt;= are considered running by some backend */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// rows with XID &amp;gt;= definitely_needed are definitely still needed, so they cannot be cleaned up
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	FullTransactionId definitely_needed;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* XIDs &amp;lt; are not considered to be running by any backend */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// rows with XID &amp;lt; maybe_needed can definitely be cleaned up
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	FullTransactionId maybe_needed;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;};&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Added &lt;code&gt;ComputeXidHorizons()&lt;/code&gt; for more precise horizon calculation (storing xmin and removable xid information). This function still needs to iterate PGPROC. The exact calculation only matters for XIDs that fall between the two boundaries, i.e. &lt;code&gt;XID &amp;gt;= maybe_needed &amp;amp;&amp;amp; XID &amp;lt; definitely_needed&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Added &lt;code&gt;GlobalVisTestShouldUpdate()&lt;/code&gt; to determine whether boundaries need recalculation.&lt;/p&gt;
&lt;p&gt;First, understand the variable &lt;code&gt;ComputeXidHorizonsResultLastXmin&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; TransactionId ComputeXidHorizonsResultLastXmin; &lt;span style="color:#75715e"&gt;// last precisely computed xmin
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
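&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;GlobalVisTestShouldUpdate&lt;/span&gt;(GlobalVisState &lt;span style="color:#f92672"&gt;*&lt;/span&gt;state)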
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// ComputeXidHorizonsResultLastXmin is still invalid (0), i.e. the horizons have not been computed yet, so recalculate the boundaries
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(ComputeXidHorizonsResultLastXmin))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If the maybe_needed/definitely_needed boundaries are the same, it&amp;#39;s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * unlikely to be beneficial to refresh boundaries.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// When maybe_needed equals definitely_needed, no need to recalculate
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Uses FullTransactionIdFollowsOrEquals (not strict equality)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If maybe_needed &amp;gt;= definitely_needed, no XID can fall between the two boundaries, so a refresh is unlikely to help
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;FullTransactionIdFollowsOrEquals&lt;/span&gt;(state&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;maybe_needed,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 state&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;definitely_needed))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* does the last snapshot built have a different xmin? */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// When the last snapshot&amp;#39;s xmin equals the last precisely computed xmin, no need to recalculate boundaries
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; RecentXmin &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; ComputeXidHorizonsResultLastXmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We can see that &lt;code&gt;maybe_needed&lt;/code&gt; and &lt;code&gt;definitely_needed&lt;/code&gt; play a role similar to a snapshot&amp;rsquo;s xmin/xmax: they are cheap, approximate boundaries. An XID below &lt;code&gt;maybe_needed&lt;/code&gt;, or at or above &lt;code&gt;definitely_needed&lt;/code&gt;, is decided immediately; only XIDs in between require the exact computation in &lt;code&gt;ComputeXidHorizons()&lt;/code&gt;. &lt;code&gt;GlobalVisTestShouldUpdate()&lt;/code&gt; further reduces how often the boundaries are recalculated, so the expensive pass over PGPROC runs far less often.&lt;/p&gt;
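&lt;p&gt;The two-boundary idea can be sketched as a three-way classification. This is a simplified illustration under stated assumptions, not the PostgreSQL implementation: &lt;code&gt;SketchFullXid&lt;/code&gt;, &lt;code&gt;SketchVisState&lt;/code&gt; and &lt;code&gt;sketch_classify_xid&lt;/code&gt; are hypothetical names, and a plain 64-bit integer stands in for &lt;code&gt;FullTransactionId&lt;/code&gt;.&lt;/p&gt;

```c
/* Hypothetical stand-in for FullTransactionId (64-bit, no wraparound). */
typedef unsigned long long SketchFullXid;

typedef struct SketchVisState
{
	SketchFullXid definitely_needed;	/* XIDs >= this may still be running */
	SketchFullXid maybe_needed;			/* XIDs below this are not running */
} SketchVisState;

typedef enum SketchVerdict
{
	SKETCH_REMOVABLE,			/* decided cheaply: safe to clean up */
	SKETCH_STILL_NEEDED,		/* decided cheaply: must be kept */
	SKETCH_NEEDS_RECOMPUTE		/* undecided: exact horizons are required */
} SketchVerdict;

SketchVerdict
sketch_classify_xid(const SketchVisState *state, SketchFullXid xid)
{
	/* Below the lower boundary: no backend considers this XID running. */
	if (state->maybe_needed > xid)
		return SKETCH_REMOVABLE;

	/* At or above the upper boundary: some backend may still need it. */
	if (xid >= state->definitely_needed)
		return SKETCH_STILL_NEEDED;

	/* In between: only an exact horizon computation can decide. */
	return SKETCH_NEEDS_RECOMPUTE;
}
```

&lt;p&gt;Only the third verdict would lead to the expensive path, which is why narrowing the gap between the two boundaries pays off. When both boundaries are equal, the third verdict can never occur.&lt;/p&gt;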

&lt;h4 class="relative group"&gt;Optimization Results
 &lt;div id="optimization-results" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#optimization-results" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Recommended article on PostgreSQL snapshot optimization:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462" target="_blank" rel="noreferrer"&gt;https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The before-and-after comparison is striking:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a346095be5a7.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;In pg13 production environments, &lt;code&gt;GetSnapshotData&lt;/code&gt; regularly accounts for a large share of the overhead. (I don&amp;rsquo;t have a screenshot of my own, so I&amp;rsquo;ll borrow another expert&amp;rsquo;s chart:)&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8cd67db0e65f.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Snapshot References
 &lt;div id="snapshot-references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#snapshot-references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Books:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;The Internals of PostgreSQL&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL in Action&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL Internals: Deep Dive into Transaction Processing&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL Database Kernel Analysis&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://edu.postgrespro.com/postgresql_internals-14_parts1-2_en.pdf" target="_blank" rel="noreferrer"&gt;https://edu.postgrespro.com/postgresql_internals-14_parts1-2_en.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Official resources:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Concurrency_control" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Concurrency_control&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Hint_Bits" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Hint_Bits&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/10/storage-page-layout.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/10/storage-page-layout.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/pageinspect.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/pageinspect.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Essential PostgreSQL transaction reads (interdb):&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql05.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql05.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql06.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql06.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Source code experts:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/102920988" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/102920988&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/127955762" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/127955762&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/125023923" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/125023923&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL snapshot optimization performance comparison:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462" target="_blank" rel="noreferrer"&gt;https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Other resources:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://brandur.org/postgres-atomicity" target="_blank" rel="noreferrer"&gt;https://brandur.org/postgres-atomicity&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/j-8uRuZDRf4mHIQR_ZKIEg" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/j-8uRuZDRf4mHIQR_ZKIEg&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Visibility Checking
 &lt;div id="visibility-checking" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#visibility-checking" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;With a snapshot, we can determine tuple visibility. Let&amp;rsquo;s review the key information (ignoring subtransactions for now): tuple header transaction info, snapshot info, and CLOG transaction status (before SetHintBits).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tuple header has: xmin, xmax, cmin, cmax, infomask, etc.&lt;/li&gt;
&lt;li&gt;Snapshot data has: snapshot xmin, xmax, xip_list, curcid, etc.&lt;/li&gt;
&lt;li&gt;CLOG has additional transaction status info, which may also be written to infomask as hint bits.&lt;/li&gt;
&lt;/ul&gt;
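&lt;p&gt;To make the role of the snapshot fields concrete, here is a minimal sketch of the basic question a snapshot answers: was a given XID still in progress when the snapshot was taken? The struct and function names are hypothetical, and subtransactions and XID wraparound are ignored, both of which PostgreSQL&amp;rsquo;s real check has to handle.&lt;/p&gt;

```c
/* Hypothetical, simplified snapshot: plain integers, no wraparound. */
typedef unsigned int SketchXid;

typedef struct SketchSnapshot
{
	SketchXid	xmin;		/* every XID below xmin had already finished */
	SketchXid	xmax;		/* every XID >= xmax had not started yet */
	const SketchXid *xip;	/* XIDs in progress when the snapshot was taken */
	int			xcnt;		/* number of entries in xip */
} SketchSnapshot;

/* Returns 1 if xid counts as "in progress" for this snapshot, else 0. */
int
sketch_xid_in_snapshot(const SketchSnapshot *snap, SketchXid xid)
{
	int			i;

	/* Finished before the snapshot was taken. */
	if (snap->xmin > xid)
		return 0;

	/* Started after the snapshot was taken: treated as in progress. */
	if (xid >= snap->xmax)
		return 1;

	/* In between: in progress only if listed in xip. */
	for (i = 0; i != snap->xcnt; i++)
	{
		if (snap->xip[i] == xid)
			return 1;
	}
	return 0;
}
```

&lt;p&gt;An XID for which this returns 1 is treated as not committed from the snapshot&amp;rsquo;s point of view, whatever CLOG says now. For the remaining XIDs, visibility still depends on the commit status recorded in CLOG or in the hint bits.&lt;/p&gt;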
&lt;p&gt;Different snapshot types have slightly different visibility rules:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;HeapTupleSatisfiesVisibility&lt;/span&gt;(HeapTuple tup, Snapshot snapshot, Buffer buffer)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;snapshot_type)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SNAPSHOT_MVCC:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;HeapTupleSatisfiesMVCC&lt;/span&gt;(tup, snapshot, buffer);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SNAPSHOT_NON_VACUUMABLE:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;HeapTupleSatisfiesNonVacuumable&lt;/span&gt;(tup, snapshot, buffer);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Here we&amp;rsquo;ll walk through the most common case, the &lt;code&gt;SNAPSHOT_MVCC&lt;/code&gt; rules, to understand tuple visibility.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;HeapTupleSatisfiesMVCC&lt;/span&gt;(HeapTuple htup, Snapshot snapshot,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Buffer buffer)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	HeapTupleHeader tuple &lt;span style="color:#f92672"&gt;=&lt;/span&gt; htup&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_data; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;ItemPointerIsValid&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;htup&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_self)); &lt;span style="color:#75715e"&gt;// t_self (the tuple&amp;#39;s TID) must be valid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(htup&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_tableOid &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; InvalidOid); &lt;span style="color:#75715e"&gt;// t_tableOid must be set, i.e., the tuple belongs to a table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// HEAP_XMIN_COMMITTED hint bit not set: the commit status of t_xmin (the transaction that INSERTed or UPDATEd this tuple version) is not yet cached on the tuple
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// In htup_details.h, macro: HeapTupleHeaderXminCommitted() is ((tup)-&amp;gt;t_infomask &amp;amp; HEAP_XMIN_COMMITTED) != 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// So if (!HeapTupleHeaderXminCommitted(tuple)) means the tuple infomask does not have HEAP_XMIN_COMMITTED
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// This does not prove t_xmin is uncommitted; it only means the status still has to be determined below
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderXminCommitted&lt;/span&gt;(tuple)) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If a transaction updated the tuple but then aborted or failed, this tuple&amp;#39;s xmin is the failed transaction ID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If the inserting transaction failed, directly return invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderXminInvalid&lt;/span&gt;(tuple))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// HEAP_MOVED_OFF: the tuple was moved to another place by a pre-9.0 VACUUM FULL; xvac is the XID of that VACUUM, and hint bits are set from its outcome
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Used by pre-9.0 binary upgrades */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; HEAP_MOVED_OFF)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			TransactionId xvac &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetXvac&lt;/span&gt;(tuple);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdIsCurrentTransactionId&lt;/span&gt;(xvac))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;XidInMVCCSnapshot&lt;/span&gt;(xvac, snapshot))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdDidCommit&lt;/span&gt;(xvac))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMIN_INVALID,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								InvalidTransactionId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMIN_COMMITTED,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							InvalidTransactionId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// HEAP_MOVED_IN: the tuple was moved here from another place by a pre-9.0 VACUUM FULL; visibility again depends on the outcome of xvac
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Used by pre-9.0 binary upgrades */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; HEAP_MOVED_IN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			TransactionId xvac &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetXvac&lt;/span&gt;(tuple);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsCurrentTransactionId&lt;/span&gt;(xvac))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;XidInMVCCSnapshot&lt;/span&gt;(xvac, snapshot))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdDidCommit&lt;/span&gt;(xvac))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMIN_COMMITTED,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								InvalidTransactionId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMIN_INVALID,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								InvalidTransactionId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// When the tuple was written by the current transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdIsCurrentTransactionId&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmin&lt;/span&gt;(tuple)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetCmin&lt;/span&gt;(tuple) &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curcid) &lt;span style="color:#75715e"&gt;// tuple cid &amp;gt;= snapshot current command id
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;	&lt;span style="color:#75715e"&gt;// tuple was inserted after visibility check started; invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; HEAP_XMAX_INVALID) &lt;span style="color:#75715e"&gt;// infomask has HEAP_XMAX_INVALID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true; &lt;span style="color:#75715e"&gt;// tuple not deleted; visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;// A pure insert, whether committed, not yet committed, or rolled back, has HEAP_XMAX_INVALID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;// But this check is under the &amp;#34;written by current transaction&amp;#34; condition, so:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;// the tuple was inserted by the current, still-open transaction, has no valid xmax (neither deleted nor locked),
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;// and t_cid &amp;lt; curcid → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// xmax is set in two scenarios: 1) tuple locked, 2) tuple deleted or updated
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// Even without HEAP_XMAX_INVALID, the tuple may not be deleted; it may just be locked
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// Locked tuples have xmax set but are visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HEAP_XMAX_IS_LOCKED_ONLY&lt;/span&gt;(tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask))	&lt;span style="color:#75715e"&gt;/* not deleter */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// HEAP_XMAX_IS_MULTI: xmax holds a MultiXactId, created when more than one transaction locks or updates the same row
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// The lock-only case already returned above, so this MultiXactId must contain an updating or deleting transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; HEAP_XMAX_IS_MULTI)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				TransactionId xmax;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				xmax &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;HeapTupleGetUpdateXid&lt;/span&gt;(tuple);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* not LOCKED_ONLY, so it has to have an xmax */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(xmax));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* updating subtransaction must have aborted */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;// If xmax is not the current transaction, visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsCurrentTransactionId&lt;/span&gt;(xmax))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;// If xmax is the current transaction, judge by command id:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;// snapshot acquired before update/delete → tuple was visible at snapshot time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetCmax&lt;/span&gt;(tuple) &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curcid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;	&lt;span style="color:#75715e"&gt;/* updated after scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;	&lt;span style="color:#75715e"&gt;/* updated before scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// The following scenario: a subtransaction&amp;#39;s delete command was rolled back, need SetHintBits HEAP_XMAX_INVALID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// Delete rolled back, so tuple is visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsCurrentTransactionId&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmax&lt;/span&gt;(tuple)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* deleting subtransaction must have aborted */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMAX_INVALID,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							InvalidTransactionId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// cmax is the command ID that deleted the tuple
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// If tuple cmax &amp;gt;= snapshot curcid: delete happened after snapshot scan → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// If tuple cmax &amp;lt; snapshot curcid: delete happened before snapshot scan → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetCmax&lt;/span&gt;(tuple) &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curcid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;	&lt;span style="color:#75715e"&gt;/* deleted after scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;	&lt;span style="color:#75715e"&gt;/* deleted before scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// XidInMVCCSnapshot() checks if xid was in-progress at snapshot time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// &amp;#34;in-progress&amp;#34; means: 1. snapshot xmin &amp;lt;= xid &amp;lt; snapshot xmax AND xid is in the snapshot&amp;#39;s xip/subxip arrays, or 2. xid &amp;gt;= snapshot xmax
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// The xid below is t_xmin
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// So this means: if t_xmin was in-progress at snapshot time → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// This is not redundant with the outer check: !HeapTupleHeaderXminCommitted(tuple) only says the hint bit is unset,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// not that t_xmin is uncommitted. t_xmin may have committed since, while the snapshot still treats it as running.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Reaching this else if also means the tuple is not MOVED_OFF/MOVED_IN and was not written by the current transaction.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// So: t_xmin belongs to another transaction that the snapshot considers in-progress → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;XidInMVCCSnapshot&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmin&lt;/span&gt;(tuple), snapshot))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If the commit log says t_xmin committed, cache that by setting the HEAP_XMIN_COMMITTED hint bit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// This is the normal path for the first reader after the inserter commits: the hint bit simply had not been set yet
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// There is no return here: the insert is visible, so control falls through to the xmax (delete/lock) checks after this block
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdDidCommit&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmin&lt;/span&gt;(tuple)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMIN_COMMITTED,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmin&lt;/span&gt;(tuple));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If t_xmin transaction did not commit, SetHintBits HEAP_XMIN_INVALID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* it must have aborted or crashed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMIN_INVALID,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						InvalidTransactionId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// At this point t_xmin is not the current transaction, not in-progress per the snapshot, and not committed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// A transaction that started after the snapshot would have xid &amp;gt;= snapshot xmax and be caught by XidInMVCCSnapshot() above
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// So the only remaining case is that t_xmin aborted or crashed → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// The checks for an unset HEAP_XMIN_COMMITTED hint bit end here
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// The else branch handles tuples whose HEAP_XMIN_COMMITTED hint bit is already set
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* xmin is committed, but maybe not according to our snapshot */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If the tuple is not frozen AND the snapshot considers xmin in-progress → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// xmin has committed by now, but it had not committed when the snapshot was taken,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// so from this snapshot&amp;#39;s perspective the insert is still in-progress. Frozen tuples skip the check because they are visible to every snapshot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderXminFrozen&lt;/span&gt;(tuple) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;XidInMVCCSnapshot&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmin&lt;/span&gt;(tuple), snapshot))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;		&lt;span style="color:#75715e"&gt;/* treat as still in progress */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// HEAP_XMAX_INVALID means tuple not deleted
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// This if means: tuple committed, and was committed at snapshot time, and not deleted (no delete marker at all) → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; HEAP_XMAX_INVALID)	&lt;span style="color:#75715e"&gt;/* xid invalid or aborted */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Tuple has xmax, but it&amp;#39;s not a delete — it&amp;#39;s a lock marker
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// This if means: tuple committed, was committed at snapshot time, has xmax but xmax is a lock → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HEAP_XMAX_IS_LOCKED_ONLY&lt;/span&gt;(tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// HEAP_XMAX_IS_MULTI: xmax holds a MultiXactId (several lockers, possibly plus one updater). Lock-only was handled above, so an updater or deleter exists
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; HEAP_XMAX_IS_MULTI)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		TransactionId xmax;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* already checked above */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;HEAP_XMAX_IS_LOCKED_ONLY&lt;/span&gt;(tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Get the transaction ID that updated the tuple
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		xmax &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;HeapTupleGetUpdateXid&lt;/span&gt;(tuple);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* not LOCKED_ONLY, so it has to have an xmax */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(xmax));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If the updating transaction extracted from the MultiXactId is the current transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdIsCurrentTransactionId&lt;/span&gt;(xmax))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// tuple cmax &amp;gt;= snapshot curcid: tuple not yet deleted at snapshot time → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetCmax&lt;/span&gt;(tuple) &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curcid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;	&lt;span style="color:#75715e"&gt;/* deleted after scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// tuple cmax &amp;lt; snapshot curcid: tuple already deleted at snapshot time → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;	&lt;span style="color:#75715e"&gt;/* deleted before scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// The updater is another transaction that was still in progress when the snapshot was taken
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// That is: xmin committed, xmax is a MultiXact that contains an updater, and the update is not yet visible to this snapshot → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;XidInMVCCSnapshot&lt;/span&gt;(xmax, snapshot))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If the updating transaction committed → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdDidCommit&lt;/span&gt;(xmax))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;		&lt;span style="color:#75715e"&gt;/* updating transaction committed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* it must have aborted or crashed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Updating transaction aborted or crashed → still visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// xmin committed; the HEAP_XMAX_COMMITTED hint bit is not set, so the fate of xmax is not yet recorded on the tuple
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Note that !HEAP_XMAX_COMMITTED is not the same as HEAP_XMAX_INVALID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// !HEAP_XMAX_COMMITTED: the tuple has a deleter, but whether it committed still has to be checked
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// HEAP_XMAX_INVALID (handled above): no delete, or the delete aborted/rolled back, so true is returned directly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;(tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; HEAP_XMAX_COMMITTED))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If xmax is the same as the checking transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdIsCurrentTransactionId&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmax&lt;/span&gt;(tuple)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// Same old pattern: visibility via command id
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// cmax &amp;gt;= snapshot curcid: delete happened after snapshot → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetCmax&lt;/span&gt;(tuple) &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curcid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;	&lt;span style="color:#75715e"&gt;/* deleted after scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// cmax &amp;lt; snapshot curcid: delete happened before snapshot → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;	&lt;span style="color:#75715e"&gt;/* deleted before scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Delete transaction not committed, and xmax not the checking transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If xmax was in-progress at snapshot time → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;XidInMVCCSnapshot&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmax&lt;/span&gt;(tuple), snapshot))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Confirm xmax delete transaction aborted or failed; SetHintBits HEAP_XMAX_INVALID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Similar to HEAP_XMAX_INVALID above → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdDidCommit&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmax&lt;/span&gt;(tuple)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* it must have aborted or crashed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMAX_INVALID,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						InvalidTransactionId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* xmax transaction committed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Remaining case: xmax delete transaction committed. SetHintBits HEAP_XMAX_COMMITTED
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// No return here: the snapshot check was already done above, so control falls through to the final return false at the end of the function
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMAX_COMMITTED,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmax&lt;/span&gt;(tuple));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* xmax is committed, but maybe not according to our snapshot */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// xmax delete transaction now committed, but was in-progress at snapshot time → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;XidInMVCCSnapshot&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmax&lt;/span&gt;(tuple), snapshot))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;		&lt;span style="color:#75715e"&gt;/* treat as still in progress */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* xmax transaction committed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Only remaining case: xmax committed and was not in-progress at snapshot time → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The visibility code looks complex as a whole. Setting aside the &lt;code&gt;SetHintBits&lt;/code&gt; calls and the nested if-else chains, the core visibility rules come down to the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Core visibility rule logic:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Delete committed → tuple invisible&lt;/li&gt;
&lt;li&gt;Insert committed, delete rolled back → tuple visible&lt;/li&gt;
&lt;li&gt;Insert committed, delete not committed → the deleting transaction itself compares cmax with &lt;code&gt;snapshot-&amp;gt;curcid&lt;/code&gt;; other transactions see the tuple as visible&lt;/li&gt;
&lt;li&gt;Insert rolled back → tuple invisible&lt;/li&gt;
&lt;li&gt;Insert not committed → the inserting transaction itself compares cmin with &lt;code&gt;snapshot-&amp;gt;curcid&lt;/code&gt;; other transactions see the tuple as invisible&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Visibility checking involves two points in time: when the check runs and when the snapshot was taken. The logic also distinguishes whether the transaction that wrote xmin or xmax is the checking transaction itself (same transaction) or a different one.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Same transaction&lt;/strong&gt;: compare tuple cmin/cmax against &lt;code&gt;snapshot-&amp;gt;curcid&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;cmin &amp;gt;= snapshot-&amp;gt;curcid&lt;/code&gt;: tuple inserted after snapshot → invisible. Otherwise visible.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cmax &amp;gt;= snapshot-&amp;gt;curcid&lt;/code&gt;: tuple deleted after snapshot → visible. Otherwise invisible.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Different transactions&lt;/strong&gt;: use &lt;code&gt;XidInMVCCSnapshot()&lt;/code&gt; to check whether xid (t_xmin or t_xmax) was in-progress at snapshot time.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;xmin was in-progress at snapshot time → invisible.&lt;/li&gt;
&lt;li&gt;xmax was in-progress at snapshot time → visible.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Beyond basic DML operations, there are 4 additional cases:&lt;/p&gt;
&lt;ul&gt;
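&lt;li&gt;Tuples moved by an old-style (pre-9.0) &lt;code&gt;VACUUM FULL&lt;/code&gt; (&lt;code&gt;HEAP_MOVED_OFF&lt;/code&gt; / &lt;code&gt;HEAP_MOVED_IN&lt;/code&gt;)&lt;/li&gt;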
&lt;li&gt;Lock-only marker (&lt;code&gt;HEAP_XMAX_IS_LOCKED_ONLY&lt;/code&gt;): tuple visible&lt;/li&gt;
&lt;li&gt;MultiXact state (&lt;code&gt;HEAP_XMAX_IS_MULTI&lt;/code&gt;): visibility for tuples under multi-transaction locks&lt;/li&gt;
&lt;li&gt;Frozen tuples: visibility when frozen marker is set&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
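&lt;p&gt;The core rules in points 1 and 2 can be condensed into a small C sketch. This is a toy model, not PostgreSQL code: &lt;code&gt;XidInfo&lt;/code&gt;, &lt;code&gt;ToyTuple&lt;/code&gt; and &lt;code&gt;toy_visible&lt;/code&gt; are hypothetical names, the per-xid fields stand in for the results of &lt;code&gt;TransactionIdIsCurrentTransactionId&lt;/code&gt;, &lt;code&gt;XidInMVCCSnapshot&lt;/code&gt; and &lt;code&gt;TransactionIdDidCommit&lt;/code&gt;, and hint bits, lock-only xmax, MultiXact and frozen tuples are deliberately left out.&lt;/p&gt;

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for what the real code looks up per xid. */
typedef enum { XACT_IN_PROGRESS, XACT_COMMITTED, XACT_ABORTED } XactState;

typedef struct
{
	XactState	state;			/* outcome known at check time */
	bool		is_current;		/* written by the checking transaction itself */
	bool		in_snapshot;	/* still in progress when the snapshot was taken */
	uint32_t	cid;			/* cmin or cmax; only meaningful if is_current */
} XidInfo;

typedef struct
{
	XidInfo		xmin;
	bool		has_xmax;		/* false: never deleted (or xmax is lock-only) */
	XidInfo		xmax;
} ToyTuple;

/* Simplified MVCC visibility: first judge the inserter, then the deleter. */
bool
toy_visible(const ToyTuple *t, uint32_t curcid)
{
	assert(t != NULL);

	if (t->xmin.is_current)
	{
		if (t->xmin.cid >= curcid)
			return false;		/* inserted after scan started */
	}
	else
	{
		if (t->xmin.in_snapshot)
			return false;		/* insert not visible to this snapshot */
		if (t->xmin.state != XACT_COMMITTED)
			return false;		/* insert aborted or crashed */
	}

	if (!t->has_xmax)
		return true;			/* nobody deleted it */

	if (t->xmax.is_current)
		return t->xmax.cid >= curcid;	/* deleted after scan started? */

	if (t->xmax.in_snapshot)
		return true;			/* delete not visible to this snapshot */

	/* delete committed: invisible; delete aborted: still visible */
	return t->xmax.state != XACT_COMMITTED;
}
```

&lt;p&gt;With &lt;code&gt;curcid = 5&lt;/code&gt;, a tuple deleted by the current transaction at command id 7 is still visible under this model, while the same tuple deleted at command id 3 is not.&lt;/p&gt;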

&lt;h2 class="relative group"&gt;MultiXact
 &lt;div id="multixact" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#multixact" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;What Is MultiXact?
 &lt;div id="what-is-multixact" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-multixact" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When multiple transactions lock the same row, there may be multiple associated transaction IDs on the tuple. PostgreSQL groups multiple transaction IDs together and manages them with a single &lt;code&gt;MultiXactId&lt;/code&gt;. The relationship between TransactionId and MultiXactId is many-to-one.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ee67ad9bb95b.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Like TransactionId, MultiXactId is also 32-bit and also subject to wraparound.&lt;/p&gt;
&lt;p&gt;MultiXactId 0 is reserved as the invalid value (&lt;code&gt;InvalidMultiXactId&lt;/code&gt;). Valid MultiXactIds start from 1 (&lt;code&gt;FirstMultiXactId&lt;/code&gt;).&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Source: src/include/access/multixact.h */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define InvalidMultiXactId	((MultiXactId) 0)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define FirstMultiXactId	((MultiXactId) 1)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define MaxMultiXactId		((MultiXactId) 0xFFFFFFFF)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Row Lock Types
 &lt;div id="row-lock-types" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#row-lock-types" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;MultiXact only exists when rows are locked. MultiXact defines 6 states:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;enum&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MultiXactStatusForKeyShare &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x00&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MultiXactStatusForShare &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x01&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MultiXactStatusForNoKeyUpdate &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x02&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MultiXactStatusForUpdate &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x03&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* an update that doesn&amp;#39;t touch &amp;#34;key&amp;#34; columns */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MultiXactStatusNoKeyUpdate &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x04&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* other updates, and delete */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MultiXactStatusUpdate &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x05&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} MultiXactStatus;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
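&lt;p&gt;Four of these can be requested explicitly with a locking clause: &lt;code&gt;ForKeyShare&lt;/code&gt; (&lt;code&gt;FOR KEY SHARE&lt;/code&gt;), &lt;code&gt;ForShare&lt;/code&gt; (&lt;code&gt;FOR SHARE&lt;/code&gt;), &lt;code&gt;ForNoKeyUpdate&lt;/code&gt; (&lt;code&gt;FOR NO KEY UPDATE&lt;/code&gt;) and &lt;code&gt;ForUpdate&lt;/code&gt; (&lt;code&gt;FOR UPDATE&lt;/code&gt;). The remaining two, &lt;code&gt;NoKeyUpdate&lt;/code&gt; and &lt;code&gt;Update&lt;/code&gt;, are set by actual updates and deletes.&lt;/p&gt;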
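&lt;p&gt;As a rough illustration of how the six states split into lock-only states and real modifications, here is a minimal sketch. The enum is copied from the listing above; &lt;code&gt;status_is_update&lt;/code&gt; and &lt;code&gt;status_locking_clause&lt;/code&gt; are hypothetical helper names introduced only for this example.&lt;/p&gt;

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
	MultiXactStatusForKeyShare = 0x00,
	MultiXactStatusForShare = 0x01,
	MultiXactStatusForNoKeyUpdate = 0x02,
	MultiXactStatusForUpdate = 0x03,
	/* an update that doesn't touch "key" columns */
	MultiXactStatusNoKeyUpdate = 0x04,
	/* other updates, and delete */
	MultiXactStatusUpdate = 0x05
} MultiXactStatus;

/* True for the two states that modify the row rather than only lock it. */
bool
status_is_update(MultiXactStatus status)
{
	assert(status <= MultiXactStatusUpdate);
	return status > MultiXactStatusForUpdate;
}

/* SQL locking clause for the four declarable states, NULL for the other two. */
const char *
status_locking_clause(MultiXactStatus status)
{
	switch (status)
	{
		case MultiXactStatusForKeyShare:
			return "FOR KEY SHARE";
		case MultiXactStatusForShare:
			return "FOR SHARE";
		case MultiXactStatusForNoKeyUpdate:
			return "FOR NO KEY UPDATE";
		case MultiXactStatusForUpdate:
			return "FOR UPDATE";
		default:
			return NULL;		/* set by UPDATE/DELETE, not declarable */
	}
}
```

&lt;p&gt;The ordering of the enum values is what makes the single comparison in &lt;code&gt;status_is_update&lt;/code&gt; sufficient: every lock-only state sorts before the two update states.&lt;/p&gt;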

&lt;h3 class="relative group"&gt;MultiXact Infomask Flags
 &lt;div id="multixact-infomask-flags" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#multixact-infomask-flags" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL stores the locking transaction in the tuple&amp;#39;s xmax and records the lock type in &lt;code&gt;t_infomask&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/include/access/htup_details.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_KEYSHR_LOCK	0x0010	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* xmax is a key-shared locker */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_EXCL_LOCK		0x0040	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* xmax is exclusive locker */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_LOCK_ONLY		0x0080	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* xmax, if valid, is only a locker */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_SHR_LOCK	(HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_LOCK_MASK	(HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;						 HEAP_XMAX_KEYSHR_LOCK)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_IS_MULTI		0x1000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* t_xmax is a MultiXactId */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Here we focus on the &lt;code&gt;HEAP_XMAX_IS_MULTI&lt;/code&gt; flag. A MultiXact ID is generated, and this flag set, only when xmax has to record more than one transaction at once, for example when several transactions hold shared locks on the same row.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;); &lt;span style="color:#75715e"&gt;-- initially one row
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+----------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;742&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASNULL,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Session 1&lt;/th&gt;
 &lt;th&gt;Session 2&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;lzldb=# begin; &lt;br /&gt; BEGIN &lt;br /&gt;lzldb=*# select * from lzl1 for share; &lt;br /&gt;a &lt;br /&gt;--- &lt;br /&gt;1&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;lzldb=# begin; &lt;br /&gt; BEGIN &lt;br /&gt;lzldb=*# select * from lzl1 for share;&lt;br /&gt;a &lt;br /&gt;--- &lt;br /&gt;1&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;lzldb=*# update lzl1 set a=2; -- hangs&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;commit;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;UPDATE 1 -- update completed&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check tuple xmax and infomask
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,t_xmin,t_xmax,(t_infomask&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt;)&lt;span style="color:#f92672"&gt;!=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; is_multixact &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; is_multixact 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+--------+--------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;742&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;744&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;HEAP_XMAX_IS_MULTI&lt;/code&gt; is &lt;code&gt;0x1000&lt;/code&gt; in hex, which is 4096 in decimal. Using &lt;code&gt;(t_infomask&amp;amp;4096)!=0 is_multixact&lt;/code&gt; shows whether the tuple uses a MultiXact ID. From the example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MultiXact IDs have their own value space, separate from transaction IDs.&lt;/li&gt;
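&lt;li&gt;MultiXact IDs are consumed much more slowly than transaction IDs, so they are usually numerically smaller. Here t_xmax (3 and 4) is far below t_xmin (742 and 744).&lt;/li&gt;
&lt;li&gt;In a plain UPDATE, the old tuple&amp;#39;s xmax equals the new tuple&amp;#39;s xmin, and the new tuple&amp;#39;s xmax is 0. When a MultiXact is involved, both tuples can carry an xmax, and the two values may differ (here 4 on the old tuple and 3 on the new one).&lt;/li&gt;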
&lt;/ul&gt;
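&lt;p&gt;The same bit test can be written in C. The sketch below reuses the flag values from the &lt;code&gt;htup_details.h&lt;/code&gt; excerpt above; &lt;code&gt;xmax_is_multi&lt;/code&gt; and &lt;code&gt;xmax_lock_bits&lt;/code&gt; are hypothetical helper names for this example, not PostgreSQL functions.&lt;/p&gt;

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HEAP_XMAX_KEYSHR_LOCK	0x0010	/* xmax is a key-shared locker */
#define HEAP_XMAX_EXCL_LOCK		0x0040	/* xmax is exclusive locker */
#define HEAP_XMAX_LOCK_ONLY		0x0080	/* xmax, if valid, is only a locker */
#define HEAP_XMAX_SHR_LOCK	(HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_LOCK_MASK	(HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \
						 HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_XMAX_IS_MULTI		0x1000	/* t_xmax is a MultiXactId */

/* 0x1000 is the 4096 used in the SQL expression (t_infomask&4096)!=0 */
static_assert(HEAP_XMAX_IS_MULTI == 4096, "0x1000 must equal 4096");

/* Same test as the is_multixact column in the query above. */
bool
xmax_is_multi(uint16_t infomask)
{
	return (infomask & HEAP_XMAX_IS_MULTI) != 0;
}

/* Name the lock-strength bits recorded in the infomask. */
const char *
xmax_lock_bits(uint16_t infomask)
{
	switch (infomask & HEAP_LOCK_MASK)
	{
		case HEAP_XMAX_SHR_LOCK:
			return "share";
		case HEAP_XMAX_EXCL_LOCK:
			return "exclusive";
		case HEAP_XMAX_KEYSHR_LOCK:
			return "key share";
		default:
			return "none";
	}
}
```

&lt;p&gt;Note that the share lock has no bit of its own: it is encoded as the combination of the exclusive and key-share bits, which is why the helper masks with &lt;code&gt;HEAP_LOCK_MASK&lt;/code&gt; and compares the whole result.&lt;/p&gt;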

&lt;h3 class="relative group"&gt;MultiXact SLRU
 &lt;div id="multixact-slru" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#multixact-slru" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The top of &lt;code&gt;src/backend/access/transam/multixact.c&lt;/code&gt; defines many macros and helpers around &lt;code&gt;page&lt;/code&gt;, &lt;code&gt;member&lt;/code&gt;, &lt;code&gt;membergroup&lt;/code&gt; and &lt;code&gt;offset&lt;/code&gt;. They look intimidating, but they all do the same kind of job: define sizes and convert between these units.&lt;/p&gt;
&lt;p&gt;Before reading &lt;code&gt;multixact.c&lt;/code&gt;, understand a few macros:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;src/include/c.h&lt;/code&gt; defines &lt;code&gt;MultiXactOffset&lt;/code&gt; as a 32-bit type:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; uint32 MultiXactOffset;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;src/include/access/slru.h&lt;/code&gt; defines how many SLRU pages per segment:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define SLRU_PAGES_PER_SEGMENT	32&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Back to the top of &lt;code&gt;src/backend/access/transam/multixact.c&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// MULTIXACT_OFFSETS_PER_PAGE = 8192 bytes / 4 bytes = 2048 (with the default 8 kB BLCKSZ). One page stores 2048 offset entries, one per MultiXactId.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define MultiXactIdToOffsetPage(xid) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	((xid) / (MultiXactOffset) MULTIXACT_OFFSETS_PER_PAGE)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Convert xid to the page where the corresponding record resides: xid / 2048
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define MultiXactIdToOffsetEntry(xid) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	((xid) % (MultiXactOffset) MULTIXACT_OFFSETS_PER_PAGE)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Convert xid to the offset within the page: xid % 2048
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define MultiXactIdToOffsetSegment(xid) (MultiXactIdToOffsetPage(xid) / SLRU_PAGES_PER_SEGMENT)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Convert xid to the segment: xid / 2048 / 32
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
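&lt;p&gt;As a rough illustration of the three conversions above, here is a minimal Python sketch. It assumes the default 8 kB &lt;code&gt;BLCKSZ&lt;/code&gt;; the function names are hypothetical stand-ins for the C macros, not PostgreSQL APIs.&lt;/p&gt;

```python
# Assumed constants: default 8 kB block size and a 4-byte (uint32) MultiXactOffset.
BLCKSZ = 8192
SIZEOF_MULTIXACT_OFFSET = 4
SLRU_PAGES_PER_SEGMENT = 32

MULTIXACT_OFFSETS_PER_PAGE = BLCKSZ // SIZEOF_MULTIXACT_OFFSET  # 2048


def offset_page(multi):
    """Mirror of MultiXactIdToOffsetPage: which offsets page holds this id."""
    return multi // MULTIXACT_OFFSETS_PER_PAGE


def offset_entry(multi):
    """Mirror of MultiXactIdToOffsetEntry: slot index inside that page."""
    return multi % MULTIXACT_OFFSETS_PER_PAGE


def offset_segment(multi):
    """Mirror of MultiXactIdToOffsetSegment: which segment file holds the page."""
    return offset_page(multi) // SLRU_PAGES_PER_SEGMENT


if __name__ == "__main__":
    multi = 5000  # arbitrary example id
    print(offset_page(multi), offset_entry(multi), offset_segment(multi))
```

&lt;p&gt;With these assumptions, MultiXactId 5000 lands on page 2, entry 904, in segment 0; the first id stored in segment 1 is 2048 * 32 = 65536.&lt;/p&gt;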
&lt;p&gt;Now read the comments at the top of the source:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Defines for MultiXactOffset page sizes. A page is the same BLCKSZ as is
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * used everywhere else in Postgres.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * MultiXact page numbering also wraps around at
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * take no explicit notice of that fact in this module, except when comparing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * segment and page numbers in TruncateMultiXact (see
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * MultiXactOffsetPagePrecedes).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since &lt;code&gt;MultiXactOffsets&lt;/code&gt; are 32-bit and subject to wraparound:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MultiXact page numbering wraps at &lt;code&gt;0xFFFFFFFF / MULTIXACT_OFFSETS_PER_PAGE = 2^32 / 2048 = 2^21&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Segment numbering wraps at &lt;code&gt;0xFFFFFFFF / MULTIXACT_OFFSETS_PER_PAGE / SLRU_PAGES_PER_SEGMENT = 2^32 / 2^11 / 2^5 = 2^16&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
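&lt;p&gt;The two wrap points can be checked with a few lines of Python. This is only an arithmetic sketch, again assuming 2048 offsets per page (8 kB &lt;code&gt;BLCKSZ&lt;/code&gt;); the variable names are made up for the example.&lt;/p&gt;

```python
# Assumed values: 32-bit id space, 2048 offsets per page, 32 pages per segment.
MAX_ID = 0xFFFFFFFF
MULTIXACT_OFFSETS_PER_PAGE = 2048
SLRU_PAGES_PER_SEGMENT = 32

# Highest page and segment numbers that can ever be addressed.
max_page = MAX_ID // MULTIXACT_OFFSETS_PER_PAGE
max_segment = max_page // SLRU_PAGES_PER_SEGMENT

# Number of distinct pages and segments before numbering wraps.
page_count = max_page + 1
segment_count = max_segment + 1

print(max_page, page_count)        # highest page number, total pages
print(max_segment, segment_count)  # highest segment number, total segments
```

&lt;p&gt;Strictly speaking the highest page number is 2^21 - 1 and the highest segment number is 2^16 - 1, so there are 2^21 pages and 2^16 segments in total before the numbering wraps.&lt;/p&gt;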
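&lt;p&gt;&lt;code&gt;TruncateMultiXact()&lt;/code&gt; removes segments that are no longer needed, and it is the one place where this page and segment wraparound has to be handled explicitly. It is invoked as part of VACUUM.&lt;/p&gt;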

&lt;h3 class="relative group"&gt;The pg_multixact Directory
 &lt;div id="the-pg_multixact-directory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-pg_multixact-directory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Like CLOG and SUBTRANS, MultiXact logs use an SLRU buffer pool implementation. The &lt;code&gt;pg_multixact&lt;/code&gt; directory has only two subdirectories: &lt;code&gt;members&lt;/code&gt; and &lt;code&gt;offsets&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl pg_multixact&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;drwx------ &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; pg pg &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; Feb &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt; 21:29 members
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;drwx------ &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; pg pg &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; Feb &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt; 21:29 offsets&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;One MultiXactId maps to multiple TransactionIds, which are its members. The offset records where each MultiXact&amp;rsquo;s member list starts in the &lt;code&gt;members&lt;/code&gt; area.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/39f86a3494b8.png" alt="image" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; mXactCacheEnt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MultiXactId multi; &lt;span style="color:#75715e"&gt;// one MultiXactId
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;		nmembers;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	dlist_node	node;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MultiXactMember members[FLEXIBLE_ARRAY_MEMBER]; &lt;span style="color:#75715e"&gt;// multiple TransactionIds; expanded via MultiXactIdExpand() if needed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} mXactCacheEnt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;multixact.h&lt;/code&gt; defines &lt;code&gt;MultiXactMember&lt;/code&gt; as just a single transaction ID and its status:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; MultiXactMember
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId xid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MultiXactStatus status;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} MultiXactMember;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;MultiXact References
 &lt;div id="multixact-references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#multixact-references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/routine-vacuuming.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/routine-vacuuming.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://pgpedia.info/m/multixact-id.html" target="_blank" rel="noreferrer"&gt;https://pgpedia.info/m/multixact-id.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/15/explicit-locking.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/15/explicit-locking.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/14939" target="_blank" rel="noreferrer"&gt;https://www.modb.pro/db/14939&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.highgo.ca/2020/06/12/transactions-in-postgresql-and-their-mechanism/" target="_blank" rel="noreferrer"&gt;https://www.highgo.ca/2020/06/12/transactions-in-postgresql-and-their-mechanism/&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Two-Phase Commit (2PC) Transactions
 &lt;div id="two-phase-commit-2pc-transactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#two-phase-commit-2pc-transactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;What Is a 2PC Transaction?
 &lt;div id="what-is-a-2pc-transaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-a-2pc-transaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Transaction atomicity requires that a transaction either completes entirely or rolls back entirely. In distributed transactions spanning multiple connected databases, a consistent state must be provided to satisfy distributed transaction atomicity. Like other databases, PostgreSQL provides the Two-Phase Commit Protocol (2PC).&lt;/p&gt;
&lt;p&gt;There are many distributed transaction implementations; 2PC is the most fundamental and common. Distributed transactions encompass atomic commit, atomic visibility, and global consistency. 2PC is only an implementation for atomic commit.&lt;/p&gt;

&lt;h3 class="relative group"&gt;PREPARE TRANSACTION
 &lt;div id="prepare-transaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#prepare-transaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Foreign Data Wrappers (FDWs) can handle 2PC internally. PostgreSQL also provides an explicit way to use 2PC: &lt;code&gt;PREPARE TRANSACTION&lt;/code&gt;. Once issued, the prepared transaction is detached from the session; its state is persisted. &lt;code&gt;PREPARE TRANSACTION&lt;/code&gt; is not designed for use in applications or interactive sessions — unless you&amp;rsquo;re writing a transaction manager — so it is recommended (and default) to keep it disabled.&lt;/p&gt;
&lt;p&gt;Syntax:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt; transaction_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt; PREPARED transaction_id 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt; PREPARED transaction_id&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;transaction_id&lt;/code&gt; here is not the internal transaction ID — it&amp;rsquo;s just a user-declared string.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PREPARE TRANSACTION&lt;/code&gt; must be inside a transaction block, started with &lt;code&gt;BEGIN&lt;/code&gt; or &lt;code&gt;START TRANSACTION&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;max_prepared_transactions&lt;/code&gt; sets the maximum number of transactions that can be in the prepared state at once. The default is 0 (disabled), so it must be raised before prepared transactions can be used; changing it requires a server restart.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Starting a Prepared Transaction
 &lt;div id="starting-a-prepared-transaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#starting-a-prepared-transaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_xacts ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;transaction&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; gid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; prepared &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;owner&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+-----+-------------------------------+-------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;719&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;866022&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt; prepared &lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt; PREPARED 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_xacts ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;transaction&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; gid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; prepared &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;owner&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+-----+----------+-------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;The pg_twophase Directory
 &lt;div id="the-pg_twophase-directory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-pg_twophase-directory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;As mentioned, prepared transactions are session-independent. When a transaction is prepared, its state information is kept in shared memory. To ensure the transaction is not lost, its state is also persisted as a file in the &lt;code&gt;pg_twophase&lt;/code&gt; directory. This does not happen only at shutdown; it is tied to &lt;code&gt;checkpoint&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/backend/access/transam/twophase.c&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;CheckPointTwoPhase&lt;/span&gt;(XLogRecPtr redo_horizon)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_START&lt;/span&gt;(); &lt;span style="color:#75715e"&gt;// checkpoint start
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;fsync_fname&lt;/span&gt;(TWOPHASE_DIR, true); &lt;span style="color:#75715e"&gt;// call fsync to flush to disk
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_DONE&lt;/span&gt;(); &lt;span style="color:#75715e"&gt;// checkpoint done
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s test: start a prepared transaction and run a checkpoint:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pg&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzl pg_twophase]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CHECKPOINT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pg&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzl pg_twophase]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 pg pg 116 Apr 29 16:33 000002D0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Orphaned Prepared Transactions
 &lt;div id="orphaned-prepared-transactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#orphaned-prepared-transactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A prepared transaction is session-independent, so if it is never completed (neither committed nor rolled back) it persists until it is explicitly terminated. (A regular transaction, by contrast, is rolled back when its session disconnects.) This is an &lt;strong&gt;orphaned prepared transaction&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Orphaned prepared transactions hold their locks and keep old row versions pinned indefinitely. This prevents VACUUM from reclaiming dead tuples and can even block the freezing that protects against transaction ID wraparound. A prepared transaction that is forgotten, with no external transaction manager tracking it, can go unnoticed indefinitely and eventually cause severe problems. Therefore, it&amp;rsquo;s recommended to keep &lt;code&gt;max_prepared_transactions=0&lt;/code&gt; (the default), or to monitor prepared transactions via the &lt;code&gt;pg_prepared_xacts&lt;/code&gt; view.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a simulation of an orphaned prepared transaction causing indefinite blocking:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Start a prepared transaction and disconnect
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;q
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After disconnecting, the prepared transaction still exists
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_xacts ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;transaction&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; gid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; prepared &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;owner&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+-----+-------------------------------+-------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;721&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;597678&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- DDL blocked
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; b int;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check locks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; locktype,relation,pid,&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32808&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+----------+-------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32808&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;26136&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32808&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- End the prepared transaction; DDL completes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt; prepared &lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt; PREPARED
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; b int;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;2PC Transaction References
 &lt;div id="2pc-transaction-references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#2pc-transaction-references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="http://postgres.cn/docs/13/sql-prepare-transaction.html" target="_blank" rel="noreferrer"&gt;http://postgres.cn/docs/13/sql-prepare-transaction.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.highgo.ca/2020/01/28/understanding-prepared-transactions-and-handling-the-orphans/" target="_blank" rel="noreferrer"&gt;https://www.highgo.ca/2020/01/28/understanding-prepared-transactions-and-handling-the-orphans/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Subtransactions
 &lt;div id="subtransactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#subtransactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;What Is a Subtransaction?
 &lt;div id="what-is-a-subtransaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-a-subtransaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A regular transaction can only commit or roll back as a whole. Subtransactions allow partial rollback.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;SAVEPOINT p1&lt;/code&gt; places a savepoint marker inside a transaction. You cannot directly commit a subtransaction — subtransactions are committed when the parent transaction commits. However, you can use &lt;code&gt;ROLLBACK TO SAVEPOINT p1&lt;/code&gt; to roll back to that savepoint.&lt;/p&gt;
&lt;p&gt;Subtransactions are useful for bulk data loading. If a transaction contains multiple subtransactions and one small segment fails, only that segment needs to be retried — not the entire transaction.&lt;/p&gt;
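&lt;p&gt;The bulk-load pattern can be sketched roughly as follows. This is a minimal illustration, not a recommended loader: the table &lt;code&gt;load_demo&lt;/code&gt;, the batch contents, and the savepoint name &lt;code&gt;batch_2&lt;/code&gt; are all hypothetical.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table, used only for this illustration.
CREATE TABLE load_demo (id int PRIMARY KEY);

BEGIN;
INSERT INTO load_demo VALUES (1), (2);     -- batch 1 succeeds

SAVEPOINT batch_2;
INSERT INTO load_demo VALUES (3), (1);     -- batch 2 fails: duplicate key 1
-- The transaction is now in an aborted state; undo only batch 2.
ROLLBACK TO SAVEPOINT batch_2;
INSERT INTO load_demo VALUES (3), (4);     -- retry batch 2 with corrected data
RELEASE SAVEPOINT batch_2;

COMMIT;
SELECT id FROM load_demo ORDER BY id;      -- rows 1 through 4&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Batch 1 survives the failure of batch 2 because the rollback stops at the savepoint instead of aborting the whole transaction.&lt;/p&gt;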

&lt;h3 class="relative group"&gt;Using Subtransactions in SQL
 &lt;div id="using-subtransactions-in-sql" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#using-subtransactions-in-sql" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SAVEPOINT savepoint_name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt; [ &lt;span style="color:#66d9ef"&gt;WORK&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt; ] &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; [ SAVEPOINT ] savepoint_name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;RELEASE [ SAVEPOINT ] savepoint_name&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Savepoint statements must be inside a transaction block.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SAVEPOINT&lt;/code&gt; creates a savepoint; &lt;code&gt;ROLLBACK TO&lt;/code&gt; rolls back to the named savepoint; &lt;code&gt;RELEASE&lt;/code&gt; erases the savepoint without rolling back subtransaction data.&lt;/li&gt;
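&lt;li&gt;Cursors are only partly transactional with respect to savepoints: a cursor opened inside a savepoint is closed when that savepoint is rolled back, but cursor movement caused by &lt;code&gt;FETCH&lt;/code&gt; or &lt;code&gt;MOVE&lt;/code&gt; on a cursor opened earlier is not undone.&lt;/li&gt;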
&lt;/ul&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; savepoint p1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SAVEPOINT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; savepoint p2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SAVEPOINT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; savepoint p3;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SAVEPOINT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; savepoint p2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; xmin,xmax,cmin,a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+------+------+---
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;731&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;732&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Rolling back to p2 also rolled back p3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+------------------------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;731&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASNULL,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;732&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASNULL,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;733&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASNULL,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;734&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASNULL,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Subtransaction infomask is not very different from regular transactions.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Multiple commands within the same transaction are differentiated by cid and HEAP_XMIN_INVALID, etc.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Subtransaction writes also consume transaction IDs, and cid increments within the parent transaction framework.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Other Sources of Subtransactions
 &lt;div id="other-sources-of-subtransactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#other-sources-of-subtransactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Even without explicit &lt;code&gt;SAVEPOINT&lt;/code&gt;, subtransactions can be created by other means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;EXCEPTION&lt;/code&gt; blocks trigger subtransactions. This is common in tools and frameworks and easily overlooked. Every &lt;code&gt;EXCEPTION&lt;/code&gt; creates a subtransaction.&lt;/p&gt;
&lt;p&gt;Syntax: &lt;code&gt;BEGIN / EXCEPTION WHEN .. / END&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Reference: &lt;a href="https://fluca1978.github.io/2020/02/05/PLPGSQLExceptions.html" target="_blank" rel="noreferrer"&gt;https://fluca1978.github.io/2020/02/05/PLPGSQLExceptions.html&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;PL/Python code using &lt;code&gt;plpy.subtransaction()&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
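&lt;p&gt;A small sketch of the &lt;code&gt;EXCEPTION&lt;/code&gt; case, assuming a hypothetical table &lt;code&gt;exc_demo&lt;/code&gt;. The inner block carries an &lt;code&gt;EXCEPTION&lt;/code&gt; clause, so it behaves like an implicit savepoint: when the error is caught, everything the inner block wrote is rolled back, while the outer block&amp;rsquo;s work is kept.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table, used only for this illustration.
CREATE TABLE exc_demo (id int PRIMARY KEY);

DO $$
BEGIN
    INSERT INTO exc_demo VALUES (1);        -- outer block: kept
    BEGIN                                    -- block with EXCEPTION: runs as a subtransaction
        INSERT INTO exc_demo VALUES (2);
        INSERT INTO exc_demo VALUES (1);    -- duplicate key raises unique_violation
    EXCEPTION WHEN unique_violation THEN
        RAISE NOTICE 'inner block rolled back';
    END;
END
$$;

SELECT id FROM exc_demo ORDER BY id;         -- only row 1 remains&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A loop that enters such a block on every iteration therefore creates one subtransaction per iteration, which is how the count grows without anyone noticing.&lt;/p&gt;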

&lt;h3 class="relative group"&gt;Subtransaction SLRU Cache
 &lt;div id="subtransaction-slru-cache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#subtransaction-slru-cache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The commit status of a subtransaction is recorded in &lt;code&gt;pg_xact&lt;/code&gt;, like that of any other transaction. The parent-child relationship is stored in &lt;code&gt;pg_subtrans&lt;/code&gt;, which maps each subXID to its parent XID and is cached in memory through an SLRU buffer pool. When PostgreSQL needs to look up a subXID, it calculates which page the ID resides on and searches within that page. If the page is not cached, it evicts a cached page and loads the required one from &lt;code&gt;pg_subtrans&lt;/code&gt;. A high rate of cache misses on these lookups costs I/O and CPU.&lt;/p&gt;
&lt;p&gt;The subtransaction buffer is only 32 pages, hardcoded in the source (this applies up to PostgreSQL 16; PostgreSQL 17 makes it configurable through &lt;code&gt;subtransaction_buffers&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/include/access/subtrans.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Number of SLRU buffers to use for subtrans */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define NUM_SUBTRANS_BUFFERS 32&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Each buffer is one page (8 KB by default) and each entry is a 32-bit (4-byte) xid. Therefore:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SUBTRANS_BUFFER size: &lt;code&gt;32 * 8K = 256KB&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;SUBTRANS_BUFFER can store at most: &lt;code&gt;32 * 8K / 4 = 65,536&lt;/code&gt; xids&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f1a4a6d13c77.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Finding a subtransaction&amp;rsquo;s position in a page by transaction ID:&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/backend/access/transam/subtrans.c&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* We need four bytes per xact */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define SUBTRANS_XACTS_PER_PAGE (BLCKSZ / sizeof(TransactionId))
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Each page can store up to 8K / 4 bytes = 2048 subtransaction IDs
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdToPage(xid) ((xid) / (TransactionId) SUBTRANS_XACTS_PER_PAGE)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Calculate page number from subtransaction xid: xid / 2048
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdToEntry(xid) ((xid) % (TransactionId) SUBTRANS_XACTS_PER_PAGE)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Calculate offset within page from subtransaction xid: xid % 2048
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
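&lt;p&gt;Subtransaction xids are not necessarily densely packed within a page. Each page covers 2048 consecutive xids, but top-level transactions draw from the same xid sequence, so a page may hold fewer than 2048 subtransaction IDs.&lt;/p&gt;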
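&lt;p&gt;The two macros are plain integer division and modulo, so the mapping can be reproduced in SQL. The sketch below is only an illustration of the arithmetic: the sample xids are arbitrary, and the column names are made up.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;WITH params AS (
    -- BLCKSZ / sizeof(TransactionId): 2048 with the default 8 KB block size
    SELECT current_setting('block_size')::int / 4 AS xacts_per_page
)
SELECT x.xid,
       x.xid / p.xacts_per_page AS subtrans_page,   -- TransactionIdToPage
       x.xid % p.xacts_per_page AS entry_in_page    -- TransactionIdToEntry
FROM params AS p,
     (VALUES (731), (2048), (70000)) AS x(xid);      -- arbitrary sample xids&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With 8 KB blocks, xid 731 maps to page 0, entry 731; xid 2048 to page 1, entry 0; and xid 70000 to page 34, entry 368. Two xids that are more than 32 pages apart can never both be resident in a 32-page buffer, which is why a wide range of active xids leads to evictions.&lt;/p&gt;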

&lt;h3 class="relative group"&gt;The Dangers of Subtransactions
 &lt;div id="the-dangers-of-subtransactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-dangers-of-subtransactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;1. PGPROC_MAX_CACHED_SUBXIDS Overflow&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;PGPROC_MAX_CACHED_SUBXIDS&lt;/code&gt; is not a GUC parameter — it&amp;rsquo;s hardcoded. You can only change it by modifying the source.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/include/storage/proc.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	*Each backend has a subtransaction cache limit of PGPROC_MAX_CACHED_SUBXIDS.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	*We must track whether the cache has overflowed (i.e., the transaction has at least one subtransaction that couldn&amp;#39;t be cached).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	*If no cache has overflowed, we can be sure that an xid not in the PGPROC array is definitely not a running transaction.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	*If there is an overflow, we must consult pg_subtrans.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	*/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;#define PGPROC_MAX_CACHED_SUBXIDS 64	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* XXX guessed-at value */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; XidCache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		TransactionId xids[PGPROC_MAX_CACHED_SUBXIDS];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	};&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Two key takeaways from this source:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each backend caches at most &lt;code&gt;PGPROC_MAX_CACHED_SUBXIDS&lt;/code&gt; (64) subtransaction IDs for its current transaction.&lt;/li&gt;
&lt;li&gt;Once a transaction has more than 64 subtransactions, the cache is marked as overflowed, and the remaining ones can only be resolved by consulting &lt;code&gt;pg_subtrans&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
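&lt;p&gt;One way to observe the overflow is sketched below. It assumes PostgreSQL 16 or later, where &lt;code&gt;pg_stat_get_backend_subxact()&lt;/code&gt; is available; the table &lt;code&gt;subxid_demo&lt;/code&gt; is hypothetical. Each loop iteration enters a block with an &lt;code&gt;EXCEPTION&lt;/code&gt; clause and writes a row, so 65 subtransaction xids are assigned inside one transaction.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table, used only for this illustration.
CREATE TABLE subxid_demo (id int);

BEGIN;
DO $$
BEGIN
    FOR i IN 1..65 LOOP
        BEGIN
            -- The write forces this subtransaction to be assigned its own xid.
            INSERT INTO subxid_demo VALUES (i);
        EXCEPTION WHEN OTHERS THEN
            NULL;
        END;
    END LOOP;
END
$$;

-- Same session, transaction still open: inspect this backend's subxid cache.
SELECT pg_stat_get_backend_pid(b.id) AS pid, s.*
FROM pg_stat_get_backend_idset() AS b(id),
     LATERAL pg_stat_get_backend_subxact(b.id) AS s
WHERE pg_stat_get_backend_pid(b.id) = pg_backend_pid();

COMMIT;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With 65 subtransactions the query should report the cache as overflowed; lowering the loop bound to 64 or less should keep it within the cache.&lt;/p&gt;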
&lt;p&gt;A published benchmark (image and reference below) shows performance dropping as soon as the subtransaction count exceeds 64, so it&amp;rsquo;s best to keep per-session subtransactions below 64.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://lastdba.com/img/csdn/6ac0dc4add28.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Reference: &lt;a href="https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful" target="_blank" rel="noreferrer"&gt;https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Subtransactions Causing MultiXact Contention&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Reference: &lt;a href="https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/" target="_blank" rel="noreferrer"&gt;https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;FOR UPDATE&lt;/code&gt; itself is a row-level exclusive lock and should not generate a MultiXact ID. But in this scenario, multiple MultiXact waits occurred, causing a cliff-like performance drop:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LWLock:MultiXactMemberControlLock&lt;/li&gt;
&lt;li&gt;LWLock:MultiXactOffsetControlLock&lt;/li&gt;
&lt;li&gt;LWLock:multixact_member&lt;/li&gt;
&lt;li&gt;LWLock:multixact_offset&lt;/li&gt;
&lt;/ul&gt;
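&lt;p&gt;It later turned out that the Django framework was issuing savepoints, producing the pattern below. The row lock is taken under the parent transaction&amp;rsquo;s xid, while the &lt;code&gt;UPDATE&lt;/code&gt; runs under the subtransaction&amp;rsquo;s own xid, so the row has to record both, and that requires a MultiXact:&lt;/p&gt;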
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;some&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SAVEPOINT save;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; [the same &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;];&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;3. Replica Performance Cliff&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Reference: &lt;a href="https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/" target="_blank" rel="noreferrer"&gt;https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A single long transaction with a savepoint subtransaction can also cause a performance cliff on replicas.&lt;/p&gt;
&lt;p&gt;On the primary, a snapshot consists of xmin, xmax, the list of in-progress transactions (xip), and the list of in-progress subtransactions (subxip). &lt;strong&gt;However, neither the underlying arrays nor the snapshot are shared with replicas directly. Replicas have to reconstruct all of this from WAL.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a4c7e36c274a.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;When subtransactions exist, a single long-running transaction can cause replica performance to drop off a cliff:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/06211a3788ce.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Production Performance Cliff&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When the database is busy and many subtransactions exist, performance can drop sharply, accompanied by subtransaction wait events. This scenario can occur even when per-session subtransactions don&amp;rsquo;t exceed 64, and even on the primary (not just replicas).&lt;/p&gt;
&lt;p&gt;We found that a tool (OGG) defaulted to 50 subtransactions. Reducing the subtransaction count in that tool to 10–20 alleviated the database performance issue.&lt;/p&gt;
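&lt;p&gt;To check whether a slowdown of this kind involves subtransactions, something like the following can be run while the problem is happening. It is a rough diagnostic sketch: wait event and SLRU names differ between PostgreSQL versions, so the queries match on substrings instead of exact names, and &lt;code&gt;pg_stat_slru&lt;/code&gt; requires PostgreSQL 13 or later.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Sessions currently waiting on subtransaction or MultiXact structures.
SELECT wait_event_type, wait_event, count(*) AS sessions
FROM pg_stat_activity
WHERE wait_event ILIKE '%subtrans%'
   OR wait_event ILIKE '%multixact%'
GROUP BY wait_event_type, wait_event
ORDER BY sessions DESC;

-- Hit and read counters for the subtransaction SLRU cache.
SELECT name, blks_hit, blks_read
FROM pg_stat_slru
WHERE name ILIKE '%subtrans%';&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Many sessions in the first result, together with a fast-growing &lt;code&gt;blks_read&lt;/code&gt; in the second, would suggest that lookups are missing the 32-page cache and going to &lt;code&gt;pg_subtrans&lt;/code&gt;.&lt;/p&gt;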
&lt;p&gt;&lt;strong&gt;Subtransaction usage recommendations:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Besides explicit &lt;code&gt;SAVEPOINT&lt;/code&gt;, EXCEPTION blocks, frameworks, and tools can also generate subtransactions.&lt;/li&gt;
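&lt;li&gt;If you have replica query workloads, &lt;strong&gt;avoid subtransactions entirely&lt;/strong&gt;. PostgreSQL has no setting that turns them off, so this means removing them from application code and tooling.&lt;/li&gt;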
&lt;li&gt;Use row locks cautiously. &lt;code&gt;FOR UPDATE&lt;/code&gt; + subtransactions can also trigger MultiXactId issues.&lt;/li&gt;
&lt;li&gt;If you must use subtransactions, keep them well below 64 per session — preferably much lower.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Subtransactions have caused countless production issues worldwide, with many case studies and analyses. To quote: &amp;ldquo;Subtransactions are basically cursed. Rip &amp;rsquo;em out.&amp;rdquo;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Subtransaction References
 &lt;div id="subtransaction-references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#subtransaction-references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful" target="_blank" rel="noreferrer"&gt;https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/" target="_blank" rel="noreferrer"&gt;https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://fluca1978.github.io/2020/02/05/PLPGSQLExceptions.html" target="_blank" rel="noreferrer"&gt;https://fluca1978.github.io/2020/02/05/PLPGSQLExceptions.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/" target="_blank" rel="noreferrer"&gt;https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/" target="_blank" rel="noreferrer"&gt;https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Books:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;The Internals of PostgreSQL&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL in Action&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL Internals: Deep Dive into Transaction Processing&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL Database Kernel Analysis&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://edu.postgrespro.com/postgresql_internals-14_parts1-2_en.pdf" target="_blank" rel="noreferrer"&gt;https://edu.postgrespro.com/postgresql_internals-14_parts1-2_en.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Official resources:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Concurrency_control" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Concurrency_control&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Hint_Bits" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Hint_Bits&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/10/storage-page-layout.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/10/storage-page-layout.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/pageinspect.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/pageinspect.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Essential PostgreSQL transaction reads (interdb):&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql05.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql05.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql06.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql06.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Source code experts:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/102920988" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/102920988&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/127955762" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/127955762&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/125023923" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/125023923&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL snapshot optimization performance comparison:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462" target="_blank" rel="noreferrer"&gt;https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Other resources:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://brandur.org/postgres-atomicity" target="_blank" rel="noreferrer"&gt;https://brandur.org/postgres-atomicity&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/j-8uRuZDRf4mHIQR_ZKIEg" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/j-8uRuZDRf4mHIQR_ZKIEg&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/postgrechina/article/details/49130743?spm=a2c6h.12873639.article-detail.7.41b32cda2KR1QM" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/postgrechina/article/details/49130743?spm=a2c6h.12873639.article-detail.7.41b32cda2KR1QM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://mysql.taobao.org/monthly/2018/12/02/" target="_blank" rel="noreferrer"&gt;http://mysql.taobao.org/monthly/2018/12/02/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Originally published in Chinese on &lt;a href="https://lastdba.com" target="_blank" rel="noreferrer"&gt;lastdba.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content:encoded></item><item><title>PostgreSQL Localization</title><link>https://lastdba.com/en/2024/08/12/postgresql-localization/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/postgresql-localization/</guid><description>&lt;h2 class="relative group"&gt;Localization Concepts
 &lt;div id="localization-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#localization-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The purpose of localization is to support the language features and rules of different countries and regions. With localization support, you can use character sets that handle Chinese, French, Japanese, and more. Beyond character sets, there are also character sorting rules and other language-related rule support. For example, we know how to sort (&amp;lsquo;a&amp;rsquo;, &amp;lsquo;b&amp;rsquo;), but how should (&amp;lsquo;a&amp;rsquo;, &amp;lsquo;A&amp;rsquo;) and (&amp;lsquo;啊&amp;rsquo;, &amp;lsquo;阿&amp;rsquo;) be sorted?&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Localization Concepts
 &lt;div id="localization-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#localization-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The purpose of localization is to support the language features and rules of different countries and regions. With localization support, you can use character sets that handle Chinese, French, Japanese, and more. Beyond character sets, there are also character sorting rules and other language-related rule support. For example, we know how to sort (&amp;lsquo;a&amp;rsquo;, &amp;lsquo;b&amp;rsquo;), but how should (&amp;lsquo;a&amp;rsquo;, &amp;lsquo;A&amp;rsquo;) and (&amp;lsquo;啊&amp;rsquo;, &amp;lsquo;阿&amp;rsquo;) be sorted?&lt;/p&gt;
&lt;p&gt;If you search Google for information about localization, character sets, and collation, you might end up with knowledge that feels both complex and distant. The best teacher is still 


&lt;img src="https://lastdba.com/img/csdn/4a8579e2070f.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Localization knowledge is divided into three parts: locale support, collation, and character sets.&lt;/p&gt;

&lt;h2 class="relative group"&gt;locale
 &lt;div id="locale" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#locale" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PostgreSQL&amp;rsquo;s locale support relies on the locales provided by the operating system. Run &lt;code&gt;locale -a&lt;/code&gt; to check which locales the OS has installed. The locale can be specified when initializing the database cluster:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;initdb --locale&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You can also set localization subcategories individually: string sort order, character classification, numeric formatting, date formatting, time formatting, currency formatting, etc.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;initdb --locale&lt;span style="color:#f92672"&gt;=&lt;/span&gt;zh_CN --lc-monetary&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;All localization subcategories:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Subcategory&lt;/th&gt;
 &lt;th&gt;Rule&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;LC_COLLATE&lt;/td&gt;
 &lt;td&gt;String sort order&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;LC_CTYPE&lt;/td&gt;
 &lt;td&gt;Character classification (What is a letter? Its upper-case equivalent?)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;LC_MESSAGES&lt;/td&gt;
 &lt;td&gt;Language of messages&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;LC_MONETARY&lt;/td&gt;
 &lt;td&gt;Formatting of currency amounts&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;LC_NUMERIC&lt;/td&gt;
 &lt;td&gt;Formatting of numbers&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;LC_TIME&lt;/td&gt;
 &lt;td&gt;Formatting of dates and times&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;These subcategories can be split into two groups. &lt;code&gt;lc_messages&lt;/code&gt;, &lt;code&gt;lc_monetary&lt;/code&gt;, &lt;code&gt;lc_numeric&lt;/code&gt;, and &lt;code&gt;lc_time&lt;/code&gt; can be adjusted via parameters after initialization. &lt;code&gt;LC_COLLATE&lt;/code&gt; and &lt;code&gt;LC_CTYPE&lt;/code&gt; belong to collation — see the collation section for adjustment details.&lt;/p&gt;
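&lt;p&gt;A short sketch of the two groups, assuming the &lt;code&gt;en_US.utf8&lt;/code&gt; locale is installed on the server (check with &lt;code&gt;locale -a&lt;/code&gt;). The first statements change formatting parameters for the current session, and the last one reads the per-database collation settings.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- The locale name en_US.utf8 is an assumption; use one that your OS provides.
SHOW lc_monetary;
SET lc_monetary = &amp;#39;en_US.utf8&amp;#39;;
SELECT 1234.5::money;                     -- formatted according to lc_monetary

SET lc_time = &amp;#39;en_US.utf8&amp;#39;;
SELECT to_char(now(), &amp;#39;TMDay&amp;#39;);           -- the TM prefix makes to_char use lc_time

-- LC_COLLATE and LC_CTYPE are stored per database instead:
SELECT datname, datcollate, datctype FROM pg_database;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;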
&lt;p&gt;Locale settings affect the following behaviors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sort order in queries using &lt;code&gt;ORDER BY&lt;/code&gt; or the standard comparison operators on textual data&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;upper&lt;/code&gt;, &lt;code&gt;lower&lt;/code&gt;, and &lt;code&gt;initcap&lt;/code&gt; functions&lt;/li&gt;
&lt;li&gt;Pattern matching operators (&lt;code&gt;LIKE&lt;/code&gt;, &lt;code&gt;SIMILAR TO&lt;/code&gt;, and POSIX-style regular expressions); locales affect both case insensitive matching and the classification of characters by character-class regular expressions&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;to_char&lt;/code&gt; family of functions&lt;/li&gt;
&lt;li&gt;The ability to use indexes with &lt;code&gt;LIKE&lt;/code&gt; clauses&lt;/li&gt;
&lt;/ul&gt;
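&lt;p&gt;The last point can be sketched as follows. In a database whose locale is not &lt;code&gt;C&lt;/code&gt;, a plain B-tree index on a text column generally cannot serve a prefix &lt;code&gt;LIKE&lt;/code&gt;, while an index built with the &lt;code&gt;text_pattern_ops&lt;/code&gt; operator class can. The table and index names are hypothetical, and the plan you actually get depends on table size and statistics.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table and index names, for illustration only.
CREATE TABLE like_demo (name text);

-- text_pattern_ops compares strings character by character, ignoring locale rules,
-- so the index can support left-anchored patterns such as LIKE &amp;#39;abc%&amp;#39;.
CREATE INDEX like_demo_name_pattern_idx ON like_demo (name text_pattern_ops);

-- Inspect the plan; on a tiny table the planner may still prefer a sequential scan.
EXPLAIN SELECT * FROM like_demo WHERE name LIKE &amp;#39;abc%&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;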

&lt;h2 class="relative group"&gt;COLLATION
 &lt;div id="collation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#collation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Collation defines the sort order of characters and character classification behavior. Some operations and functions depend on collation, such as &lt;code&gt;ORDER BY&lt;/code&gt;, &lt;code&gt;lower&lt;/code&gt;, &lt;code&gt;upper&lt;/code&gt;, &lt;code&gt;initcap&lt;/code&gt;, &lt;code&gt;to_char&lt;/code&gt;, and others.&lt;/p&gt;
&lt;p&gt;The following SQL queries the system catalog &lt;code&gt;pg_collation&lt;/code&gt; for the encoding, &lt;code&gt;LC_COLLATE&lt;/code&gt;, and &lt;code&gt;LC_CTYPE&lt;/code&gt; of a few common collations:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_encoding_to_char(collencoding) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt;,collname,collcollate,collctype &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; collname &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;default&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;C&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;POSIX&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;en_US.utf8&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.utf8&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;zh_SG.gb2312&amp;#39;&lt;/span&gt;) ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collcollate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collctype 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+--------------+--------------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; POSIX &lt;span style="color:#f92672"&gt;|&lt;/span&gt; POSIX &lt;span style="color:#f92672"&gt;|&lt;/span&gt; POSIX
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; UTF8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; EUC_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.gb2312 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.gb2312 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.gb2312
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; UTF8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; EUC_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_SG.gb2312 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_SG.gb2312 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_SG.gb2312&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;encoding&lt;/code&gt; is the character set, and &lt;code&gt;collname&lt;/code&gt; is the collation name.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When &lt;code&gt;encoding&lt;/code&gt; is empty, it means this collation supports all character sets.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;default&lt;/code&gt;, &lt;code&gt;C&lt;/code&gt;, &lt;code&gt;POSIX&lt;/code&gt; are collations supported on all platforms, provided by &lt;code&gt;libc&lt;/code&gt;. Other collations depend on whether the operating system supports them (&lt;code&gt;locale -a&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;default&lt;/code&gt; means using the collation set at database creation time, which can be viewed via &lt;code&gt;\l&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;C&lt;/code&gt; is semantically equivalent to &lt;code&gt;POSIX&lt;/code&gt;, but PostgreSQL still treats them as different collations, as the error below shows. Both compare strings strictly byte by byte, which for ASCII characters means ASCII code order.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;POSIX&amp;#34;&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;P21: &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; mismatch &lt;span style="color:#66d9ef"&gt;between&lt;/span&gt; explicit collations &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;POSIX&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LINE &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;POSIX&amp;#34;&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: merge_collation_state, parse_collate.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;834&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;UTF8 is the most common character set, and the most common language environments are &lt;code&gt;en_US&lt;/code&gt; and &lt;code&gt;zh_CN&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;You can create custom collations via &lt;code&gt;CREATE COLLATION ...&lt;/code&gt;. However, cases where &lt;code&gt;LC_COLLATE&lt;/code&gt; and &lt;code&gt;LC_CTYPE&lt;/code&gt; differ are very rare.&lt;/li&gt;
&lt;/ul&gt;
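&lt;p&gt;A minimal sketch of &lt;code&gt;CREATE COLLATION&lt;/code&gt; with the libc provider. The collation names &lt;code&gt;demo_zh&lt;/code&gt; and &lt;code&gt;demo_mixed&lt;/code&gt; are hypothetical, and the example assumes a UTF8 database on a system where the &lt;code&gt;zh_CN.utf8&lt;/code&gt; and &lt;code&gt;en_US.utf8&lt;/code&gt; locales are installed.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Usual case: LC_COLLATE and LC_CTYPE are both taken from one locale.
CREATE COLLATION demo_zh (locale = &amp;#39;zh_CN.utf8&amp;#39;);

-- Rare case: byte-order sorting combined with en_US character classification.
CREATE COLLATION demo_mixed (lc_collate = &amp;#39;C&amp;#39;, lc_ctype = &amp;#39;en_US.utf8&amp;#39;);

-- Check what was stored in the catalog.
SELECT collname, collcollate, collctype
FROM pg_collation
WHERE collname IN (&amp;#39;demo_zh&amp;#39;, &amp;#39;demo_mixed&amp;#39;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;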

&lt;h3 class="relative group"&gt;LC_COLLATE
 &lt;div id="lc_collate" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lc_collate" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;LC_COLLATE&lt;/code&gt; affects character comparison (sorting, character operations, etc.).&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;COLLATE&lt;/code&gt; clause overrides the collation of an expression:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;expr &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note that this specifies a &lt;em&gt;collation&lt;/em&gt;, not &lt;code&gt;lc_collate&lt;/code&gt;. If no collation is explicitly specified, the database uses the column&amp;rsquo;s collation by default. If the column has no collation specified, it uses the database&amp;rsquo;s default collation.&lt;/p&gt;
&lt;p&gt;Sorting test with different collations:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;)) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; l(col1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;啊&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;阿&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;)) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; l(col1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;啊&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;阿&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;)) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; l(col1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;阿&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;啊&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;These three different collations have different &lt;code&gt;lc_collate&lt;/code&gt; values, and the sort methods are indeed different — we can see three distinct sort results from the output.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why does collation C put &amp;lsquo;A&amp;rsquo; before &amp;lsquo;a&amp;rsquo;?&lt;/strong&gt;
Collation C sorts by byte value, which for English letters is ASCII order. In ASCII, uppercase letters come before lowercase (&amp;lsquo;A&amp;rsquo; is 0x41, &amp;lsquo;a&amp;rsquo; is 0x61). Meanwhile, &lt;code&gt;en_US.utf8&lt;/code&gt; and &lt;code&gt;zh_CN.utf8&lt;/code&gt; apply locale-specific rules that place &amp;lsquo;a&amp;rsquo; before &amp;lsquo;A&amp;rsquo;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Order of Chinese characters&lt;/strong&gt;
Even with the same UTF8 character set, the order of Chinese characters differs between the Chinese and English locales. Each &lt;code&gt;lc_collate&lt;/code&gt; value carries the sorting rules of its own language. With &lt;code&gt;lc_collate=C&lt;/code&gt;, the sort order is always byte order. Although ASCII does not include Chinese, C can still sort Chinese: (essentially) every Chinese character has a UTF8 encoding, and C compares those bytes. Here &amp;lsquo;啊&amp;rsquo; (U+554A) is encoded as E5 95 8A and &amp;lsquo;阿&amp;rsquo; (U+963F) as E9 98 BF, so &amp;lsquo;啊&amp;rsquo; sorts first.&lt;/p&gt;
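&lt;p&gt;To check the byte-order explanation yourself, a query along these lines prints the UTF8 bytes of each test character next to its position under collation C. It is a sketch that assumes a UTF8 database, and the column alias &lt;code&gt;utf8_bytes&lt;/code&gt; is only illustrative.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- convert_to() returns the encoded bytes; encode(..., &amp;#39;hex&amp;#39;) makes them readable.
SELECT ch, encode(convert_to(ch, &amp;#39;UTF8&amp;#39;), &amp;#39;hex&amp;#39;) AS utf8_bytes
FROM (VALUES (&amp;#39;a&amp;#39;), (&amp;#39;A&amp;#39;), (&amp;#39;啊&amp;#39;), (&amp;#39;阿&amp;#39;)) AS l(ch)
ORDER BY ch COLLATE &amp;#34;C&amp;#34;;
-- Rows come back in byte order: A = 41, a = 61, 啊 = e5958a, 阿 = e998bf&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;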

&lt;h3 class="relative group"&gt;LC_CTYPE
 &lt;div id="lc_ctype" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lc_ctype" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;LC_CTYPE&lt;/code&gt; affects character operations (such as &lt;code&gt;upper&lt;/code&gt;, &lt;code&gt;initcap&lt;/code&gt;, etc.).&lt;/p&gt;
&lt;p&gt;If the string is all English, e.g., &lt;code&gt;'abcD'&lt;/code&gt;, &lt;code&gt;initcap&lt;/code&gt; converts it to &lt;code&gt;'Abcd'&lt;/code&gt; under all three collations — nothing special to show here.&lt;/p&gt;
&lt;p&gt;But when Chinese is introduced, the results differ:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; initcap(&lt;span style="color:#e6db74"&gt;&amp;#39;啊aAAa阿bBBb&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; initcap 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;啊&lt;/span&gt;Aaaa阿Bbbb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; initcap(&lt;span style="color:#e6db74"&gt;&amp;#39;啊aAAa阿aAAa&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; initcap 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;啊&lt;/span&gt;aaaa阿aaaa
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; initcap(&lt;span style="color:#e6db74"&gt;&amp;#39;啊aAAa阿aAAa&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; initcap 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;啊&lt;/span&gt;aaaa阿aaaa&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When &lt;code&gt;LC_CTYPE=C&lt;/code&gt;, &lt;code&gt;initcap&lt;/code&gt; capitalizes the first letter of every non-contiguous English character sequence, whereas &lt;code&gt;en_US.utf8&lt;/code&gt; and &lt;code&gt;zh_CN.utf8&lt;/code&gt; only capitalize the very first character (Chinese characters remain unchanged) and lowercase other English characters.&lt;/p&gt;
&lt;p&gt;The behavior of &lt;code&gt;initcap&lt;/code&gt; with Chinese may be an undefined requirement, but we can conclude: &lt;strong&gt;different &lt;code&gt;LC_CTYPE&lt;/code&gt; settings lead to different results from character-sensitive functions like &lt;code&gt;initcap&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Furthermore, Chinese has no letter case, but some other languages do, so for them different &lt;code&gt;LC_CTYPE&lt;/code&gt; settings lead to even more complex outcomes.&lt;/p&gt;
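&lt;p&gt;As a hedged illustration of a language that does have case rules of its own, the query below compares &lt;code&gt;upper&lt;/code&gt; and &lt;code&gt;lower&lt;/code&gt; under the &lt;code&gt;C&lt;/code&gt; collation and a Turkish one. It assumes a UTF8 database and that a collation named &lt;code&gt;tr_TR.utf8&lt;/code&gt; exists in &lt;code&gt;pg_collation&lt;/code&gt;, which depends on the locales installed on the operating system, so no output is shown here.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Assumption: UTF8 database; the collation &amp;#34;tr_TR.utf8&amp;#34; must exist in pg_collation
select upper(&amp;#39;i&amp;#39; collate &amp;#34;C&amp;#34;)          as upper_c,
       upper(&amp;#39;i&amp;#39; collate &amp;#34;tr_TR.utf8&amp;#34;) as upper_tr,
       lower(&amp;#39;I&amp;#39; collate &amp;#34;C&amp;#34;)          as lower_c,
       lower(&amp;#39;I&amp;#39; collate &amp;#34;tr_TR.utf8&amp;#34;) as lower_tr;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In Turkish, the uppercase of i is the dotted İ and the lowercase of I is the dotless ı, so the two collations are expected to disagree.&lt;/p&gt;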

&lt;h2 class="relative group"&gt;Character Sets
 &lt;div id="character-sets" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#character-sets" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Character Set Basics
 &lt;div id="character-set-basics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#character-set-basics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL supports different character sets (also called encodings). Character sets and collation are two separate concepts, but the character set must be compatible with &lt;code&gt;LC_CTYPE&lt;/code&gt; and &lt;code&gt;LC_COLLATE&lt;/code&gt;. As seen in &lt;code&gt;pg_collation&lt;/code&gt;, C/POSIX support all character sets, while other collations only support one character set (on Linux systems).&lt;/p&gt;
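&lt;p&gt;A minimal sketch of how to check this in &lt;code&gt;pg_collation&lt;/code&gt;. The collation names in the filter are the ones used earlier in this article and are assumed to exist on the system; &lt;code&gt;collencoding = -1&lt;/code&gt; means the collation works with any encoding.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- collencoding = -1: usable with any encoding; otherwise the one encoding it is tied to
select collname,
       collencoding,
       pg_encoding_to_char(collencoding) as encoding_name
from pg_collation
where collname in (&amp;#39;C&amp;#39;, &amp;#39;POSIX&amp;#39;, &amp;#39;en_US.utf8&amp;#39;, &amp;#39;zh_CN.utf8&amp;#39;)
order by collname, collencoding;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;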
&lt;p&gt;Chinese-related character sets available in PostgreSQL:
&lt;em&gt;(The C collation is provided by the libc library; some collations can instead be provided by the ICU library, which requires PostgreSQL to be built with ICU support.)&lt;/em&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Name&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;th&gt;Language&lt;/th&gt;
 &lt;th&gt;Server-side support?&lt;/th&gt;
 &lt;th&gt;ICU support?&lt;/th&gt;
 &lt;th&gt;Bytes/Char&lt;/th&gt;
 &lt;th&gt;Aliases&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;BIG5&lt;/td&gt;
 &lt;td&gt;Big Five&lt;/td&gt;
 &lt;td&gt;Traditional Chinese&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;1–2&lt;/td&gt;
 &lt;td&gt;WIN950, Windows950&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;EUC_CN&lt;/td&gt;
 &lt;td&gt;Extended UNIX Code-CN&lt;/td&gt;
 &lt;td&gt;Simplified Chinese&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;1–3&lt;/td&gt;
 &lt;td&gt;GB2312&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GB18030&lt;/td&gt;
 &lt;td&gt;National Standard&lt;/td&gt;
 &lt;td&gt;Chinese&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;1–4&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GBK&lt;/td&gt;
 &lt;td&gt;Extended National Standard&lt;/td&gt;
 &lt;td&gt;Simplified Chinese&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;1–2&lt;/td&gt;
 &lt;td&gt;WIN936, Windows936&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;UTF8&lt;/td&gt;
 &lt;td&gt;Unicode, 8-bit&lt;/td&gt;
 &lt;td&gt;all&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;1–4&lt;/td&gt;
 &lt;td&gt;Unicode&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Traditional Chinese&lt;/strong&gt;:
&lt;a href="https://baike.baidu.com/item/%E5%A4%A7%E4%BA%94%E7%A0%81/2413431?fr=ge_ala" target="_blank" rel="noreferrer"&gt;BIG5&lt;/a&gt; is the most common character set standard for Traditional Chinese. It was once the industry standard and was later incorporated as a national standard.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Simplified Chinese&lt;/strong&gt;:
GB stands for &amp;ldquo;Guobiao&amp;rdquo; (national standard). GB2312, GB18030, and GBK are all Chinese national character set standards. They are successive extensions of one another (GB2312, then GBK, then GB18030), each adding characters, such as rare ones, that the previous version lacked, which is why several standards coexist.
&lt;a href="https://baike.baidu.com/item/EUC-CN/4514294?fr=ge_ala" target="_blank" rel="noreferrer"&gt;EUC_CN&lt;/a&gt; stands for Extended UNIX Code-CN, which is essentially &lt;a href="https://baike.baidu.com/item/%E4%BF%A1%E6%81%AF%E4%BA%A4%E6%8D%A2%E7%94%A8%E6%B1%89%E5%AD%97%E7%BC%96%E7%A0%81%E5%AD%97%E7%AC%A6%E9%9B%86/8074272?fromModule=lemma_inlink&amp;amp;fromtitle=GB2312&amp;amp;fromid=483170" target="_blank" rel="noreferrer"&gt;GB2312&lt;/a&gt;, but it cannot handle all rare characters either. Similarly named encodings include EUC_KR, EUC_JP, EUC_TW, and so on.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;International Standards&lt;/strong&gt;:
The character sets above are all national standards — they support English and Chinese but not other languages. The international standard that supports all languages of the world is &lt;a href="https://home.unicode.org/" target="_blank" rel="noreferrer"&gt;Unicode&lt;/a&gt; (which even includes emoji &amp;#x1f44d;). (There is also the well-known international standards organization ISO, which maintains character sets as well — there is some overlap, but we&amp;rsquo;ll set ISO aside for now.)&lt;/p&gt;
&lt;p&gt;Unicode code points can be serialized to bytes in different ways, which gives three encoding formats: UTF-8, UTF-16, and UTF-32.&lt;/p&gt;
&lt;p&gt;UTF-8 encoding format:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Bytes&lt;/th&gt;
 &lt;th&gt;Format&lt;/th&gt;
 &lt;th&gt;Actual encoding bits&lt;/th&gt;
 &lt;th&gt;Code point range&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;1 byte&lt;/td&gt;
 &lt;td&gt;0xxxxxxx&lt;/td&gt;
 &lt;td&gt;7&lt;/td&gt;
 &lt;td&gt;0 ~ 127&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2 bytes&lt;/td&gt;
 &lt;td&gt;110xxxxx 10xxxxxx&lt;/td&gt;
 &lt;td&gt;11&lt;/td&gt;
 &lt;td&gt;128 ~ 2047&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3 bytes&lt;/td&gt;
 &lt;td&gt;1110xxxx 10xxxxxx 10xxxxxx&lt;/td&gt;
 &lt;td&gt;16&lt;/td&gt;
 &lt;td&gt;2048 ~ 65535&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;4 bytes&lt;/td&gt;
 &lt;td&gt;11110xxx 10xxxxxx 10xxxxxx 10xxxxxx&lt;/td&gt;
 &lt;td&gt;21&lt;/td&gt;
 &lt;td&gt;65536 ~ 1114111 (U+10FFFF, the Unicode maximum; 21 bits could address up to 2097151)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;UTF-8 is a variable-length encoding: each character takes 1 to 4 bytes.
For characters in the range 0x00-0x7F (1 byte), the UTF-8 encoding is identical to ASCII (American Standard Code for Information Interchange), so UTF-8 is fully backward-compatible with ASCII.&lt;/p&gt;
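&lt;p&gt;The byte lengths in the table can be checked from SQL. This is a small sketch that assumes the database encoding is UTF8, in which case &lt;code&gt;chr()&lt;/code&gt; accepts a Unicode code point. The four code points are a, é, 阿, and the thumbs-up emoji, chosen here only as examples of each length class.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Assumption: UTF8 database, so chr(n) returns the character with code point n
-- 97 = U+0061, 233 = U+00E9, 38463 = U+963F, 128077 = U+1F44D
select cp                                         as code_point,
       octet_length(chr(cp))                      as bytes,
       encode(convert_to(chr(cp), &amp;#39;UTF8&amp;#39;), &amp;#39;hex&amp;#39;) as utf8_hex
from (values (97), (233), (38463), (128077)) as t(cp)
order by cp;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Under that assumption the four rows report 1, 2, 3, and 4 bytes, with the hex values &lt;code&gt;61&lt;/code&gt;, &lt;code&gt;c3a9&lt;/code&gt;, &lt;code&gt;e998bf&lt;/code&gt;, and &lt;code&gt;f09f918d&lt;/code&gt;.&lt;/p&gt;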
&lt;p&gt;Due to shared origins, meanings, and similarities, Chinese, Japanese, Korean, and Vietnamese characters use a unified encoding in Unicode called &lt;a href="https://baike.baidu.com/item/%E4%B8%AD%E6%97%A5%E9%9F%A9%E8%B6%8A%E7%BB%9F%E4%B8%80%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97/1301611?fromModule=lemma_inlink" target="_blank" rel="noreferrer"&gt;CJK Unified Ideographs (CJKV Unified Ideographs)&lt;/a&gt;.
CJK Unified Ideographs code point ranges: U+3400-U+4DBF, U+4E00-U+9FFF, and U+20000-U+3FFFF.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f914ea2ca52f.png" alt="" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Character Set Conversion
 &lt;div id="character-set-conversion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#character-set-conversion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When &lt;code&gt;server_encoding&lt;/code&gt; and &lt;code&gt;client_encoding&lt;/code&gt; differ, PostgreSQL automatically converts data between the two encodings as it passes between server and client. For how to set the server-side and client-side character sets, see the &amp;ldquo;Configuring locale, collation, and character set&amp;rdquo; section.&lt;/p&gt;
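&lt;p&gt;A short sketch of inspecting and switching the two settings with plain SQL. It assumes the server encoding is UTF8 or EUC_CN so that &lt;code&gt;EUC_CN&lt;/code&gt; is an acceptable client encoding; inside psql, &lt;code&gt;\encoding&lt;/code&gt; is the preferred way to change it.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;show server_encoding;             -- fixed when the database was created
show client_encoding;             -- current setting for this session
set client_encoding to &amp;#39;EUC_CN&amp;#39;;  -- results are now converted to EUC_CN on the way out
show client_encoding;
reset client_encoding;            -- back to the session default&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;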
&lt;p&gt;Chinese-related character sets — Server/Client convertible table:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Server Character Set&lt;/th&gt;
 &lt;th&gt;Available Client Character Sets&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;BIG5&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;not supported as a server encoding&lt;/em&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;EUC_CN (GB2312)&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;EUC_CN (GB2312), &lt;code&gt;MULE_INTERNAL&lt;/code&gt;, &lt;code&gt;UTF8&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GB18030&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;not supported as a server encoding&lt;/em&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GBK&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;not supported as a server encoding&lt;/em&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;UTF8&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;all supported encodings&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;BIG5, GB18030, and GBK are not supported on the server side, so in practice only EUC_CN (GB2312) and UTF8 servers can perform Server/Client conversion.&lt;/p&gt;
&lt;p&gt;The above lists the character sets that &lt;em&gt;can&lt;/em&gt; be converted, but conversion still requires CONVERSION support. PostgreSQL has built-in conversion functions visible via &lt;code&gt;pg_conversion&lt;/code&gt;:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Conversion Name&lt;/th&gt;
 &lt;th&gt;Source Encoding&lt;/th&gt;
 &lt;th&gt;Destination Encoding&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;big5_to_utf8&lt;/td&gt;
 &lt;td&gt;BIG5&lt;/td&gt;
 &lt;td&gt;UTF8&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;euc_cn_to_utf8&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;EUC_CN&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;UTF8&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;gb18030_to_utf8&lt;/td&gt;
 &lt;td&gt;GB18030&lt;/td&gt;
 &lt;td&gt;UTF8&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;gbk_to_utf8&lt;/td&gt;
 &lt;td&gt;GBK&lt;/td&gt;
 &lt;td&gt;UTF8&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;utf8_to_big5&lt;/td&gt;
 &lt;td&gt;UTF8&lt;/td&gt;
 &lt;td&gt;BIG5&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;utf8_to_euc_cn&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;UTF8&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;EUC_CN&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;utf8_to_gb18030&lt;/td&gt;
 &lt;td&gt;UTF8&lt;/td&gt;
 &lt;td&gt;GB18030&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;utf8_to_gbk&lt;/td&gt;
 &lt;td&gt;UTF8&lt;/td&gt;
 &lt;td&gt;GBK&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
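&lt;p&gt;The rows above can be reproduced with a query along these lines. It is a sketch against the &lt;code&gt;pg_conversion&lt;/code&gt; catalog; because it filters on either side of the conversion, it returns more rows than the excerpt above (for example the conversions to and from &lt;code&gt;MULE_INTERNAL&lt;/code&gt;).&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- List built-in conversions that involve a Chinese-related encoding
select conname,
       pg_encoding_to_char(conforencoding) as source_encoding,
       pg_encoding_to_char(contoencoding)  as destination_encoding
from pg_conversion
where pg_encoding_to_char(conforencoding) in (&amp;#39;BIG5&amp;#39;, &amp;#39;EUC_CN&amp;#39;, &amp;#39;GB18030&amp;#39;, &amp;#39;GBK&amp;#39;)
   or pg_encoding_to_char(contoencoding)  in (&amp;#39;BIG5&amp;#39;, &amp;#39;EUC_CN&amp;#39;, &amp;#39;GB18030&amp;#39;, &amp;#39;GBK&amp;#39;)
order by conname;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;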
&lt;p&gt;You can create custom conversions via the &lt;code&gt;CREATE CONVERSION&lt;/code&gt; statement, specifying the conversion function.&lt;/p&gt;
&lt;p&gt;Some character sets appear to be interconvertible, but the server side cannot store them at all (such as BIG5, GB18030, GBK), so those conversions only matter when the client uses one of these encodings. For server-side storage, all we need to know here is that &lt;code&gt;euc_cn&lt;/code&gt; and &lt;code&gt;utf8&lt;/code&gt; can be converted to and from each other.&lt;/p&gt;
&lt;p&gt;Without CONVERSION support, conversion cannot happen:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- EUC_CN database
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; EUC_KR
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EUC_KR: invalid &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;conversion&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;procedure&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;found&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Character set conversion test&lt;/strong&gt;:
&lt;em&gt;Pay attention to the client-side character set settings (e.g., CRT&amp;rsquo;s &amp;ldquo;session&amp;rdquo; - &amp;ldquo;Appearance&amp;rdquo; - &amp;ldquo;Character encoding&amp;rdquo;)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;There are at least three endpoints, each with its own character set: the database server, the database client, and the UI client (the terminal). A CONVERSION only covers the hop between the database server and the database client.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Server with UTF8 conversion test:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; zh(col1 varchar(&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; zh &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;&amp;gt;&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;); &lt;span style="color:#75715e"&gt;-- 〇 (líng) is a Chinese character
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- If CRT is not set to UTF8, the Chinese characters arrive garbled, so set CRT to UTF8 before inserting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; server_encoding;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; server_encoding 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; UTF8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; client_encoding;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; client_encoding 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; UTF8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- With no conversion at all, UTF8 displays correctly. Currently three endpoints: UTF8 - UTF8 - UTF8
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; zh;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;阿&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;〇&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Switch database client character set. Now three endpoints: UTF8 - EUC_CN - UTF8
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; EUC_CN; &lt;span style="color:#75715e"&gt;-- Set client character set
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; zh &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;22021&lt;/span&gt;: invalid byte sequence &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;EUC_CN&amp;#34;&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;xe9 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x98
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: report_invalid_encoding, mbutils.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1597&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; zh &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;22021&lt;/span&gt;: invalid byte sequence &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;EUC_CN&amp;#34;&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;xe3 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x80
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- It looks like &amp;#34;阿&amp;#34; and &amp;#34;〇&amp;#34; cannot be converted to EUC_CN, but these errors come from the terminal still sending UTF8 bytes while client_encoding is EUC_CN
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; zh &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;B0&lt;span style="color:#f92672"&gt;&amp;gt;&amp;lt;&lt;/span&gt;A2&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- The second row is &amp;#34;阿&amp;#34;. The database server/client appears to have converted the character set from UTF8 to EUC_CN.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- However, it may not display correctly due to UI client issues (currently CRT is set to UTF8)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Even changing CRT to GB2312 still won&amp;#39;t display correctly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; zh &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;B0&lt;span style="color:#f92672"&gt;&amp;gt;&amp;lt;&lt;/span&gt;A2&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- When querying 〇, the database throws an error directly, indicating 〇 cannot be converted from UTF8 to EUC_CN
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; zh ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;P05: character &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; byte sequence &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;xe3 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x80 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x87 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;UTF8&amp;#34;&lt;/span&gt; has &lt;span style="color:#66d9ef"&gt;no&lt;/span&gt; equivalent &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;EUC_CN&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: report_untranslatable_char, mbutils.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1631&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Server with EUC_CN conversion test:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; server_encoding; &lt;span style="color:#75715e"&gt;-- Database has EUC_CN character set
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; server_encoding 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; EUC_CN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create the same zh table under the EUC_CN database, but inserting already has issues
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; zh &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;P05: character &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; byte sequence &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;xe3 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x80 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x87 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;UTF8&amp;#34;&lt;/span&gt; has &lt;span style="color:#66d9ef"&gt;no&lt;/span&gt; equivalent &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;EUC_CN&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: report_untranslatable_char, mbutils.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1631&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Again, the error says 〇 cannot be converted from UTF8 to EUC_CN. The EUC_CN (GB2312) repertoire is only a subset of what UTF8 can represent: it does not include every Chinese character, and rare characters in particular are missing.&lt;/p&gt;
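&lt;p&gt;The same conclusion can be reached without involving the terminal at all, by asking the server to convert the characters explicitly. This is a minimal sketch that assumes a UTF8 database; &lt;code&gt;chr()&lt;/code&gt; is used instead of literals so the result does not depend on the client or terminal encoding.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Assumption: UTF8 database. chr(38463) is 阿 (U+963F)
select encode(convert_to(chr(38463), &amp;#39;UTF8&amp;#39;),   &amp;#39;hex&amp;#39;) as utf8_bytes,
       encode(convert_to(chr(38463), &amp;#39;EUC_CN&amp;#39;), &amp;#39;hex&amp;#39;) as euc_cn_bytes;

-- chr(12295) is 〇 (U+3007); it has no EUC_CN equivalent, so this is expected to fail
select convert_to(chr(12295), &amp;#39;EUC_CN&amp;#39;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The first query returns &lt;code&gt;e998bf&lt;/code&gt; and &lt;code&gt;b0a2&lt;/code&gt;, matching the &lt;code&gt;0xe9 0x98&lt;/code&gt; and &lt;code&gt;B0 A2&lt;/code&gt; bytes seen in the tests above.&lt;/p&gt;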

&lt;h2 class="relative group"&gt;Configuring locale, collation, and character set
 &lt;div id="configuring-locale-collation-and-character-set" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#configuring-locale-collation-and-character-set" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Now that we&amp;rsquo;ve covered localization and character sets, here&amp;rsquo;s a summary.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Database cluster locale, collation, character set
 &lt;div id="database-cluster-locale-collation-character-set" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#database-cluster-locale-collation-character-set" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;At initialization time, you can set the database cluster&amp;rsquo;s locale and character set:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;initdb -D $DATADIR -E UTF8 --locale&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US.UTF8 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;initdb -D $DATADIR -E UTF8 --locale&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US.UTF8 --lc-collate&lt;span style="color:#f92672"&gt;=&lt;/span&gt;C --lc-ctype&lt;span style="color:#f92672"&gt;=&lt;/span&gt;C
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;initdb -D $DATADIR -E UTF8 --locale&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US.UTF8 --lc-collate&lt;span style="color:#f92672"&gt;=&lt;/span&gt;C --lc-ctype&lt;span style="color:#f92672"&gt;=&lt;/span&gt;C --lc-messages&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US.UTF8 --lc-monetary&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US.UTF8 --lc-numeric&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US.UTF8 --lc-time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US.UTF8&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;initdb&lt;/code&gt; creates three databases: &lt;code&gt;postgres&lt;/code&gt;, &lt;code&gt;template1&lt;/code&gt;, and &lt;code&gt;template0&lt;/code&gt;. The &lt;code&gt;CREATE DATABASE&lt;/code&gt; statement defaults to using &lt;code&gt;template1&lt;/code&gt; to create databases.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;encoding&lt;/code&gt; sets the character set; &lt;code&gt;locale&lt;/code&gt; sets &lt;code&gt;LC_COLLATE&lt;/code&gt;, &lt;code&gt;LC_CTYPE&lt;/code&gt;, &lt;code&gt;LC_MESSAGES&lt;/code&gt;, &lt;code&gt;LC_MONETARY&lt;/code&gt;, &lt;code&gt;LC_NUMERIC&lt;/code&gt;, and &lt;code&gt;LC_TIME&lt;/code&gt;, unless specifically overridden (e.g., via &lt;code&gt;--lc-collate&lt;/code&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;LC_COLLATE&lt;/code&gt; and &lt;code&gt;LC_CTYPE&lt;/code&gt; are called collation and can also be set at the database, column, and index levels. &lt;code&gt;LC_MESSAGES&lt;/code&gt;, &lt;code&gt;LC_MONETARY&lt;/code&gt;, &lt;code&gt;LC_NUMERIC&lt;/code&gt;, and &lt;code&gt;LC_TIME&lt;/code&gt; are instance parameters that can be changed at any time.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;encoding&lt;/code&gt; can only be set at initialization or at database creation — once set, it cannot be changed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
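&lt;p&gt;After &lt;code&gt;initdb&lt;/code&gt;, the resulting settings can be inspected from any session. A minimal sketch using the &lt;code&gt;pg_database&lt;/code&gt; catalog and the &lt;code&gt;lc_*&lt;/code&gt; parameters:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Per-database settings, fixed at creation time
select datname,
       pg_encoding_to_char(encoding) as encoding,
       datcollate,
       datctype
from pg_database
order by datname;

-- Instance parameters, changeable at any time
show lc_messages;
show lc_monetary;
show lc_numeric;
show lc_time;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;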

&lt;h3 class="relative group"&gt;Database collation and character set
 &lt;div id="database-collation-and-character-set" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#database-collation-and-character-set" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When creating a database, you can set the database&amp;rsquo;s character set, &lt;code&gt;lc_collate&lt;/code&gt;, and &lt;code&gt;lc_ctype&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Both &lt;code&gt;CREATE DATABASE&lt;/code&gt; and &lt;code&gt;createdb&lt;/code&gt; can specify the character set at database creation time. Once created, the database character set cannot be changed. Both commands use a template database to create the new database.&lt;/p&gt;
&lt;p&gt;There are two templates: &lt;code&gt;template0&lt;/code&gt; and &lt;code&gt;template1&lt;/code&gt;. The official documentation states:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Another common reason for copying &lt;code&gt;template0&lt;/code&gt; instead of &lt;code&gt;template1&lt;/code&gt; is that new encoding and locale settings can be specified when copying &lt;code&gt;template0&lt;/code&gt;, whereas a copy of &lt;code&gt;template1&lt;/code&gt; must use the same settings it does. This is because &lt;code&gt;template1&lt;/code&gt; might contain encoding-specific or locale-specific data, while &lt;code&gt;template0&lt;/code&gt; is known not to.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;&lt;code&gt;template1&lt;/code&gt; is a writable template database that may contain localized data, while &lt;code&gt;template0&lt;/code&gt; cannot be written to. Therefore, to create a database with different localization settings, you should use &lt;code&gt;template0&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;And you must explicitly use &lt;code&gt;template0&lt;/code&gt;, because the default is &lt;code&gt;template1&lt;/code&gt;. Attempting to create a database with a different character set without specifying &lt;code&gt;template0&lt;/code&gt; results in an error:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; db_GB2312 &lt;span style="color:#66d9ef"&gt;ENCODING&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;EUC_CN&amp;#39;&lt;/span&gt; LC_COLLATE &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt; LC_CTYPE &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;22023&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;new&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; (EUC_CN) &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; incompatible &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;template&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; (UTF8)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Use the same &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;template&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; use template0 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;template&lt;/span&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
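&lt;p&gt;The difference between the two templates is visible in &lt;code&gt;pg_database&lt;/code&gt;. A quick check, assuming a default installation where nobody has altered the template flags:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- template0 normally has datallowconn = false, so no session can connect and modify it
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select datname, datistemplate, datallowconn, pg_encoding_to_char(encoding) as encoding
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_database
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;where datname in (&amp;#39;template0&amp;#39;, &amp;#39;template1&amp;#39;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;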
&lt;p&gt;Additionally, you cannot set the character set by specifying &lt;code&gt;locale&lt;/code&gt; when creating a database:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; db_GB2312 locale &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;template&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;template0&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;22023&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;UTF8&amp;#34;&lt;/span&gt; does &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;match&lt;/span&gt; locale &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.gb2312&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: The chosen LC_CTYPE setting requires &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;EUC_CN&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: check_encoding_locale_matches, dbcommands.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;773&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
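&lt;p&gt;The error means that the &lt;code&gt;LC_CTYPE&lt;/code&gt; implied by this locale requires encoding &lt;code&gt;EUC_CN&lt;/code&gt;, while the new database would default to &lt;code&gt;UTF8&lt;/code&gt;. Specifying &lt;code&gt;LOCALE&lt;/code&gt; together with the &lt;code&gt;LC_COLLATE&lt;/code&gt; and &lt;code&gt;LC_CTYPE&lt;/code&gt; sub-options produces a different error:&lt;/p&gt;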
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; db_GB2312 LOCALE &lt;span style="color:#e6db74"&gt;&amp;#39;EUC_CN&amp;#39;&lt;/span&gt; LC_COLLATE &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt; LC_CTYPE &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42601&lt;/span&gt;: conflicting &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; redundant &lt;span style="color:#66d9ef"&gt;options&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: LOCALE cannot be specified together &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; LC_COLLATE &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; LC_CTYPE.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;LOCALE&lt;/code&gt; cannot be used together with &lt;code&gt;LC_CTYPE&lt;/code&gt; and other sub-options.&lt;/p&gt;
&lt;p&gt;Dropping &lt;code&gt;LOCALE&lt;/code&gt; and specifying the encoding, &lt;code&gt;LC_COLLATE&lt;/code&gt;, and &lt;code&gt;LC_CTYPE&lt;/code&gt; explicitly, with &lt;code&gt;template0&lt;/code&gt; as the template, succeeds.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The correct way to create a database with a specific character set&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;CREATE DATABASE&lt;/code&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; db_GB2312 &lt;span style="color:#66d9ef"&gt;ENCODING&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;EUC_CN&amp;#39;&lt;/span&gt; LC_COLLATE &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt; LC_CTYPE &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;template&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;template0&amp;#39;&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;createdb&lt;/code&gt;:
The CLI command &lt;code&gt;createdb&lt;/code&gt; is a wrapper around &lt;code&gt;CREATE DATABASE&lt;/code&gt;, so the two are equivalent:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; createdb -E EUC_CN -T template0 --lc-collate&lt;span style="color:#f92672"&gt;=&lt;/span&gt;zh_CN.gb2312 --lc-ctype&lt;span style="color:#f92672"&gt;=&lt;/span&gt;zh_CN.gb2312 db_GB2312&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Viewing database character set:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;\l&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;pg_database&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; datname,pg_encoding_to_char(&lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt;),datcollate,datctype,datlocprovider,daticulocale &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_database;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="3"&gt;
&lt;li&gt;&lt;code&gt;SHOW&lt;/code&gt; parameters&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;SERVER_ENCODING&lt;/code&gt;, &lt;code&gt;LC_COLLATE&lt;/code&gt;, and &lt;code&gt;LC_CTYPE&lt;/code&gt; are read-only parameters that report the &lt;em&gt;current&lt;/em&gt; database&amp;rsquo;s server-side character set, &lt;code&gt;LC_COLLATE&lt;/code&gt;, and &lt;code&gt;LC_CTYPE&lt;/code&gt;, respectively.&lt;/p&gt;
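&lt;p&gt;A sketch of the &lt;code&gt;SHOW&lt;/code&gt; approach, with an equivalent catalog query for comparison. One caveat: the read-only &lt;code&gt;lc_collate&lt;/code&gt; and &lt;code&gt;lc_ctype&lt;/code&gt; parameters were removed in PostgreSQL 16, so the two middle statements assume version 15 or older; the catalog query should work on either:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;show server_encoding;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;show lc_collate;  -- assumption: PostgreSQL 15 or older
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;show lc_ctype;    -- assumption: PostgreSQL 15 or older
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Catalog equivalent, restricted to the database of the current session
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select datname, pg_encoding_to_char(encoding) as encoding, datcollate, datctype
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_database
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;where datname = current_database();&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;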

&lt;h3 class="relative group"&gt;Column collation
 &lt;div id="column-collation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#column-collation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Collation is only related to character sorting and character functions — it is not related to encoding. Without indexes, changing a column&amp;rsquo;s collation is essentially just adjusting the default sort output for that column. With indexes, it will rebuild the index. If no collation is specified for a column, it defaults to the database&amp;rsquo;s collation.&lt;/p&gt;
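&lt;p&gt;To see that collation changes only the ordering and not the stored data, here is a small sketch that sorts the same three literal values twice. The values are made up for illustration, and the result of the second query depends on the database&amp;rsquo;s default collation:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- &amp;#34;C&amp;#34; compares by byte value, so uppercase B (0x42) sorts before lowercase a (0x61)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select x from (values (&amp;#39;a&amp;#39;), (&amp;#39;B&amp;#39;), (&amp;#39;c&amp;#39;)) as v(x) order by x collate &amp;#34;C&amp;#34;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Same values under the database default collation; a linguistic locale may return a, B, c instead
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select x from (values (&amp;#39;a&amp;#39;), (&amp;#39;B&amp;#39;), (&amp;#39;c&amp;#39;)) as v(x) order by x;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;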
&lt;p&gt;Specifying collation when creating a table (note: some data types are un-collatable, such as &lt;code&gt;int&lt;/code&gt;):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; t1(col1 varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Note: an &lt;code&gt;ALTER TABLE&lt;/code&gt; that changes only the collation, not the length, does not rewrite the table, but it normally rebuilds any index on that column.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Viewing a column&amp;rsquo;s default collation&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- 1. psql meta-command
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; t1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- 2. information_schema.columns
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; table_catalog,table_schema,&lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;column_name&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;collation_name&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; information_schema.columns &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;t1&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- 3. pg_attribute joined to pg_collation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; a.attrelid::regclass,a.attname,a.attcollation,&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collname,&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collcollate,&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collctype &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_attribute a &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; a.attcollation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a.attrelid::regclass&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;t1&amp;#39;&lt;/span&gt;::regclass &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; a.attcollation&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Method 3 is recommended. &lt;code&gt;\d+&lt;/code&gt; and &lt;code&gt;information_schema.columns&lt;/code&gt; show only &lt;code&gt;collname&lt;/code&gt;, which is not unique: the same name can exist once per encoding. Only method 3 reveals the underlying &lt;code&gt;collcollate&lt;/code&gt; and &lt;code&gt;collctype&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Test: specifying collate and viewing &lt;code&gt;pg_attribute&lt;/code&gt;:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tlzl(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) ,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col2 varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col3 varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col4 varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- Column collation is like tagging the column with a default sort order; you can&amp;#39;t see the specific collate and ctype
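&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; tlzl
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.tlzl&amp;#34;&lt;/span&gt;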
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Compression &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+-----------------------+------------+----------+---------+----------+-------------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col3 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col4 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- collname and collate/ctype are not one-to-one; col3&amp;#39;s zh_CN alone doesn&amp;#39;t reveal which collate is used
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_encoding_to_char(collencoding) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt;,collname,collcollate,collctype &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; collname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collcollate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collctype 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+--------------+--------------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; EUC_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; EUC_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.gb2312 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.gb2312 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.gb2312
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; UTF8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; UTF8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- pg_attribute shows more precisely than \d+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; a.attrelid::regclass,a.attname,a.attcollation,&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collname,&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collcollate,&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collctype &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_attribute a &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; a.attcollation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a.attrelid::regclass&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;tlzl&amp;#39;&lt;/span&gt;::regclass &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; a.attcollation&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; attrelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; attname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; attcollation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collcollate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collctype 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+--------------+------------+-------------+------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tlzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; col1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tlzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; col2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;950&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tlzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; col4 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12562&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tlzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; col3 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13200&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Now we know that col3 zh_CN&amp;#39;s collate is zh_CN.utf8 &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Test: table rewrite when modifying column collate:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Add an index to the column and check rewrite behavior
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxcol4 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl(col4);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;tlzl&amp;#39;&lt;/span&gt;) TableRelid, pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idxcol4&amp;#39;&lt;/span&gt;) IndexRelid; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablerelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indexrelid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------+------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41006&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41015&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; col4 &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;tlzl&amp;#39;&lt;/span&gt;) TableRelid, pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idxcol4&amp;#39;&lt;/span&gt;) IndexRelid; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablerelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indexrelid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------+------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41006&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Table was not rewritten; index was rewritten&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A column&amp;rsquo;s collation is only metadata. Changing it does not rewrite the table, but an index on the column is rewritten (with an exception covered in the next section).&lt;/p&gt;

&lt;h3 class="relative group"&gt;Index collation
 &lt;div id="index-collation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#index-collation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When creating an index, if the index&amp;rsquo;s collation is not explicitly specified, the index uses the collation declared on the column.&lt;/p&gt;
&lt;p&gt;Explicitly specifying collation when creating an index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_C &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl(col3 &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;); &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Additionally, indexes can be created with &lt;code&gt;text_pattern_ops&lt;/code&gt;, &lt;code&gt;varchar_pattern_ops&lt;/code&gt;, &lt;code&gt;bpchar_pattern_ops&lt;/code&gt; — in this case, the index does not depend on collation rules but compares character by character:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The difference from the default operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules.&lt;/p&gt;
&lt;/blockquote&gt;&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; test_index &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; test_table (col varchar_pattern_ops);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In fact, this type of index is not entirely unrelated to collation — an index always has a sort order. This type of index&amp;rsquo;s sort order appears to be consistent with &lt;code&gt;C&lt;/code&gt;. See the &amp;ldquo;LIKE not using index&amp;rdquo; section.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Viewing an index&amp;rsquo;s collation:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- \d+ shows indexes with explicitly specified collate; if not specified, the column&amp;#39;s default collation is used
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; tlzl
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.tlzl&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Compression &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+-----------------------+------------+----------+---------+----------+-------------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col3 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col4 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idx_c&amp;#34;&lt;/span&gt; btree (col3 &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idxcol4&amp;#34;&lt;/span&gt; btree (col4)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Viewing via &lt;code&gt;pg_index&lt;/code&gt; is clearer (the &lt;code&gt;indcollation&lt;/code&gt; type in &lt;code&gt;pg_index&lt;/code&gt; is &lt;code&gt;oidvector&lt;/code&gt; and cannot be directly cast to &lt;code&gt;oid&lt;/code&gt;, making queries a bit cumbersome):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; indcollation,indexrelid::regclass &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_index &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; indexrelid::regclass &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;idx_C&amp;#39;&lt;/span&gt;::regclass;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; indcollation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indexrelid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;950&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; idx_c
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; oid,pg_encoding_to_char(collencoding) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt;,collname,collcollate,collctype &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; oid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;950&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collcollate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collctype 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----+----------+----------+-------------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;950&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
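&lt;p&gt;One way to avoid looking up each OID by hand is to expand &lt;code&gt;indcollation&lt;/code&gt; and join it to &lt;code&gt;pg_collation&lt;/code&gt;. The query below is a minimal sketch, assuming the test table &lt;code&gt;tlzl&lt;/code&gt; from this section exists. The aliases &lt;code&gt;u&lt;/code&gt;, &lt;code&gt;colloid&lt;/code&gt; and &lt;code&gt;ord&lt;/code&gt; are illustrative names, and the explicit &lt;code&gt;::oid[]&lt;/code&gt; cast is one possible way around the &lt;code&gt;oidvector&lt;/code&gt; type:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- List each index on tlzl with the collation of every key column
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT i.indexrelid::regclass AS index_name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       u.ord AS key_position,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       c.collname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_index i
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CROSS JOIN LATERAL unnest(i.indcollation::oid[]) WITH ORDINALITY AS u(colloid, ord)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LEFT JOIN pg_collation c ON c.oid = u.colloid -- colloid 0: key column is not collatable
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE i.indrelid = &amp;#39;tlzl&amp;#39;::regclass
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ORDER BY i.indexrelid, u.ord;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A &lt;code&gt;collname&lt;/code&gt; of NULL in the result indicates a key column that carries no collation.&lt;/p&gt;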
&lt;p&gt;&lt;em&gt;Also, you cannot change an index&amp;rsquo;s collation via &lt;code&gt;ALTER INDEX&lt;/code&gt; — you must drop and recreate it.&lt;/em&gt;&lt;/p&gt;
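&lt;p&gt;A minimal sketch of such a rebuild, assuming the &lt;code&gt;idx_c&lt;/code&gt; index from above should move to the &lt;code&gt;en_US.utf8&lt;/code&gt; collation. The temporary name &lt;code&gt;idx_c_new&lt;/code&gt; is hypothetical. Building the replacement first keeps an index available throughout. This is for illustration only and is not part of the test session that follows:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Build the replacement with the new collation without blocking writes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- (CREATE INDEX CONCURRENTLY cannot run inside a transaction block)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CREATE INDEX CONCURRENTLY idx_c_new ON tlzl (col3 COLLATE &amp;#34;en_US.utf8&amp;#34;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Drop the old index, then take over its name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DROP INDEX idx_c;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ALTER INDEX idx_c_new RENAME TO idx_c;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;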
&lt;p&gt;&lt;strong&gt;Test: After specifying an index collate, does modifying the column&amp;rsquo;s collate rewrite the index?&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;tlzl&amp;#39;&lt;/span&gt;) TableRelid, pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idxcol4&amp;#39;&lt;/span&gt;) IndexRelid4,pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_c&amp;#39;&lt;/span&gt;) IndexRelidC; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablerelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indexrelid4 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indexrelidc 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------+------------------+------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41020&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41023&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41024&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; col3 &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;tlzl&amp;#39;&lt;/span&gt;) TableRelid, pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idxcol4&amp;#39;&lt;/span&gt;) IndexRelid4,pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_c&amp;#39;&lt;/span&gt;) IndexRelidC; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablerelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indexrelid4 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indexrelidc 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------+------------------+------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41020&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41023&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41024&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- idx_c&amp;#39;s relfilenode did not change&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If an index&amp;rsquo;s collation was specified explicitly, changing the column&amp;rsquo;s default collation does not rewrite that index.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Client character set
 &lt;div id="client-character-set" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#client-character-set" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When the client sets a character set different from the database, character set conversion occurs — though conversion may not always succeed. See the &amp;ldquo;Character Set Conversion&amp;rdquo; section for details.&lt;/p&gt;
&lt;p&gt;The server-side character set cannot be changed after database creation, but the client character set can be adjusted at any time.&lt;/p&gt;
&lt;p&gt;There are many ways to set the client character set:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Set directly on the client:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; UTF8 &lt;span style="color:#75715e"&gt;-- psql only
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt; CLIENT_ENCODING &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; UTF8; &lt;span style="color:#75715e"&gt;-- session-level parameter change
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NAMES&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;UTF8&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- SQL standard syntax; the value must be a quoted string&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Set the &lt;code&gt;PGCLIENTENCODING&lt;/code&gt; environment variable&lt;/li&gt;
&lt;li&gt;Set the &lt;code&gt;client_encoding&lt;/code&gt; server configuration parameter&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Priority: client-side setting &amp;gt; &lt;code&gt;PGCLIENTENCODING&lt;/code&gt; environment variable &amp;gt; &lt;code&gt;client_encoding&lt;/code&gt; server configuration parameter&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Viewing the client character set:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- psql only
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SHOW&lt;/span&gt; client_encoding;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
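&lt;p&gt;To judge whether conversion will take place, the server and client encodings can be compared side by side. A small sketch that relies only on built-in settings and catalog columns:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SHOW server_encoding; -- encoding of the current database (read-only)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SHOW client_encoding; -- encoding used for this session&amp;#39;s traffic
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- The same database encoding, read from the catalog
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT datname, pg_encoding_to_char(encoding) AS server_encoding
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_database
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE datname = current_database();&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When the two values differ, the server converts data between them on the way in and out.&lt;/p&gt;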

&lt;h3 class="relative group"&gt;Expression collate
 &lt;div id="expression-collate" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#expression-collate" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Adding &lt;code&gt;COLLATE&lt;/code&gt; to an expression overrides the expression&amp;rsquo;s original collation, so any sort or comparison on it uses the specified collation.&lt;/p&gt;
&lt;p&gt;Add the &lt;code&gt;COLLATE&lt;/code&gt; keyword at the end of the expression:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;expr &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- For example
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab1 &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
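&lt;p&gt;The clause is not limited to &lt;code&gt;ORDER BY&lt;/code&gt;; it can be attached to any collatable expression, such as one side of a comparison. A minimal sketch with literal values, so no table is assumed. &lt;code&gt;pg_collation_for&lt;/code&gt; is a built-in function that reports the collation of an expression, and the alias &lt;code&gt;a_before_b&lt;/code&gt; is illustrative:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Compare two strings under the byte-order &amp;#34;C&amp;#34; collation
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT &amp;#39;a&amp;#39; &amp;lt; &amp;#39;B&amp;#39; COLLATE &amp;#34;C&amp;#34; AS a_before_b;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Report which collation the expression carries
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT pg_collation_for(&amp;#39;a&amp;#39;::text COLLATE &amp;#34;C&amp;#34;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Under &lt;code&gt;C&lt;/code&gt;, the first query returns false, because uppercase letters have smaller byte values than lowercase ones.&lt;/p&gt;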
&lt;p&gt;For details on sorting and collate index selection, see the &amp;ldquo;Sort Result Issues&amp;rdquo; section.&lt;/p&gt;

&lt;h2 class="relative group"&gt;MORE
 &lt;div id="more" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#more" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Concept Summary
 &lt;div id="concept-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#concept-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL localization has three important concepts: character set, locale, and collation — it&amp;rsquo;s essential to understand their relationships.&lt;/p&gt;
&lt;p&gt;The server-side character set matters most: it is chosen at cluster initialization or database creation and cannot be changed once the database exists. The character set determines how data is encoded. Collation does not affect encoding, but the two are linked, because a collation must be compatible with the database encoding. Locale can likewise be set at initialization. Collation in particular can also be set at database creation or on individual columns, and these settings are only defaults. Collation affects physical ordering only through indexes: an index is built in the sort order of its own collation. A query that sorts or compares under a different collation cannot use that index, even if the two collations share the same origin.&lt;/p&gt;
&lt;p&gt;The client character set and the four locale parameters (&lt;code&gt;LC_MESSAGES&lt;/code&gt;, &lt;code&gt;LC_MONETARY&lt;/code&gt;, &lt;code&gt;LC_NUMERIC&lt;/code&gt;, &lt;code&gt;LC_TIME&lt;/code&gt;) are simpler: they can be changed directly as parameters and have no effect on how data is stored.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6c422851b81d.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Sort Result Issues
 &lt;div id="sort-result-issues" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sort-result-issues" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since UTF8 is the most common character set, we&amp;rsquo;ll test sorting with collations available for UTF8:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; db_UTF8 &lt;span style="color:#66d9ef"&gt;ENCODING&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;UTF8&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;template&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;template0&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- Create a UTF8 database; collation doesn&amp;#39;t matter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;c db_utf8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tzlz(name varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;ORDER BY results with different collations:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Order&lt;/th&gt;
 &lt;th&gt;default&lt;/th&gt;
 &lt;th&gt;C&lt;/th&gt;
 &lt;th&gt;en_US&lt;/th&gt;
 &lt;th&gt;en_US.utf8&lt;/th&gt;
 &lt;th&gt;zh_CN&lt;/th&gt;
 &lt;th&gt;zh_CN.utf8&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;td&gt;〇&lt;/td&gt;
 &lt;td&gt;A&lt;/td&gt;
 &lt;td&gt;〇&lt;/td&gt;
 &lt;td&gt;〇&lt;/td&gt;
 &lt;td&gt;a&lt;/td&gt;
 &lt;td&gt;a&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;td&gt;a&lt;/td&gt;
 &lt;td&gt;AA&lt;/td&gt;
 &lt;td&gt;a&lt;/td&gt;
 &lt;td&gt;a&lt;/td&gt;
 &lt;td&gt;A&lt;/td&gt;
 &lt;td&gt;A&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3&lt;/td&gt;
 &lt;td&gt;A&lt;/td&gt;
 &lt;td&gt;a&lt;/td&gt;
 &lt;td&gt;A&lt;/td&gt;
 &lt;td&gt;A&lt;/td&gt;
 &lt;td&gt;aa&lt;/td&gt;
 &lt;td&gt;aa&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;4&lt;/td&gt;
 &lt;td&gt;aa&lt;/td&gt;
 &lt;td&gt;aa&lt;/td&gt;
 &lt;td&gt;aa&lt;/td&gt;
 &lt;td&gt;aa&lt;/td&gt;
 &lt;td&gt;AA&lt;/td&gt;
 &lt;td&gt;AA&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;5&lt;/td&gt;
 &lt;td&gt;AA&lt;/td&gt;
 &lt;td&gt;〇&lt;/td&gt;
 &lt;td&gt;AA&lt;/td&gt;
 &lt;td&gt;AA&lt;/td&gt;
 &lt;td&gt;阿&lt;/td&gt;
 &lt;td&gt;阿&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;6&lt;/td&gt;
 &lt;td&gt;啊&lt;/td&gt;
 &lt;td&gt;啊&lt;/td&gt;
 &lt;td&gt;啊&lt;/td&gt;
 &lt;td&gt;啊&lt;/td&gt;
 &lt;td&gt;啊&lt;/td&gt;
 &lt;td&gt;啊&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;7&lt;/td&gt;
 &lt;td&gt;阿&lt;/td&gt;
 &lt;td&gt;阿&lt;/td&gt;
 &lt;td&gt;阿&lt;/td&gt;
 &lt;td&gt;阿&lt;/td&gt;
 &lt;td&gt;〇&lt;/td&gt;
 &lt;td&gt;〇&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Here, &lt;code&gt;default&lt;/code&gt; resolves to &lt;code&gt;en_US.utf8&lt;/code&gt;: the column uses the &lt;code&gt;default&lt;/code&gt; collation, which falls back to the database collation (&lt;code&gt;en_US.utf8&lt;/code&gt;).&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&amp;#x1f31f; &lt;strong&gt;C, en_US.utf8, and zh_CN.utf8 all produce different sort results!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Collate and index scan test:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxzlz_default &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz(name);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxzlz_C &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz(name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxzlz_enUS_utf8 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz(name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Using collate for index optimization:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Without any collate keyword, a simple index scan; no extra sorting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_default &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (name &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ANY&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;{a,aa,A,AA,啊,阿,〇}&amp;#39;&lt;/span&gt;::text[]))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Adding collate conversion to the predicate hits the correct index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_c &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (name &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ANY&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;{a,aa,A,AA,啊,阿,〇}&amp;#39;&lt;/span&gt;::text[]))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_enus_utf8 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (name &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ANY&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;{a,aa,A,AA,啊,阿,〇}&amp;#39;&lt;/span&gt;::text[]))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- However, the collation name must match exactly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;232&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ANY&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;{a,aa,A,AA,啊,阿,〇}&amp;#39;&lt;/span&gt;::text[]))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- ORDER BY also needs the collate conversion expression
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Here, the correct index is used, but ORDER BY treats them as different collations (even though they are the same)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_enus_utf8 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (name &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ANY&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;{a,aa,A,AA,啊,阿,〇}&amp;#39;&lt;/span&gt;::text[]))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Adding collate conversion to both WHERE and ORDER BY selects the right index and avoids extra sorting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_enus_utf8 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (name &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ANY&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;{a,aa,A,AA,啊,阿,〇}&amp;#39;&lt;/span&gt;::text[]))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
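&lt;p&gt;&lt;strong&gt;Once an index is created with an explicit collation, the SQL must apply that collation with the COLLATE keyword, using exactly the same collation name, in both WHERE and ORDER BY. Even if the database default resolves to the same locale as the index collation, PostgreSQL treats the two as different collations: a clause without the matching COLLATE cannot use the index or its ordering.&lt;/strong&gt;&lt;/p&gt;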
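&lt;p&gt;One way to see why the names must match: &lt;code&gt;default&lt;/code&gt;, &lt;code&gt;en_US&lt;/code&gt;, and &lt;code&gt;en_US.utf8&lt;/code&gt; are normally separate rows in the &lt;code&gt;pg_collation&lt;/code&gt; catalog, each with its own OID, and the planner compares collation OIDs rather than the underlying locale. The query below is a minimal sketch for checking this on your own instance; the names in the filter are the ones used in this test and may not all exist on every system.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- List the collations used in this test with their OID and underlying locale
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select oid, collname, collprovider, collencoding, collcollate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_collation
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;where collname in (&amp;#39;default&amp;#39;, &amp;#39;C&amp;#39;, &amp;#39;en_US&amp;#39;, &amp;#39;en_US.utf8&amp;#39;, &amp;#39;zh_CN&amp;#39;, &amp;#39;zh_CN.utf8&amp;#39;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;order by collname;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If two rows show the same &lt;code&gt;collcollate&lt;/code&gt; but different OIDs, they sort identically yet still count as different collations when the planner matches a query expression against an index.&lt;/p&gt;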

&lt;h3 class="relative group"&gt;LIKE not using index
 &lt;div id="like-not-using-index" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#like-not-using-index" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;blockquote&gt;&lt;p&gt;The drawback of using locales other than &lt;code&gt;C&lt;/code&gt; or &lt;code&gt;POSIX&lt;/code&gt; in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by &lt;code&gt;LIKE&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;In PostgreSQL&amp;rsquo;s own words: using a locale other than &lt;code&gt;C&lt;/code&gt; or &lt;code&gt;POSIX&lt;/code&gt; prevents ordinary indexes from being used by &lt;code&gt;LIKE&lt;/code&gt;!&lt;/p&gt;
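&lt;p&gt;The PostgreSQL documentation on operator classes describes a workaround for this case: a B-tree index built with a pattern operator class such as &lt;code&gt;text_pattern_ops&lt;/code&gt; (or &lt;code&gt;varchar_pattern_ops&lt;/code&gt; / &lt;code&gt;bpchar_pattern_ops&lt;/code&gt;, depending on the column type) compares values strictly character by character and can serve left-anchored &lt;code&gt;LIKE&lt;/code&gt; patterns under any locale. A minimal sketch, assuming &lt;code&gt;name&lt;/code&gt; is a &lt;code&gt;text&lt;/code&gt; or &lt;code&gt;varchar&lt;/code&gt; column; the index name &lt;code&gt;idxzlz_pattern&lt;/code&gt; is hypothetical and not part of the original test:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Hypothetical index name; text_pattern_ops ignores locale-specific collation rules
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;create index idxzlz_pattern on tzlz (name text_pattern_ops);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Check whether a left-anchored pattern is planned against the new index
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;explain select name from tzlz where name like &amp;#39;a%&amp;#39;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;According to the same documentation, such an index does not help ordinary &lt;code&gt;&amp;lt;&lt;/code&gt;, &lt;code&gt;&amp;lt;=&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt;, or &lt;code&gt;&amp;gt;=&lt;/code&gt; comparisons under a non-C collation, so a regular index is still needed for those. The test below takes a different route and relies on a &lt;code&gt;C&lt;/code&gt;-collated index.&lt;/p&gt;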
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_c &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (name &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((name)::text &lt;span style="color:#f92672"&gt;~~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a%&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_c &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (name &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((name)::text &lt;span style="color:#f92672"&gt;~~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a%&amp;#39;&lt;/span&gt;::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
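&lt;p&gt;For an index scan, PostgreSQL rewrites a left-anchored &lt;code&gt;LIKE&lt;/code&gt; into a &lt;code&gt;&amp;gt;=&lt;/code&gt; / &lt;code&gt;&amp;lt;&lt;/code&gt; range, where the &lt;code&gt;&amp;lt;&lt;/code&gt; bound is the prefix with its last character bumped &amp;ldquo;one step greater&amp;rdquo;. This is where the problem lies: the range is only correct if the index sorts strings the same way the bound was computed, and collation determines the sort order. Note that both plans above use &lt;code&gt;idxzlz_c&lt;/code&gt;, the &lt;code&gt;C&lt;/code&gt;-collated index, even when the query specifies &lt;code&gt;en_US.utf8&lt;/code&gt;. In ASCII, &lt;code&gt;a+1&lt;/code&gt; is &lt;code&gt;b&lt;/code&gt;, but what about Chinese characters?&lt;/p&gt;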
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_c &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (name &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;陿&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((name)::text &lt;span style="color:#f92672"&gt;~~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿%&amp;#39;&lt;/span&gt;::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Sure enough, another Chinese character appears!&lt;/p&gt;
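&lt;p&gt;To see how the upper bound relates to the prefix, one option is to compare the two characters at the byte level. A small sketch, assuming a UTF8-encoded database as in this test; &lt;code&gt;convert_to&lt;/code&gt; returns the encoded bytes and &lt;code&gt;ascii&lt;/code&gt; returns the code point of the first character:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Compare the LIKE prefix character with the upper bound shown in the plan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select ch,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       convert_to(ch, &amp;#39;UTF8&amp;#39;) as utf8_bytes,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       to_hex(ascii(ch))      as code_point_hex
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from (values (&amp;#39;阿&amp;#39;), (&amp;#39;陿&amp;#39;)) as v(ch);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The bound only has to sort after every string starting with the prefix under byte-wise comparison, which is why this kind of range is only valid on a &lt;code&gt;C&lt;/code&gt;-collated index (or one built with a pattern operator class).&lt;/p&gt;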
&lt;p&gt;If it&amp;rsquo;s a sequential scan, the &lt;code&gt;&amp;gt;=&lt;/code&gt; and &lt;code&gt;&amp;lt;&lt;/code&gt; won&amp;rsquo;t appear:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxzlz_c;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DROP&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;170&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((name)::text &lt;span style="color:#f92672"&gt;~~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿%&amp;#39;&lt;/span&gt;::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You can create an index that is (claimed by the PostgreSQL docs to be) unrelated to collation rules:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_pattern &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; tzlz (name varchar_pattern_ops);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s look at its execution plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_pattern &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name &lt;span style="color:#f92672"&gt;~&amp;gt;=~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (name &lt;span style="color:#f92672"&gt;~&amp;lt;~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;陿&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((name)::text &lt;span style="color:#f92672"&gt;~~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿%&amp;#39;&lt;/span&gt;::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
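&lt;p&gt;The planner still generates the &amp;ldquo;one greater&amp;rdquo; string, so ordering is clearly still involved. The &lt;code&gt;~&amp;gt;=~&lt;/code&gt; and &lt;code&gt;~&amp;lt;~&lt;/code&gt; operators compare strings character by character and ignore locale rules, which amounts to C ordering.&lt;/p&gt;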
&lt;p&gt;So we can conclude:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When PostgreSQL uses a regular index for LIKE, it needs to convert it to &lt;code&gt;&amp;gt;=&lt;/code&gt; and &lt;code&gt;&amp;lt;&lt;/code&gt;, which requires a &amp;ldquo;one greater&amp;rdquo; value relative to the current string. Since collation is strongly tied to ordering, only an index using the same collation can guarantee data correctness. PostgreSQL chooses the non-localized C collation for this.&lt;/strong&gt;&lt;/p&gt;
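&lt;p&gt;As a rough illustration of that conversion, the sketch below assumes a UTF8-encoded database and reuses the &lt;code&gt;tzlz&lt;/code&gt; table from above. The first query shows the raw bytes of the prefix and of the upper bound taken from the plans. The second is a hand-written version of the range condition; it covers every row the LIKE can match, and the planner still applies the LIKE itself as a filter.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Assumption: the database encoding is UTF8.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- 1. Raw bytes of the LIKE prefix and of the upper bound shown in the plans.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT convert_to(&amp;#39;阿&amp;#39;, &amp;#39;UTF8&amp;#39;) AS prefix_bytes,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       convert_to(&amp;#39;陿&amp;#39;, &amp;#39;UTF8&amp;#39;) AS upper_bound_bytes;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- 2. Hand-written form of the range derived from LIKE &amp;#39;阿%&amp;#39;, compared in C order.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM tzlz
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE name COLLATE &amp;#34;C&amp;#34; &amp;gt;= &amp;#39;阿&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  AND name COLLATE &amp;#34;C&amp;#34; &amp;lt; &amp;#39;陿&amp;#39;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;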
&lt;p&gt;The quickest workaround is to create a C collation index or a pattern index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxzlz_C &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz(name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_pattern &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; tzlz (name varchar_pattern_ops);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For other adjustments to default collation at various levels, refer to the sections above.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Developers typically don&amp;rsquo;t specify a collation when creating indexes. Unless the index uses the C collation or a pattern operator class, a prefix LIKE cannot use it. Since UTF8 is the usual choice of character set, that leaves little room for localized collations in practice. The recommended setup: character set UTF8, collation C.&lt;/em&gt;&lt;/p&gt;
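&lt;p&gt;To check whether an existing database already follows this setup, one option is to read the catalog. This is a minimal sketch that assumes the default libc locale provider, where &lt;code&gt;datcollate&lt;/code&gt; holds the database default collation. A value other than C or POSIX suggests that regular indexes on text columns will not serve prefix LIKE.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Encoding and default collation of the database you are connected to.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT datname,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       pg_encoding_to_char(encoding) AS encoding,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       datcollate,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       datctype
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_database
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE datname = current_database();&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;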

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://dbafix.com/what-is-the-impact-of-lc_ctype-on-a-postgresql-database/#:~:text=Having%20LC_CTYPE%20set%20to%20%E2%80%98C%E2%80%99%20implies%20that%20C,Postgres%20on%20top%20of%20these%20libc%20functions%2C%20they%E2%80%99re" target="_blank" rel="noreferrer"&gt;https://dbafix.com/what-is-the-impact-of-lc_ctype-on-a-postgresql-database/#:~:text=Having%20LC_CTYPE%20set%20to%20%E2%80%98C%E2%80%99%20implies%20that%20C,Postgres%20on%20top%20of%20these%20libc%20functions%2C%20they%E2%80%99re&lt;/a&gt;
&lt;a href="https://www.postgresql.org/docs/current/charset.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/charset.html&lt;/a&gt;
&lt;a href="https://www.bookstack.cn/read/rds-best-pratice/bfc0037fe00d87dc.md" target="_blank" rel="noreferrer"&gt;https://www.bookstack.cn/read/rds-best-pratice/bfc0037fe00d87dc.md&lt;/a&gt;
&lt;a href="https://help.aliyun.com/zh/rds/apsaradb-rds-for-postgresql/configure-the-collation-of-a-database-on-an-apsaradb-rds-for-postgresql-instance" target="_blank" rel="noreferrer"&gt;https://help.aliyun.com/zh/rds/apsaradb-rds-for-postgresql/configure-the-collation-of-a-database-on-an-apsaradb-rds-for-postgresql-instance&lt;/a&gt;
&lt;a href="https://baike.baidu.com/item/%E7%BB%9F%E4%B8%80%E7%A0%81/2985798?fromModule=lemma_inlink&amp;amp;fromtitle=Unicode&amp;amp;fromid=750500" target="_blank" rel="noreferrer"&gt;https://baike.baidu.com/item/%E7%BB%9F%E4%B8%80%E7%A0%81/2985798?fromModule=lemma_inlink&amp;fromtitle=Unicode&amp;fromid=750500&lt;/a&gt;
&lt;a href="https://baike.baidu.com/item/%E4%B8%AD%E6%97%A5%E9%9F%A9%E8%B6%8A%E7%BB%9F%E4%B8%80%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97/1301611?fromModule=lemma_inlink" target="_blank" rel="noreferrer"&gt;https://baike.baidu.com/item/%E4%B8%AD%E6%97%A5%E9%9F%A9%E8%B6%8A%E7%BB%9F%E4%B8%80%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97/1301611?fromModule=lemma_inlink&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/songyundong1993/article/details/128739919" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/songyundong1993/article/details/128739919&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Original article (Chinese): &lt;a href="https://lastdba.com/2024/08/12/postgresql%E6%9C%AC%E5%9C%B0%E5%8C%96/" target="_blank" rel="noreferrer"&gt;PostgreSQL本地化&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;</content:encoded></item><item><title>PostgreSQL Table Partitioning Deep Dive</title><link>https://lastdba.com/en/2024/08/12/postgresql-table-partitioning-deep-dive/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/postgresql-table-partitioning-deep-dive/</guid><description>&lt;h2 class="relative group"&gt;What is a Partitioned Table
 &lt;div id="what-is-a-partitioned-table" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-a-partitioned-table" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/787a5ce076e9.png" alt="Postgres Table Partitioning" /&gt;
Database partitioning splits table data into smaller physical shards to improve performance, availability, and manageability. Partitioned tables are a common optimization technique for large tables in relational databases. DBMS generally provide partition management, and applications can access partitioned tables directly without changing their architecture—though good performance requires proper partition access patterns.&lt;/p&gt;
&lt;p&gt;Partitioned tables are common database technology, but PostgreSQL partitioned tables have many unique characteristics: multiple implementation approaches, partitions being regular tables, partition maintenance strategies, SQL optimization considerations, and some known issues.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;What is a Partitioned Table
 &lt;div id="what-is-a-partitioned-table" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-a-partitioned-table" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/787a5ce076e9.png" alt="Postgres Table Partitioning" /&gt;
Database partitioning splits table data into smaller physical shards to improve performance, availability, and manageability. Partitioned tables are a common optimization technique for large tables in relational databases. DBMS generally provide partition management, and applications can access partitioned tables directly without changing their architecture—though good performance requires proper partition access patterns.&lt;/p&gt;
&lt;p&gt;Partitioned tables are common database technology, but PostgreSQL partitioned tables have many unique characteristics: multiple implementation approaches, partitions being regular tables, partition maintenance strategies, SQL optimization considerations, and some known issues.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Partition Table Implementations
 &lt;div id="partition-table-implementations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-table-implementations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PostgreSQL offers several ways to partition a table. The officially supported methods are declarative partitioning and inheritance partitioning; third-party extensions include pg_pathman and pg_partman. Since declarative partitioning was introduced, it is generally the only recommended approach. Covering every implementation&amp;rsquo;s features, details, and history would make this article excessively long, so it focuses on declarative partitioning and introduces the others only briefly. The other approaches are still worth understanding because of existing deployments and feature differences.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Declarative Partitioning
 &lt;div id="declarative-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#declarative-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Declarative partitioning, also called native partitioning, has been supported since PG10. It is the &amp;ldquo;officially supported&amp;rdquo; partitioning approach and the most recommended method. Although it differs from inheritance partitioning, declarative partitioning is also implemented internally using table inheritance. It supports three partition methods: RANGE and LIST (since PG10) and HASH (added in PG11).&lt;/p&gt;

&lt;h4 class="relative group"&gt;RANGE Partitioning
 &lt;div id="range-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#range-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f252d7be9e4d.png" alt="" /&gt;
RANGE partitioned tables split data by range, with partition boundaries defined as [t1, t2) (inclusive lower bound, exclusive upper bound).&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PUBLIC&lt;/span&gt;.LZLPARTITION1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id int,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name varchar(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; DATE_CREATED &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULT&lt;/span&gt; now()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) PARTITION &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; RANGE(DATE_CREATED);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.lzlpartition1 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;(id,DATE_CREATED);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION1_202301 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION1 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION1_202302 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION1 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Insert some data into the partitioned table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; random() &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;, md5(&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;::text),&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; generate_series(&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01&amp;#39;&lt;/span&gt;::date, &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-28&amp;#39;&lt;/span&gt;::date, &lt;span style="color:#e6db74"&gt;&amp;#39;1 minute&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;83521&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For RANGE partitioning, &lt;code&gt;FROM t1 TO t2&lt;/code&gt; means [t1, t2): a row with &lt;code&gt;date_created = &amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/code&gt; belongs to LZLPARTITION1_202302, not LZLPARTITION1_202301.&lt;/p&gt;
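&lt;p&gt;One way to confirm the boundary rule is to insert a row exactly on a boundary and ask which partition stores it. This is a minimal sketch against the tables created above; the id value -1 and the name text are made up for illustration, and the transaction is rolled back so the test row is not kept. The query should report lzlpartition1_202302.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;BEGIN;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Hypothetical marker row: id -1 is never produced by random() * 10000.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INSERT INTO lzlpartition1 (id, name, date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VALUES (-1, &amp;#39;boundary row&amp;#39;, &amp;#39;2023-02-01 00:00:00&amp;#39;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- tableoid::regclass names the partition that physically stores the row.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT tableoid::regclass AS partition_name, id, date_created
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM lzlpartition1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE id = -1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ROLLBACK;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;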
&lt;p&gt;Inspecting the partitioned table shows that each partition is also an independent table:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.lzlpartition1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Compression &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;: RANGE (date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1_pkey&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id, date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzlpartition1_202302 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition1_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.lzlpartition1_202301&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Compression &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: lzlpartition1 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1_202301_pkey&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id, date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Primary keys, indexes, and NOT NULL/CHECK constraints are automatically created on partitions. Since partitions are independent tables, constraints and indexes can also be created on individual partitions. (ATTACH does not automatically create these — see the ATTACH section for details.)&lt;/p&gt;
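&lt;p&gt;As a minimal sketch of a partition-level index, the statements below add an index to the single partition lzlpartition1_202301 from the output above. The index name is hypothetical, and the example assumes no equivalent index already exists on the parent table.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical index name; it exists only on this one partition, not on the parent
CREATE INDEX lzlpartition1_202301_date_created_idx
    ON lzlpartition1_202301 (date_created);

-- List the indexes of the partition to confirm the new one is there
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = &amp;#39;lzlpartition1_202301&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The other partitions are unaffected, which is useful when only one partition needs the extra index.&lt;/p&gt;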

&lt;h4 class="relative group"&gt;LIST Partitioning
 &lt;div id="list-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#list-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e3d094556a5d.png" alt="" /&gt;
LIST partitioning stores data in the corresponding partition based on specified partition key values.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; cities (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; city_id bigserial &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; population bigint
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) PARTITION &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; LIST (&lt;span style="color:#66d9ef"&gt;left&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;lower&lt;/span&gt;(name), &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;));&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; cities_ab
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; cities &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; cities_null
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; cities (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINT&lt;/span&gt; city_id_nonzero &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; (city_id &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; cities(name,population) &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;Acity&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; cities(name,population) &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; cities;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; city_id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; population 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+---------+--------+------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cities_ab &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Acity &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cities_null &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A LIST partitioned table can have a dedicated partition for NULL keys, declared with FOR VALUES IN (null).&lt;/p&gt;
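&lt;p&gt;Because the partition key of cities is an expression, it can help to evaluate that expression by hand to see where a row would be routed. The query below is only an illustrative sketch. The sample names are made up and nothing is inserted.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Evaluate the partition key expression left(lower(name), 1) for sample names
-- (the names are illustrative, not taken from the table)
SELECT name, left(lower(name), 1) AS partition_key
FROM (VALUES (&amp;#39;Acity&amp;#39;), (&amp;#39;Bcity&amp;#39;), (NULL)) AS t(name);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Keys a and b match cities_ab, and a NULL key matches cities_null. With only these two partitions defined, a row whose key is any other value has no partition to go to, and the insert is rejected.&lt;/p&gt;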

&lt;h4 class="relative group"&gt;HASH Partitioning
 &lt;div id="hash-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hash-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9e18df7edc15.png" alt="" /&gt;
HASH partitioning assigns each row to a partition by hashing the partition key, which spreads data, including hot data, evenly across the partitions.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders (order_id int,name varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;)) PARTITION &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; HASH (order_id);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p1 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p2 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p3 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A HASH partitioned table cannot have a default partition, and the REMAINDER of each partition must be smaller than its MODULUS, so a MODULUS of 3 allows only remainders 0, 1, and 2.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p2 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;P16: remainder &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; hash partition must be &lt;span style="color:#66d9ef"&gt;less&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;than&lt;/span&gt; modulus
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: transformPartitionBound, parse_utilcmd.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3939&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p4 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;default&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;P16: a hash&lt;span style="color:#f92672"&gt;-&lt;/span&gt;partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; may &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; have a &lt;span style="color:#66d9ef"&gt;default&lt;/span&gt; partition
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: transformPartitionBound, parse_utilcmd.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3909&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Insert data:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;),&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; tableoid::regclass;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3277&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p3 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3354&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3369&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; tableoid::regclass,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; order_id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+----------+------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The 10,000 rows are spread roughly evenly across the three partitions. Rows whose partition key is NULL are handled differently:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Insert 100 NULL rows
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;)::text);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; order_id &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; tableoid::regclass;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- All NULL data ends up on the remainder 0 partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; orders_p1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.orders_p1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+-----------------------+-----------+----------+---------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; order_id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (modulus &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, remainder &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, order_id)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Although HASH partitioned tables have no concept of a NULL partition, they can store NULL data. NULL values are placed on the remainder 0 partition.&lt;/p&gt;
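&lt;p&gt;The partition constraint above relies on the built-in function satisfies_hash_partition, which can also be called directly to check which remainder a key maps to. The following is a sketch that assumes the orders table defined earlier. It passes the parent table as a regclass rather than the raw OID shown in the \d+ output.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- For each possible remainder, ask whether a NULL order_id satisfies
-- the hash partition constraint with MODULUS 3
SELECT r AS remainder,
       satisfies_hash_partition(&amp;#39;orders&amp;#39;::regclass, 3, r, NULL::int) AS null_key_matches
FROM generate_series(0, 2) AS r;

-- The same check for an ordinary key value
SELECT r AS remainder,
       satisfies_hash_partition(&amp;#39;orders&amp;#39;::regclass, 3, r, 2) AS key_2_matches
FROM generate_series(0, 2) AS r;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the behavior described above holds, the first query should report a match only for remainder 0, and the second should match the remainder of the partition that stores order_id 2.&lt;/p&gt;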

&lt;h4 class="relative group"&gt;Multi-level (Mixed) Partitioning
 &lt;div id="multi-level-mixed-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#multi-level-mixed-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Partitions can themselves be further partitioned, forming a cascading structure. Sub-partitions can use different partition methods — this is called mixed partitioning.



&lt;img src="https://lastdba.com/img/csdn/220e4e6f1544.png" alt="" /&gt;
Creating a mixed partition:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; part_1000(id bigserial &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,name varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;),createddate &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt;) partition &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; range(createddate);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; part_2001 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; part_1000 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) partition &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; list(name) ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; part_2002 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; part_1000 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) partition &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; list(name) ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; part_2003 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; part_1000 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;) partition &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; list(name) ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; part_3001 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; part_2001 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; part_3002 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; part_2001 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;def&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; part_3003 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; part_2001 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;jkl&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
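&lt;p&gt;To see how a row travels through both levels, a small sketch follows. It assumes the tables created above and PostgreSQL 12 or later for pg_partition_tree. The inserted values are illustrative.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- This row falls in the January range and matches the list value abc
INSERT INTO part_1000 (name, createddate)
VALUES (&amp;#39;abc&amp;#39;, &amp;#39;2023-01-15 00:00:00&amp;#39;);

-- tableoid shows the leaf partition that actually stores the row
SELECT tableoid::regclass, * FROM part_1000;

-- pg_partition_tree lists every level of the hierarchy in one result
SELECT relid, parentrelid, isleaf, level
FROM pg_partition_tree(&amp;#39;part_1000&amp;#39;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note that part_2002 and part_2003 are themselves partitioned but have no sub-partitions yet, so rows dated February or March would be rejected until leaf partitions are added.&lt;/p&gt;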
&lt;p&gt;\d+ only shows the immediate next-level partitions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; part_1000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;dbmgr.part_1000&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+-----------------------------+-----------+----------+---------------------------------------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; bigint &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;part_1000_id_seq&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; createddate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;: RANGE (createddate)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: part_2001 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;), PARTITIONED,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; part_2002 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;), PARTITIONED,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; part_2003 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;), PARTITIONED
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; part_2001
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;dbmgr.part_2001&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+-----------------------------+-----------+----------+---------------------------------------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; bigint &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;part_1000_id_seq&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; createddate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: part_1000 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: ((createddate &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (createddate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (createddate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;: LIST (name)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: part_3001 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; part_3002 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;def&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; part_3003 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;jkl&amp;#39;&lt;/span&gt;) &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now insert a row:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; part_1000 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(random() &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 08:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; part_1000;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; createddate 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+------+------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; part_3001 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6385&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The row passes through part_2001 and lands in part_3001: data is always stored in the lowest-level (leaf) sub-partition.&lt;/p&gt;
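&lt;p&gt;To inspect the whole hierarchy in one query instead of running \d+ at each level, something along these lines can help. This is a minimal sketch that assumes PostgreSQL 12 or later, where pg_partition_tree() is available, and it reuses the part_1000 table from the example above.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- List every table under part_1000 with its parent, depth, and leaf flag.
-- Assumes PostgreSQL 12+ (pg_partition_tree is not available before that).
SELECT relid, parentrelid, level, isleaf
FROM pg_partition_tree(&amp;#39;part_1000&amp;#39;)
ORDER BY level, relid::text;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Rows with isleaf set to true are the tables that actually hold data, such as part_3001 here.&lt;/p&gt;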

&lt;h4 class="relative group"&gt;Declarative Partitioning Feature Summary
 &lt;div id="declarative-partitioning-feature-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#declarative-partitioning-feature-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No INTERVAL partitioning&lt;/strong&gt;. There is no built-in automatic partition creation feature, which makes maintenance more cumbersome.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Partitions themselves are tables&lt;/strong&gt;. This is a distinctive characteristic. Each partition can be queried, indexed, and maintained directly like an ordinary table, and, more importantly, this design shapes how many partition-level features behave.&lt;/li&gt;
&lt;li&gt;TRUNCATE, VACUUM, and ANALYZE on a partitioned table operate on all partitions. TRUNCATE ONLY cannot be executed on the parent table but can be executed on a child table containing data, clearing only that sub-partition.&lt;/li&gt;
&lt;li&gt;RANGE and HASH partition keys can have multiple columns; LIST partition keys can only be a single column or expression.&lt;/li&gt;
&lt;li&gt;The partitioned parent table itself is empty; only the lowest-level sub-partitions contain data.&lt;/li&gt;
&lt;li&gt;A DEFAULT partition receives data that falls outside declared ranges. Without a DEFAULT partition, inserting out-of-range data will raise an error.&lt;/li&gt;
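&lt;li&gt;Before adding a new partition, check whether the DEFAULT partition already contains rows that belong to it. If it does, creating or attaching the new partition fails until those rows are moved out.&lt;/li&gt;
&lt;li&gt;Partitions created via PARTITION OF automatically inherit the parent table&amp;rsquo;s indexes, constraints, and row-level triggers.&lt;/li&gt;
&lt;li&gt;ATTACH PARTITION does not create constraints for you: the table being attached must already have the same columns and the parent&amp;rsquo;s NOT NULL and CHECK constraints. Indexes are the exception: since PostgreSQL 11, an index matching each of the parent&amp;rsquo;s partitioned indexes is created during ATTACH, or an equivalent existing index is reused.&lt;/li&gt;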
&lt;/ul&gt;
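&lt;p&gt;The DEFAULT-partition points above can be sketched as follows. The events table, its columns, and the partition names are hypothetical and exist only for this illustration; the approach shown (moving rows out of the DEFAULT partition, then attaching) is one possible way to add a partition, not the only one.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table for illustration; not part of the original example.
CREATE TABLE events (
    id         bigint    NOT NULL,
    created_at timestamp NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2023_01 PARTITION OF events
    FOR VALUES FROM (&amp;#39;2023-01-01&amp;#39;) TO (&amp;#39;2023-02-01&amp;#39;);
CREATE TABLE events_default PARTITION OF events DEFAULT;

-- No partition covers February yet, so this row lands in events_default.
INSERT INTO events VALUES (1, &amp;#39;2023-02-10 00:00:00&amp;#39;);

-- Creating the February partition directly would fail while events_default
-- holds a matching row, so move those rows out first, in one transaction.
BEGIN;
CREATE TABLE events_2023_02 (LIKE events INCLUDING ALL);
WITH moved AS (
    DELETE FROM events_default
    WHERE created_at &amp;gt;= &amp;#39;2023-02-01&amp;#39; AND created_at &amp;lt; &amp;#39;2023-03-01&amp;#39;
    RETURNING *
)
INSERT INTO events_2023_02 SELECT * FROM moved;
ALTER TABLE events ATTACH PARTITION events_2023_02
    FOR VALUES FROM (&amp;#39;2023-02-01&amp;#39;) TO (&amp;#39;2023-03-01&amp;#39;);
COMMIT;

-- Check which partition each row now lives in.
SELECT tableoid::regclass, * FROM events;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Doing the move and the ATTACH in a single transaction keeps the rows visible through the parent table at every point. On a large DEFAULT partition, the validation scan during ATTACH can take a while, so it is worth testing on realistic data first.&lt;/p&gt;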

&lt;h3 class="relative group"&gt;Inheritance Partitioning
 &lt;div id="inheritance-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#inheritance-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Inheritance partitioning is also officially supported. It leverages PostgreSQL&amp;rsquo;s table inheritance feature to implement partitioning functionality. Inheritance partitioning is more flexible than declarative partitioning.
Implementing inheritance partitioning requires two PostgreSQL features: &lt;a href="https://www.postgresql.org/docs/current/ddl-inherit.html" target="_blank" rel="noreferrer"&gt;table inheritance&lt;/a&gt; and write redirection. Write redirection can be implemented via &lt;a href="https://www.postgresql.org/docs/current/rules.html" target="_blank" rel="noreferrer"&gt;rules&lt;/a&gt; or triggers.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Creating Inheritance Partition Tables
 &lt;div id="creating-inheritance-partition-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#creating-inheritance-partition-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Example of creating inheritance-partitioned tables:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Create the parent table&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; measurement (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; city_id int &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; logdate date &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; peaktemp int,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; unitsales int
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;2. Create child tables with CHECK constraints for partitioning ranges&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; measurement_202308 (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; ( logdate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-08-01&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; logdate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-09-01&amp;#39;&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) &lt;span style="color:#66d9ef"&gt;INHERITS&lt;/span&gt; (measurement);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; measurement_202309 (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; ( logdate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-09-01&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; logdate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-10-01&amp;#39;&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) &lt;span style="color:#66d9ef"&gt;INHERITS&lt;/span&gt; (measurement);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;3. Create rules or triggers to redirect inserted data to the corresponding child tables&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;REPLACE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FUNCTION&lt;/span&gt; measurement_insert_trigger()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;RETURNS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRIGGER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;IF&lt;/span&gt; ( &lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.logdate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-08-01&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.logdate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-09-01&amp;#39;&lt;/span&gt; ) &lt;span style="color:#66d9ef"&gt;THEN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; measurement_202308 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.&lt;span style="color:#f92672"&gt;*&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ELSIF&lt;/span&gt; ( &lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.logdate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-09-01&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.logdate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-10-01&amp;#39;&lt;/span&gt; ) &lt;span style="color:#66d9ef"&gt;THEN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; measurement_202309 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.&lt;span style="color:#f92672"&gt;*&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ELSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RAISE &lt;span style="color:#66d9ef"&gt;EXCEPTION&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;Date out of range. Fix the measurement_insert_trigger() function!&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;END&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IF&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;RETURN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;END&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LANGUAGE&lt;/span&gt; plpgsql;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRIGGER&lt;/span&gt; insert_measurement_trigger
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;BEFORE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; measurement
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EACH&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ROW&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FUNCTION&lt;/span&gt; measurement_insert_trigger();&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A basic inheritance partitioned table is now set up.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; measurement
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.measurement&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+---------+-----------+----------+---------+---------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; city_id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; logdate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; peaktemp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; unitsales &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Triggers:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; insert_measurement_trigger &lt;span style="color:#66d9ef"&gt;BEFORE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EACH&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ROW&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FUNCTION&lt;/span&gt; measurement_insert_trigger()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Child tables: measurement_202308,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; measurement_202309
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Test insertion and querying:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Inserting data outside the defined range raises an error
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1001&lt;/span&gt;, now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; interval &lt;span style="color:#e6db74"&gt;&amp;#39;31&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt; ,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: P0001: Date &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; range. Fix the measurement_insert_trigger() &lt;span style="color:#66d9ef"&gt;function&lt;/span&gt;&lt;span style="color:#f92672"&gt;!&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CONTEXT: PL&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pgSQL &lt;span style="color:#66d9ef"&gt;function&lt;/span&gt; measurement_insert_trigger() line &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; RAISE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: exec_stmt_raise, pl_exec.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3889&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- In-range data is redirected to the child table; the trigger returns NULL, so the parent reports INSERT 0 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1001&lt;/span&gt;,now(),&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Querying the parent table returns data from child tables
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; tableoid::regclass,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; measurement;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; city_id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; logdate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; peaktemp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; unitsales 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------+---------+------------+----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; measurement_202308 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1001&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;RULE vs. Trigger&lt;/strong&gt;
Besides triggers, PostgreSQL can also use rules to redirect inserts.
Example rule statements:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;RULE&lt;/span&gt; measurement_insert_202308 &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ( logdate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-08-01&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; logdate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-09-01&amp;#39;&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DO&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSTEAD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; measurement_202308 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.&lt;span style="color:#f92672"&gt;*&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;RULE&lt;/span&gt; measurement_insert_202309 &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ( logdate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-09-01&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; logdate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-10-01&amp;#39;&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DO&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSTEAD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; measurement_202309 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.&lt;span style="color:#f92672"&gt;*&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Differences between rules and triggers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A rule carries more overhead than a trigger, but that overhead is paid once per statement rather than once per row, so rules can be faster for bulk inserts. In all other cases, triggers are preferable.&lt;/li&gt;
&lt;li&gt;COPY ignores rules but does fire triggers. With rules, data has to be loaded with COPY directly into the matching child table.&lt;/li&gt;
&lt;li&gt;When a row falls outside every defined range, rules silently insert it into the parent table, while the trigger function shown here raises an error.&lt;/li&gt;
&lt;/ul&gt;
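&lt;p&gt;As a minimal sketch of the COPY point above: because COPY ignores rules, a bulk load in the rule-based setup has to target the child table that matches the data. The file path below is hypothetical, and the file is assumed to contain only August 2023 rows in the column order of measurement.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical server-side CSV file; reading it requires the matching server privileges
-- Load straight into the August child table, bypassing the parent and its rules
COPY measurement_202308 FROM &amp;#39;/tmp/measurement_202308.csv&amp;#39; WITH (FORMAT csv);
-- The CHECK constraint on the child table still rejects rows outside its date range&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;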
&lt;p&gt;&lt;strong&gt;Indexes&lt;/strong&gt;
To improve performance, you also need to create indexes and enable constraint_exclusion. Indexes on partitions are generally essential. For inheritance tables, indexes must be manually created on child tables.
Example of creating indexes:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_measurement_202308_logdate &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; measurement_202308 (logdate);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_measurement_202309_logdate &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; measurement_202309 (logdate);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Insert some data and check the execution plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- &amp;#39;2023-08-04&amp;#39; has only 1 row, allowing it to use the index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1001&lt;/span&gt;,now()&lt;span style="color:#f92672"&gt;+&lt;/span&gt;interval &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;),now(),&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; logdate&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-08-04&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; measurement measurement_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (logdate &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-08-04&amp;#39;&lt;/span&gt;::date)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_measurement_202308_logdate &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; measurement_202308 measurement_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (logdate &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-08-04&amp;#39;&lt;/span&gt;::date)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In the plan above, the August partition is accessed through its own index. Because constraint_exclusion defaults to &lt;code&gt;partition&lt;/code&gt;, which covers inheritance child tables, the planner excluded the September partition and scanned only August. The parent table carries no partition constraint (a CHECK constraint placed there would by default be inherited by every child), so it always appears in the execution plan. Since the parent table is generally empty, this has minimal impact.&lt;/p&gt;
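&lt;p&gt;One way to confirm that constraint exclusion is what removes the September partition is to compare plans with the setting turned off for the session. This is only a sketch; plan output is omitted because it depends on the data and statistics.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Show the current value (the default is partition)
SHOW constraint_exclusion;
-- Disable it for this session only; the plan should then also list measurement_202309
SET constraint_exclusion = off;
EXPLAIN SELECT * FROM measurement WHERE logdate = DATE &amp;#39;2023-08-04&amp;#39;;
-- Restore the configured value
RESET constraint_exclusion;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;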

&lt;h4 class="relative group"&gt;constraint_exclusion
 &lt;div id="constraint_exclusion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#constraint_exclusion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;constraint_exclusion controls whether the optimizer uses table constraints to skip tables that cannot contain matching rows. It is most often used with inheritance partitioning, where skipping child tables improves SQL performance. (The enable_partition_pruning parameter plays a similar role for declarative partitioned tables.) constraint_exclusion has three values:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;on&lt;/code&gt;: Constraints are checked for all tables.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;partition&lt;/code&gt;: Constraints are checked only for inheritance child tables and UNION ALL subqueries (default).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;off&lt;/code&gt;: Constraints are not checked.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Constraint exclusion is applied only while the execution plan is being generated, never during execution (partition pruning, by contrast, can also happen at execution time). It therefore cannot act on values that are unknown at planning time, such as parameters in a generic plan or the results of non-immutable functions.
For example, with a function like now(), whose value the optimizer cannot determine in advance, partitions that do not need to be accessed are still not excluded:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; now();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; now 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;772658&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- The optimizer did not exclude the September partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; logdate&lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt;now();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;98&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1628&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; measurement measurement_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (logdate &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; now())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; measurement_202308 measurement_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1010&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (logdate &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; now())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; measurement_202309 measurement_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;26&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;617&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: (logdate &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; now())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; idx_measurement_202309_logdate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;617&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (logdate &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; now())&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
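&lt;p&gt;A common workaround, sketched here under the assumption that the application can compute the date itself, is to pass the value as a literal so that it is known at planning time. With the child table CHECK constraints used in this article, the planner should then be able to exclude measurement_202309.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- The literal replaces now(); 2023-08-03 is the date now() returned in the session above
EXPLAIN SELECT * FROM measurement WHERE logdate &amp;lt;= DATE &amp;#39;2023-08-03&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;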
&lt;p&gt;Additionally, constraint exclusion has to examine the constraints of every child table, so generating execution plans slows down as the number of child tables grows. For this reason, an inheritance-partitioned table should not have too many child partitions.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Adding/Removing Partitions in Inheritance Partitioning
 &lt;div id="addingremoving-partitions-in-inheritance-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#addingremoving-partitions-in-inheritance-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;To turn an inherited partition into a regular table:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; measurement_202308 &lt;span style="color:#66d9ef"&gt;NO&lt;/span&gt; INHERIT measurement;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;To attach an existing regular table (which may already contain data) as a child table of the inheritance parent:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; measurement_202310 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#66d9ef"&gt;LIKE&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULTS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINTS&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; measurement_202310 &lt;span style="color:#66d9ef"&gt;ADD&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINT&lt;/span&gt; measurement_202310_logdate_check 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; ( logdate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-10-01&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; logdate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-11-01&amp;#39;&lt;/span&gt; );
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--insert into measurement_202310 values(2001,&amp;#39;20231010&amp;#39;,3,3);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; measurement_202310 INHERIT measurement;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
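&lt;p&gt;After a new child table is attached, the insert-redirect trigger function also needs a branch for it. The function body below is an assumption modeled on the partitioning example in the PostgreSQL documentation; the actual measurement_insert_trigger() used in this article may differ in detail.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Assumed shape of the trigger function, extended with a branch for October 2023
CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.logdate &amp;gt;= DATE &amp;#39;2023-08-01&amp;#39; AND NEW.logdate &amp;lt; DATE &amp;#39;2023-09-01&amp;#39; ) THEN
        INSERT INTO measurement_202308 VALUES (NEW.*);
    ELSIF ( NEW.logdate &amp;gt;= DATE &amp;#39;2023-09-01&amp;#39; AND NEW.logdate &amp;lt; DATE &amp;#39;2023-10-01&amp;#39; ) THEN
        INSERT INTO measurement_202309 VALUES (NEW.*);
    ELSIF ( NEW.logdate &amp;gt;= DATE &amp;#39;2023-10-01&amp;#39; AND NEW.logdate &amp;lt; DATE &amp;#39;2023-11-01&amp;#39; ) THEN
        -- New branch for the newly attached child table
        INSERT INTO measurement_202310 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION &amp;#39;Date out of range. Fix the measurement_insert_trigger() function!&amp;#39;;
    END IF;
    -- Returning NULL keeps the row out of the parent table
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because the function is replaced in place, the existing trigger on measurement keeps working without being recreated.&lt;/p&gt;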

&lt;h4 class="relative group"&gt;Inheritance Partitioning Feature Summary
 &lt;div id="inheritance-partitioning-feature-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#inheritance-partitioning-feature-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Inheritance partitioning is more flexible than declarative partitioning, but some declarative partitioning features are unavailable.&lt;/li&gt;
&lt;li&gt;Child tables inherit parent table constraints, so global constraints should not be set on the parent table.&lt;/li&gt;
&lt;li&gt;Indexes are not inherited; they must be created individually on each child table.&lt;/li&gt;
&lt;li&gt;Declarative partitioning only supports RANGE, LIST, and HASH partitions. Inheritance partitioning can support more, including custom partitioning methods.&lt;/li&gt;
&lt;li&gt;Dropping a child table does not invalidate the trigger. PostgreSQL does not have Oracle&amp;rsquo;s concept of invalidated objects (indexes do have an invalidation concept).&lt;/li&gt;
&lt;li&gt;Generally, using triggers for insert redirection is more efficient than rules.&lt;/li&gt;
&lt;li&gt;When adding a new partition, if the trigger function has no branch for that partition, the trigger function needs to be updated.&lt;/li&gt;
&lt;li&gt;Inheritance partitioning supports multiple inheritance.&lt;/li&gt;
&lt;li&gt;Constraint exclusion happens only at planning time, not during execution, so filtering on constant values in queries is recommended.&lt;/li&gt;
&lt;li&gt;With inheritance partitioning, avoid creating too many child partitions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;pg_pathman
 &lt;div id="pg_pathman" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_pathman" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;pg_pathman is a third-party plugin implementing partitioning functionality. The &lt;a href="https://github.com/postgrespro/pg_pathman" target="_blank" rel="noreferrer"&gt;pg_pathman README on GitHub&lt;/a&gt; and &lt;a href="https://developer.aliyun.com/article/62314" target="_blank" rel="noreferrer"&gt;articles on using pg_pathman&lt;/a&gt; already describe pathman in great detail. Here we only highlight key points and do some simple testing.&lt;/p&gt;

&lt;h4 class="relative group"&gt;pg_pathman Basics
 &lt;div id="pg_pathman-basics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_pathman-basics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;No Longer Maintained&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;NOTE: this project is not under development anymore&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;pg_pathman supports PostgreSQL 9.5 through 15. Later PostgreSQL versions are not supported, and the existing versions receive only bug fixes; no new features will be added.
pg_pathman emerged because older PostgreSQL versions had incomplete partitioning features. Now that native partitioned tables (declarative partitioning) are very mature, the pg_pathman project itself recommends using them, and recommends migrating existing pg_pathman partitioned tables to native partitioned tables. pg_pathman, once recognized by many users, is now history. Even though it&amp;rsquo;s no longer updated, its feature set is still richer than that of the current native partitioned tables.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Feature Highlights&lt;/strong&gt;
pg_pathman is quite powerful and supports some features that native partitioned tables do not. However, it is not perfect either and has many issues in practice. Key points to note about pg_pathman include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pg_pathman can manage partitions through partition management functions. It supports replace, merge, split partition operations; attach and detach operations; and INTERVAL partitioning.&lt;/li&gt;
&lt;li&gt;pg_pathman has many optimizations for partitioned table execution plans.&lt;/li&gt;
&lt;li&gt;pg_pathman only supports RANGE and HASH partition types.&lt;/li&gt;
&lt;li&gt;The pathman_config table stores partition configuration information, and pg_pathman also provides views for inspecting partitioning tasks.&lt;/li&gt;
&lt;li&gt;Partition information is cached in memory for execution plan generation.&lt;/li&gt;
&lt;/ul&gt;
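&lt;p&gt;A quick way to look at the metadata mentioned above, assuming the pg_pathman extension is installed in the current database and at least one table has been partitioned with it. The view name pathman_partition_list comes from the pg_pathman README rather than from this article.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- One row per partitioned parent table
SELECT * FROM pathman_config;
-- One row per partition, including the range bounds of RANGE partitions
SELECT * FROM pathman_partition_list;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;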

&lt;h3 class="relative group"&gt;Basic pg_pathman Usage
 &lt;div id="basic-pg_pathman-usage" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#basic-pg_pathman-usage" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Creating pathman RANGE partitions&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- The regular table serves as the parent table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; journal (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id SERIAL,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dt &lt;span style="color:#66d9ef"&gt;TIMESTAMP&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;level&lt;/span&gt; INTEGER,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; msg TEXT);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Indexes on the parent table are automatically created on child partitions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; journal(dt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create six monthly RANGE partitions on dt
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; create_range_partitions(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#e6db74"&gt;&amp;#39;journal&amp;#39;&lt;/span&gt;::regclass,  &lt;span style="color:#75715e"&gt;-- parent table&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#e6db74"&gt;&amp;#39;dt&amp;#39;&lt;/span&gt;,  &lt;span style="color:#75715e"&gt;-- partitioning key&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt;,  &lt;span style="color:#75715e"&gt;-- lower bound of the first partition&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    interval &lt;span style="color:#e6db74"&gt;&amp;#39;1 month&amp;#39;&lt;/span&gt;,  &lt;span style="color:#75715e"&gt;-- range covered by each partition&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;,  &lt;span style="color:#75715e"&gt;-- number of partitions to create&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#66d9ef"&gt;false&lt;/span&gt;);  &lt;span style="color:#75715e"&gt;-- partition_data: do not move existing parent rows into the partitions&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- View table definition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; journal
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.journal&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+-----------------------------+-----------+----------+-------------------------------------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;journal_id_seq&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dt &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;level&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; msg &lt;span style="color:#f92672"&gt;|&lt;/span&gt; text &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;journal_dt_idx&amp;#34;&lt;/span&gt; btree (dt)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Child tables: journal_1,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_2,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_3,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_4,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_5,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_6
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; journal_6
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.journal_6&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+-----------------------------+-----------+----------+-------------------------------------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;journal_id_seq&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dt &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;level&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; msg &lt;span style="color:#f92672"&gt;|&lt;/span&gt; text &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;journal_6_dt_idx&amp;#34;&lt;/span&gt; btree (dt)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Check&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraints&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;pathman_journal_6_check&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; (dt &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-06-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; dt &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-07-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Inherits&lt;/span&gt;: journal
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
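&lt;p&gt;The &lt;code&gt;\d+&lt;/code&gt; output shows the bounds of one partition at a time. As a rough sketch, assuming the installed pg_pathman version ships the &lt;code&gt;pathman_partition_list&lt;/code&gt; view described in its README, all partitions of &lt;code&gt;journal&lt;/code&gt; and their range boundaries can be listed with a single query:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Assumption: pg_pathman is installed and provides the pathman_partition_list view
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT *
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pathman_partition_list
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE parent = &amp;#39;journal&amp;#39;::regclass;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;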
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Insert data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; journal (dt, &lt;span style="color:#66d9ef"&gt;level&lt;/span&gt;, msg)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;, random() &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;, md5(&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; generate_series(&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01&amp;#39;&lt;/span&gt;::date, &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-28&amp;#39;&lt;/span&gt;::date, &lt;span style="color:#e6db74"&gt;&amp;#39;1 hour&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Insert data for which no corresponding partition has been created yet
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; journal (dt, &lt;span style="color:#66d9ef"&gt;level&lt;/span&gt;, msg) &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;2023-07-01&amp;#39;&lt;/span&gt;::date,&lt;span style="color:#e6db74"&gt;&amp;#39;11&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check the row distribution; the missing partition journal_7 has been created automatically
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; partition, &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; journal &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; partition;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; partition &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_7 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;649&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;744&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
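&lt;p&gt;The automatic creation of &lt;code&gt;journal_7&lt;/code&gt; is convenient, but a stray timestamp can then create partitions nobody planned for. A minimal sketch of the manual alternative, assuming the &lt;code&gt;set_auto&lt;/code&gt; and &lt;code&gt;append_range_partition&lt;/code&gt; functions documented in the pg_pathman README are available in the installed version:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Turn off automatic partition creation for journal (applies to RANGE partitioning)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT set_auto(&amp;#39;journal&amp;#39;::regclass, false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Add the next monthly partition explicitly, after the current last one
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT append_range_partition(&amp;#39;journal&amp;#39;::regclass);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With automatic creation disabled, new partitions appear only when they are added on purpose, which makes the partition layout easier to reason about.&lt;/p&gt;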
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- View execution plan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Partition pruning has occurred: of the child partitions, only journal_1 is scanned
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; journal &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; dt&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 22:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; journal journal_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (dt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 22:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; journal_1_dt_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; journal_1 journal_1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;49&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (dt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 22:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
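&lt;p&gt;The plan still contains a &lt;code&gt;Seq Scan&lt;/code&gt; on the parent table &lt;code&gt;journal&lt;/code&gt;. According to the pg_pathman README, the parent stays enabled in query plans when partitioning was done with &lt;code&gt;partition_data&lt;/code&gt; set to &lt;code&gt;false&lt;/code&gt;, as in this example. A sketch of how the parent could be excluded, assuming &lt;code&gt;set_enable_parent&lt;/code&gt; exists in the installed version and the parent really holds no rows:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Confirm the parent itself holds no rows (ONLY skips the child tables)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT count(*) FROM ONLY journal;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Exclude the parent table from query plans
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT set_enable_parent(&amp;#39;journal&amp;#39;::regclass, false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Re-check the plan; the Seq Scan on journal should no longer appear
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXPLAIN SELECT * FROM journal WHERE dt = &amp;#39;2023-01-01 22:00:00&amp;#39;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;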
&lt;p&gt;&lt;strong&gt;Creating pathman HASH partitions&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create parent table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; items (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id SERIAL &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name TEXT,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; code BIGINT);
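&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create three HASH partitions on id
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; create_hash_partitions(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#e6db74"&gt;&amp;#39;items&amp;#39;&lt;/span&gt;::regclass,  &lt;span style="color:#75715e"&gt;-- parent table&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#e6db74"&gt;&amp;#39;id&amp;#39;&lt;/span&gt;,  &lt;span style="color:#75715e"&gt;-- partitioning key&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,  &lt;span style="color:#75715e"&gt;-- number of partitions to create&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    &lt;span style="color:#66d9ef"&gt;false&lt;/span&gt;);  &lt;span style="color:#75715e"&gt;-- partition_data: do not move existing parent rows into the partitions&lt;/span&gt;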
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Insert data 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; items (id, name, code)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;, md5(&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;::text), random() &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; partition, &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; items &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; partition;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; partition &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;344&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_0 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;318&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;338&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; items
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.items&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------+-----------+----------+-----------------------------------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;items_id_seq&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; text &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; code &lt;span style="color:#f92672"&gt;|&lt;/span&gt; bigint &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;items_pkey&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Child tables: items_0,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_1,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; items_1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.items_1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------+-----------+----------+-----------------------------------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;items_id_seq&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; text &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; code &lt;span style="color:#f92672"&gt;|&lt;/span&gt; bigint &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;items_1_pkey&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Check&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraints&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;pathman_items_1_check&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; (get_hash_part_idx(hashint4(id), &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Inherits&lt;/span&gt;: items
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; partition, &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; items &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; partition;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; partition &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;344&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_0 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;318&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;338&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Pros and Cons of PostgreSQL Partitioned Tables
 &lt;div id="pros-and-cons-of-postgresql-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pros-and-cons-of-postgresql-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Advantages of Partitioned Tables
 &lt;div id="advantages-of-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#advantages-of-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Better SQL performance. When a large table is split into partitions and a query only needs to read one of them, partition pruning lets the query skip the rest, which can improve performance dramatically.&lt;/li&gt;
&lt;li&gt;Partitions work together with indexes. Scanning the index of a single partition is more efficient than scanning one large index on an unpartitioned table.&lt;/li&gt;
&lt;li&gt;Dropping a partition is much cheaper than deleting many rows. This is common with time-range partitioning: dropping an unused historical partition is very fast, whereas on an unpartitioned table the equivalent DELETE is slow and leaves dead tuples that VACUUM must clean up afterwards.&lt;/li&gt;
&lt;li&gt;VACUUM is faster. On a very large table, reclaiming old row versions and collecting statistics is slow, and queries may start to suffer before VACUUM finishes. Each partition is vacuumed separately, so every run handles far less data.&lt;/li&gt;
&lt;li&gt;I/O distribution. Different partitions can be placed on different paths or disks (for example through tablespaces), and rarely used data can be moved to cheaper storage.&lt;/li&gt;
&lt;li&gt;More maintenance options. Maintaining a very large table directly is difficult, while partitions can be maintained one at a time. ATTACH/DETACH and per-partition indexes and constraints add further flexibility.&lt;/li&gt;
&lt;/ul&gt;
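&lt;p&gt;The pruning and drop-a-partition advantages can be sketched as follows. This is a minimal illustration, not taken from the original test environment: the table &lt;code&gt;events&lt;/code&gt;, its columns, and the partition names are hypothetical.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table, for illustration only
CREATE TABLE events (
    id         bigint    NOT NULL,
    created_at timestamp NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

CREATE TABLE events_202301 PARTITION OF events
    FOR VALUES FROM (&amp;#39;2023-01-01&amp;#39;) TO (&amp;#39;2023-02-01&amp;#39;);
CREATE TABLE events_202302 PARTITION OF events
    FOR VALUES FROM (&amp;#39;2023-02-01&amp;#39;) TO (&amp;#39;2023-03-01&amp;#39;);

-- A filter on the partition key lets the planner prune to a single partition;
-- the plan should only mention events_202302
EXPLAIN SELECT * FROM events
WHERE created_at &amp;gt;= &amp;#39;2023-02-01&amp;#39; AND created_at &amp;lt; &amp;#39;2023-02-15&amp;#39;;

-- Retiring a month of data is a catalog operation, not a row-by-row delete
DROP TABLE events_202301;

-- Equivalent on an unpartitioned table: slow, and leaves dead tuples for VACUUM
-- DELETE FROM events WHERE created_at &amp;lt; &amp;#39;2023-02-01&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;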

&lt;h3 class="relative group"&gt;Disadvantages of Partitioned Tables
 &lt;div id="disadvantages-of-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#disadvantages-of-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;In PostgreSQL, every partition of a partitioned table can be treated as a regular table. Too many partitions can lead to longer SQL parsing times and higher memory load, even causing errors. See the previous article: &lt;a href="https://editor.csdn.net/md/?articleId=131497779" target="_blank" rel="noreferrer"&gt;Too many range table entries even with a modest number of partitions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Even when a large number of partitions does not cause errors, if partition pruning cannot happen at plan time (it may still happen at execution time), the EXPLAIN output lists every partition and becomes extremely long. Execution plans written to the logs are just as long, which hurts log readability.&lt;/li&gt;
&lt;li&gt;Some strange issues: &lt;a href="https://mp.weixin.qq.com/s?__biz=MzUyOTAyMzMyNg==&amp;amp;mid=2247489813&amp;amp;idx=1&amp;amp;sn=22360e2bfd40fc2d0caed0a9d825b1d4&amp;amp;chksm=fa663124cd11b832953e789127927ffa0d63d6c948ca8934d5317b8eaae6e71374041ec038f7&amp;amp;mpshare=1&amp;amp;srcid=0728JrXnHdxnfgRVzqosBNcv&amp;amp;sharer_sharetime=1690509489198&amp;amp;sharer_shareid=0412ea33e50b471b98d8859a5c431367&amp;amp;from=singlemessage&amp;amp;scene=1&amp;amp;subscene=10000&amp;amp;sessionid=1690509419&amp;amp;clicktime=1690509545&amp;amp;enterid=1690509545&amp;amp;ascene=1&amp;amp;fasttmpl_type=0&amp;amp;fasttmpl_fullversion=6785798-en_US-zip&amp;amp;fasttmpl_flag=0&amp;amp;realreporttime=1690509545257&amp;amp;devicetype=android-29&amp;amp;version=28002658&amp;amp;nettype=WIFI&amp;amp;abtest_cookie=AAACAA%3D%3D&amp;amp;lang=en&amp;amp;countrycode=CN&amp;amp;exportkey=n_ChQIAhIQCCtq2jm3UsFznlVjxFEOWBLaAQIE97dBBAEAAAAAABKTCFyWAsoAAAAOpnltbLcz9gKNyK89dVj0LyxnG1pA6NiO6PHIsQ0Hy2N7QRbizb9SHdquaFOpOqANqG8jLDcioswZyRnYknjG4bSqNIIKm%2BpRIlK%2FVJxuwolH2%2FQJKSLg4YjccDktYYscUDvYSfHFx1ScEXZkOkbVqrvbBCPy6Gh2GnzulFuuIU68afNtsoBdzZTqHYbL0BfsAUhsz1iGAfSep642UT2CBpWSHWJQvndnwhZxjJ6%2FWO%2FI%2FqwncggiVeDNiv4vwXhluDNn&amp;amp;pass_ticket=mrpzS3wggBDzL9Ua2FmX5v1rYh6zKOnQ4og6oKcKv0ZXRfNBSUpSkGdTAcfXqgDo&amp;amp;wx_header=3" target="_blank" rel="noreferrer"&gt;Different users see different execution plans&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Limitations of Partitioned Tables
 &lt;div id="limitations-of-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#limitations-of-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;No native automatic partition creation feature&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Only local partition indexes are supported; global indexes are not supported&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Primary keys must include the partition key. PostgreSQL currently can only enforce uniqueness within each partition, hence this limitation. Oracle does not have this restriction because it supports global indexes; MySQL, like PostgreSQL, requires unique keys to include the partitioning columns.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Unique indexes must include the partition key, for the same reason as primary keys: PostgreSQL currently can only enforce uniqueness within each partition.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cannot create globally-defined constraints&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;BEFORE ROW triggers on INSERT cannot change which partition a new row is finally routed to.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Temporary table partitions and regular table partitions cannot coexist under the same partitioned table.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In declarative partitioning, parent and child table columns must be identical; in inheritance partitioning, child tables can have more columns than the parent table.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In declarative partitioning, the partitioned table&amp;rsquo;s CHECK and NOT NULL constraints are always inherited by every partition, and an inherited constraint cannot be dropped or relaxed on an individual partition.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;No RANGE bound can match NULL, so a RANGE partition defined with FROM &amp;hellip; TO cannot store rows whose partition key is NULL. HASH partitioning has no concept of a NULL partition but can store NULL values: they are placed in the remainder 0 partition. LIST partitioning can explicitly create a NULL partition to store NULL data.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
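&lt;p&gt;Two of these limitations are easy to reproduce. The sketch below uses hypothetical tables (&lt;code&gt;orders&lt;/code&gt;, &lt;code&gt;orders_by_region&lt;/code&gt;) that are not part of the original test environment. The first statement is expected to be rejected, so run the statements one by one in psql rather than inside a single transaction.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical tables, for illustration only

-- Rejected: the primary key does not contain the partition key column created_at
CREATE TABLE orders (
    id         bigint    NOT NULL,
    created_at timestamp NOT NULL,
    PRIMARY KEY (id)
) PARTITION BY RANGE (created_at);

-- Accepted: the primary key includes the partition key
CREATE TABLE orders (
    id         bigint    NOT NULL,
    created_at timestamp NOT NULL,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- LIST partitioning can dedicate a partition to NULL partition-key values
CREATE TABLE orders_by_region (
    region text,
    amount numeric
) PARTITION BY LIST (region);

CREATE TABLE orders_by_region_null PARTITION OF orders_by_region
    FOR VALUES IN (NULL);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The wider primary key only guarantees that the combination (id, created_at) is unique. If id alone must be unique across all partitions, that has to be ensured elsewhere, for example by generating it from a sequence.&lt;/p&gt;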

&lt;h3 class="relative group"&gt;When Should You Use Partitioned Tables?
 &lt;div id="when-should-you-use-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#when-should-you-use-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Before using partitioned tables, you need to understand the advantages, disadvantages, and limitations described above. With large data volumes, partitioning can improve performance, and separating hot and cold data makes the data easier to manage. Whether and how to partition depends on your business workload and hardware resources, so the question developers always ask, &amp;ldquo;how much data warrants partitioning?&amp;rdquo;, can only be answered in general terms. If you are unsure, the following signs suggest that partitioning is worth considering (skip this list if you already understand table partitioning well):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The table is large enough, and SQL queries on it always include, or can be made to include, the partition key column.&lt;/li&gt;
&lt;li&gt;Clear hot/cold data separation. For example, new data is always inserted into the current month&amp;rsquo;s partition, while the other 11 months of old partitions are read-only.&lt;/li&gt;
&lt;li&gt;VACUUM can no longer keep up.&lt;/li&gt;
&lt;/ul&gt;
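&lt;p&gt;As a rough way to check the first and third signs, the query below lists the largest user tables together with their dead-tuple counts and the time of their last autovacuum. It is only a sketch: the limit of 10 is arbitrary, and the counters in &lt;code&gt;pg_stat_user_tables&lt;/code&gt; are estimates.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Largest tables first, with an indication of how far VACUUM is behind
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A large table whose dead-tuple count keeps growing, or whose last autovacuum is far in the past, is a candidate for partitioning.&lt;/p&gt;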

&lt;h2 class="relative group"&gt;Partition Table Permissions
 &lt;div id="partition-table-permissions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-table-permissions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Permissions are rarely discussed in material about partitioned tables, but they deserve attention.
In PostgreSQL, a partition is also a regular table, which differs from other common databases such as Oracle and MySQL. In Oracle, for example, you don&amp;rsquo;t need to think about privileges on individual partitions, but in PostgreSQL you do.&lt;/p&gt;
&lt;p&gt;Partitions created with PARTITION OF or added with ATTACH do not inherit the parent table&amp;rsquo;s privileges:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Grant SELECT on the partitioned table to a regular user
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; userlzl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;GRANT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check permissions — only the parent table has been granted; existing partition child tables are not automatically granted
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; grantee,table_schema,&lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt;,privilege_type &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; information_schema.table_privileges &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; grantee&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;userlzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grantee &lt;span style="color:#f92672"&gt;|&lt;/span&gt; table_schema &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; privilege_type 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+--------------+---------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; userlzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- Create a partition using PARTITION OF
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION1_202303 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION1 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create a partition using ATTACH
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzlpartition1_202304
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;LIKE&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULTS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINTS&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 attach partition lzlpartition1_202304 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-05-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check permissions again — newly created child partitions are not automatically granted to other users (but permissions are automatically granted to the owner)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; grantee,table_schema,&lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt;,privilege_type &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; information_schema.table_privileges &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; grantee&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;userlzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grantee &lt;span style="color:#f92672"&gt;|&lt;/span&gt; table_schema &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; privilege_type 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+--------------+---------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; userlzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;At this point, user &lt;code&gt;userlzl&lt;/code&gt; has no access permissions to any child tables, but has permissions on the parent table.
&lt;code&gt;userlzl&lt;/code&gt; can access partition data through the parent table, but cannot access data by directly querying child tables:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt; userlzl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;You &lt;span style="color:#66d9ef"&gt;are&lt;/span&gt; now connected &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;dbmgr&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;userlzl&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; LZLPARTITION1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-02 10:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+----------------------------------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2159&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; d05d716da126ff4b44d934344cc4dd7a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; LZLPARTITION1_202301 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-02 10:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42501&lt;/span&gt;: permission denied &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: aclcheck_error, aclchk.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3466&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;ATTACH and DETACH do not touch privileges either. If we DETACH a partition at this point, it becomes a standalone table with no grants for &lt;code&gt;userlzl&lt;/code&gt;, so &lt;code&gt;userlzl&lt;/code&gt; cannot access it:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202303;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dp&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition1_202303;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Policies 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----------------------+-------+-------------------+-------------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202303 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; LZLPARTITION1_202301 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-02 10:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42501&lt;/span&gt;: permission denied &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202301 &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From this we can conclude:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In PostgreSQL, the parent table and each partition are regular tables, and each has its own privileges.&lt;/li&gt;
&lt;li&gt;A user with privileges on the parent table but not on a partition can still read that partition&amp;rsquo;s data through the parent table, but cannot query the partition directly.&lt;/li&gt;
&lt;li&gt;PARTITION OF, ATTACH, and DETACH neither grant nor revoke privileges.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, partition table permissions do not merely control whether access is possible. Lacking partition child table permissions can lead to abnormal execution plans. Reference article: &lt;a href="https://mp.weixin.qq.com/s?__biz=MzUyOTAyMzMyNg==&amp;amp;mid=2247489813&amp;amp;idx=1&amp;amp;sn=22360e2bfd40fc2d0caed0a9d825b1d4&amp;amp;chksm=fa663124cd11b832953e789127927ffa0d63d6c948ca8934d5317b8eaae6e71374041ec038f7&amp;amp;mpshare=1&amp;amp;srcid=0728JrXnHdxnfgRVzqosBNcv&amp;amp;sharer_sharetime=1690509489198&amp;amp;sharer_shareid=0412ea33e50b471b98d8859a5c431367&amp;amp;from=singlemessage&amp;amp;scene=1&amp;amp;subscene=10000&amp;amp;sessionid=1690509419&amp;amp;clicktime=1690509545&amp;amp;enterid=1690509545&amp;amp;ascene=1&amp;amp;fasttmpl_type=0&amp;amp;fasttmpl_fullversion=6785798-en_US-zip&amp;amp;fasttmpl_flag=0&amp;amp;realreporttime=1690509545257&amp;amp;devicetype=android-29&amp;amp;version=28002658&amp;amp;nettype=WIFI&amp;amp;abtest_cookie=AAACAA%3D%3D&amp;amp;lang=en&amp;amp;countrycode=CN&amp;amp;exportkey=n_ChQIAhIQCCtq2jm3UsFznlVjxFEOWBLaAQIE97dBBAEAAAAAABKTCFyWAsoAAAAOpnltbLcz9gKNyK89dVj0LyxnG1pA6NiO6PHIsQ0Hy2N7QRbizb9SHdquaFOpOqANqG8jLDcioswZyRnYknjG4bSqNIIKm%2BpRIlK%2FVJxuwolH2%2FQJKSLg4YjccDktYYscUDvYSfHFx1ScEXZkOkbVqrvbBCPy6Gh2GnzulFuuIU68afNtsoBdzZTqHYbL0BfsAUhsz1iGAfSep642UT2CBpWSHWJQvndnwhZxjJ6%2FWO%2FI%2FqwncggiVeDNiv4vwXhluDNn&amp;amp;pass_ticket=mrpzS3wggBDzL9Ua2FmX5v1rYh6zKOnQ4og6oKcKv0ZXRfNBSUpSkGdTAcfXqgDo&amp;amp;wx_header=3" target="_blank" rel="noreferrer"&gt;Different users see different execution plans&lt;/a&gt;
This problem is intermittent and causes superusers and regular users to get different execution plans for the same SQL. The plan of the actual business SQL is abnormal but goes unnoticed, which makes the problem hard to diagnose. Each partition has its own statistics, and its privileges are independent of the parent table&amp;rsquo;s (even for partitions created via PARTITION OF). A user can therefore read partition data through the parent table while being unable to read the partition&amp;rsquo;s statistics, and this difference leads to different execution plans.
This contradicts the common assumption that &amp;ldquo;&lt;em&gt;permissions only control whether you can access a table, not how you access it&lt;/em&gt;,&amp;rdquo; so this permission issue deserves attention.
To give the user access to partition statistics, explicitly grant SELECT on every partition, which avoids the issues above:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_partition_allname &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; username;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
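&lt;p&gt;With many partitions, writing one GRANT per partition by hand is tedious. The sketch below loops over the partition tree instead. It assumes PostgreSQL 12 or later, where &lt;code&gt;pg_partition_tree()&lt;/code&gt; is available, and reuses the table &lt;code&gt;lzlpartition1&lt;/code&gt; and the role &lt;code&gt;userlzl&lt;/code&gt; from the examples above. Run it as the table owner or a superuser.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Grant SELECT on the parent table and on every partition below it
DO $$
DECLARE
    part regclass;
BEGIN
    FOR part IN
        SELECT relid FROM pg_partition_tree(&amp;#39;lzlpartition1&amp;#39;)
    LOOP
        -- %s prints the regclass as a properly quoted table name,
        -- %I quotes the role name as an identifier
        EXECUTE format(&amp;#39;GRANT SELECT ON TABLE %s TO %I&amp;#39;, part, &amp;#39;userlzl&amp;#39;);
    END LOOP;
END
$$;

-- Verify: every partition should now be listed for the grantee
SELECT grantee, table_schema, table_name, privilege_type
FROM information_schema.table_privileges
WHERE grantee = &amp;#39;userlzl&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Partitions created or attached later are not covered, so the grant has to be repeated whenever a new partition is added.&lt;/p&gt;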

&lt;h2 class="relative group"&gt;Partition Table Maintenance
 &lt;div id="partition-table-maintenance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-table-maintenance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;ATTACH/DETACH Basic Operations
 &lt;div id="attachdetach-basic-operations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#attachdetach-basic-operations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;ATTACH adds an existing table to a partitioned table as a partition, and DETACH removes a partition from a partitioned table, turning it back into a standalone table. Both are very useful in maintenance work.
First, let&amp;rsquo;s look at the locking behavior of adding partitions via &amp;ldquo;CREATE TABLE &amp;hellip; PARTITION OF&amp;rdquo; and deleting partitions via &amp;ldquo;DROP TABLE&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;Lock Matrix: &lt;a href="https://www.postgresql.org/docs/current/explicit-locking.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/explicit-locking.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Lock Requests: &lt;a href="https://postgres-locks.husseinnasser.com" target="_blank" rel="noreferrer"&gt;https://postgres-locks.husseinnasser.com&lt;/a&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Adding a partition via PARTITION OF&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1: Start a transaction, read-only data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+----------------------------------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;8249&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;256&lt;/span&gt;ac66bb53d31bc6124294238d6410c &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check lock status. When reading data from one partition, locks are acquired on both the child partition and the parent table.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+-----------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Add a partition via PARTITION OF
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION1_202305 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION1 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-05-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-06-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check locks again
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;308525&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#75715e"&gt;-- This is the PARTITION OF session
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 4: Run an arbitrary query
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check locks again
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;308525&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;84774&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#75715e"&gt;-- Query is blocked&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Adding a partition via PARTITION OF requests an AccessExclusiveLock on the parent table. The request has to wait for every transaction that already holds a lock on the parent table, and while it waits, every later request on the parent table, including a plain SELECT, queues behind it.



&lt;img src="https://lastdba.com/img/csdn/851906be0f93.png" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;Although the PARTITION OF statement itself executes quickly, if there are long-running transactions on the parent table, all operations on the partitioned table will stall for an extended period. Without a maintenance window, using PARTITION OF to add partitions directly is not recommended.&lt;/p&gt;
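&lt;p&gt;To spot this kind of pile-up while it is happening, a query along the following lines lists the sessions that are waiting on a lock together with the sessions blocking them. It is a minimal sketch and not part of the test above; it assumes PostgreSQL 9.6 or later, where pg_blocking_pids() and the wait_event_type column are available:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Sessions currently waiting on a heavyweight lock, and the pids blocking them
SELECT a.pid,
       pg_blocking_pids(a.pid) AS blocked_by,
       now() - a.query_start   AS running_for,
       left(a.query, 60)       AS query
FROM pg_stat_activity a
WHERE a.wait_event_type = &amp;#39;Lock&amp;#39;
ORDER BY a.query_start;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In the scenario above, the PARTITION OF session would be reported as blocked by the long-running reader, and the later SELECT as blocked by the PARTITION OF session.&lt;/p&gt;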
&lt;ol start="2"&gt;
&lt;li&gt;&lt;strong&gt;Dropping a partition via DROP TABLE&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1: Start another read-only transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Drop a child partition of the partitioned table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202305;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check lock status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;308525&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Dropping a child partition with DROP TABLE also requests an AccessExclusiveLock on the parent table: it waits for every existing transaction on the parent table and blocks every new one. Like PARTITION OF, it must be used with caution in production environments.&lt;/p&gt;
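&lt;p&gt;If PARTITION OF or DROP TABLE has to be run outside a maintenance window, one common mitigation (not covered by the test above) is to cap how long the DDL may wait for its lock, so that it fails fast instead of queueing in front of other sessions. A minimal sketch, with the 2s value chosen arbitrarily for illustration:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;BEGIN;
-- Give up if the AccessExclusiveLock cannot be obtained within 2 seconds
SET LOCAL lock_timeout = &amp;#39;2s&amp;#39;;
DROP TABLE lzlpartition1_202305;
COMMIT;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the timeout expires, the statement fails with an error, the transaction is rolled back, and the DDL can be retried later.&lt;/p&gt;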
&lt;ol start="3"&gt;
&lt;li&gt;&lt;strong&gt;ATTACH — adding a partition&lt;/strong&gt;
ATTACH attaches an existing regular table to a partitioned table.
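Although both ATTACH and PARTITION OF can add partitions, note that &lt;strong&gt;ATTACH does not copy the parent&amp;rsquo;s default values or its CHECK and NOT NULL constraints to the attached table&lt;/strong&gt;, so the table must already carry the constraints (and any defaults it needs); this differs from PARTITION OF. On recent PostgreSQL versions ATTACH does create parent indexes and row-level triggers that are missing on the attached table, but building indexes inside ATTACH lengthens the time its locks are held, so it is safer to create matching indexes beforehand.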
First, create a table:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- To reduce tedious DDL, use LIKE to create the table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzlpartition1_202305
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;LIKE&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULTS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINTS&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
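&lt;p&gt;Before attaching, it can help to add a CHECK constraint that matches the intended partition bound, because PostgreSQL can then skip the validation scan of the new table during ATTACH. The sketch below assumes the bound used in the ATTACH statement that follows; the constraint name is hypothetical:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Mirrors FOR VALUES FROM (&amp;#39;2023-05-01 00:00:00&amp;#39;) TO (&amp;#39;2023-06-01 00:00:00&amp;#39;)
ALTER TABLE lzlpartition1_202305
    ADD CONSTRAINT lzlpartition1_202305_bound_chk
    CHECK (date_created IS NOT NULL
           AND date_created &amp;gt;= &amp;#39;2023-05-01 00:00:00&amp;#39;
           AND date_created &amp;lt;  &amp;#39;2023-06-01 00:00:00&amp;#39;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For a freshly created empty table the scan is cheap anyway; the constraint matters mostly when the table is loaded with data before it is attached. Once the table is attached, the constraint is redundant and can be dropped.&lt;/p&gt;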
&lt;p&gt;Now observe whether ATTACH is blocked:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1: Start a read-write transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;1234&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 01:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check lock status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- DML statements acquire RowExclusiveLock on the partition parent table and the corresponding partition child table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: ATTACH the newly created table to the partition parent table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 attach partition lzlpartition1_202305 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-05-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-06-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On PostgreSQL 12 and later, ATTACH only requests a SHARE UPDATE EXCLUSIVE lock on the parent table, which is much lighter than ACCESS EXCLUSIVE.



&lt;img src="https://lastdba.com/img/csdn/b23cc350250f.png" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;ATTACH does not block reads or writes on the parent table, so it is the recommended way to add partitions: it can run online without affecting business traffic.&lt;/p&gt;
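&lt;p&gt;ATTACH also has to confirm that every existing row in the new table fits the partition bounds, which means scanning that table. A pattern recommended in the PostgreSQL documentation is to add a matching CHECK constraint beforehand so the scan can be skipped. The sketch below assumes &lt;code&gt;date_created&lt;/code&gt; is the range partition key (as the error output later in this section suggests); the constraint name is made up for illustration.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical constraint name; the bounds must match the FOR VALUES clause exactly
ALTER TABLE lzlpartition1_202305
  ADD CONSTRAINT lzlpartition1_202305_bounds
  CHECK (date_created IS NOT NULL
         AND date_created &amp;gt;= &amp;#39;2023-05-01 00:00:00&amp;#39;
         AND date_created &amp;lt; &amp;#39;2023-06-01 00:00:00&amp;#39;);

-- With the constraint in place, ATTACH can skip the validation scan
ALTER TABLE lzlpartition1 ATTACH PARTITION lzlpartition1_202305
  FOR VALUES FROM (&amp;#39;2023-05-01 00:00:00&amp;#39;) TO (&amp;#39;2023-06-01 00:00:00&amp;#39;);

-- The helper constraint is redundant once the table is a partition
ALTER TABLE lzlpartition1_202305 DROP CONSTRAINT lzlpartition1_202305_bounds;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;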
&lt;ol start="4"&gt;
&lt;li&gt;&lt;strong&gt;DETACH — removing a partition&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;DETACH removes a partition from the partitioned table, turning it into a regular table:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1: Keep the DML transaction uncommitted
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: DETACH a partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202305;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- Waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check lock status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;308525&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;308525&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
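&lt;p&gt;Unlike ATTACH, DETACH requests an AccessExclusiveLock on the parent table. It has to wait for every open transaction that holds a conflicting lock, and while it waits, all new reads and writes on the table queue up behind it.&lt;/p&gt;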
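&lt;p&gt;Because a waiting ACCESS EXCLUSIVE request also holds up every query that arrives after it, one common safeguard is to cap the wait with &lt;code&gt;lock_timeout&lt;/code&gt; and retry later. A minimal sketch; the 3-second value is an arbitrary assumption, not something taken from the test above:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;BEGIN;
-- Give up if the lock is not granted within 3 seconds (illustrative value)
SET LOCAL lock_timeout = &amp;#39;3s&amp;#39;;
ALTER TABLE lzlpartition1 DETACH PARTITION lzlpartition1_202305;
-- If the ALTER timed out, the transaction is aborted and COMMIT rolls it back
COMMIT;

-- From another session: list blocked sessions and the PIDs blocking them
SELECT pid, pg_blocking_pids(pid) AS blocked_by, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) &amp;gt; 0;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;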
&lt;ol start="5"&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DETACH CONCURRENTLY&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Starting from PostgreSQL 14, DETACH gained two new syntax variants: CONCURRENTLY and FINALIZE.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;blockquote&gt;&lt;p&gt;ALTER TABLE [ IF EXISTS ] &lt;em&gt;&lt;code&gt;name&lt;/code&gt;&lt;/em&gt;
DETACH PARTITION &lt;em&gt;&lt;code&gt;partition_name&lt;/code&gt;&lt;/em&gt; [ CONCURRENTLY | FINALIZE ]&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;DETACH CONCURRENTLY runs as two transactions internally. The first transaction takes a SHARE UPDATE EXCLUSIVE lock on both the parent and the child table and marks the partition as being detached; it then commits and waits for all other transactions that are using the partitioned table to finish. Once they have finished, the second transaction takes a SHARE UPDATE EXCLUSIVE lock on the parent table and an ACCESS EXCLUSIVE lock on the child table, and the detach completes.&lt;/p&gt;
&lt;p&gt;Additionally, after DETACH CONCURRENTLY, the detached child table keeps its bounds: a CHECK constraint that duplicates the partition constraint is added to the detached table.&lt;/p&gt;
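&lt;p&gt;To see the constraint that is left behind, the catalog can be queried once the detach has finished. A sketch, assuming the detached table is &lt;code&gt;lzlpartition1_202301&lt;/code&gt; as in the test in this section; the constraint name in the commented DROP statement is a placeholder, since the generated name is not shown in the original output:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- List CHECK constraints on the detached table
SELECT conname, pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid = &amp;#39;lzlpartition1_202301&amp;#39;::regclass
  AND contype = &amp;#39;c&amp;#39;;

-- Optional: drop it if the table will be reused with different bounds
-- (replace the placeholder with the conname returned above)
-- ALTER TABLE lzlpartition1_202301 DROP CONSTRAINT some_constraint_name;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;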
&lt;p&gt;DETACH CONCURRENTLY limitations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;DETACH CONCURRENTLY cannot be placed inside a transaction block.&lt;/li&gt;
&lt;li&gt;The partitioned table cannot have a DEFAULT partition.&lt;/li&gt;
&lt;/ul&gt;
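&lt;p&gt;The default-partition restriction can be checked up front. The sketch below reads &lt;code&gt;pg_partitioned_table.partdefid&lt;/code&gt;, which holds the OID of the default partition, or 0 when there is none; the table names follow the test in this section.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- has_default = false means DETACH ... CONCURRENTLY is allowed on this parent
SELECT partrelid::regclass AS parent,
       partdefid &amp;lt;&amp;gt; 0 AS has_default
FROM pg_partitioned_table
WHERE partrelid = &amp;#39;lzlpartition1&amp;#39;::regclass;

-- Must run as a standalone statement, outside BEGIN ... COMMIT
ALTER TABLE lzlpartition1 DETACH PARTITION lzlpartition1_202301 CONCURRENTLY;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;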
&lt;p&gt;Locking behavior of CONCURRENTLY:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;1234&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 01:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: DETACH CONCURRENTLY
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301 concurrently;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check locks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pid,query,wait_event_type,wait_event &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;1234&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 01:00:00&amp;#39;&lt;/span&gt;); &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Client &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ClientRead
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3940&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301 concurrently; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Lock&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3947&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pid,query,wait_event_type,wait_event &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- The DETACH session is 3940. Interestingly, the DETACH wait event is virtualxid, and the wait event type is Lock.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check lock details
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; locktype,&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;,relation,virtualtransaction,pid,&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; pid &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3940&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualtransaction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------+----------+----------+--------------------+------+------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3940&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16387&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;40969&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;179&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16387&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;40963&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;179&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;179&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3940&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;179&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- At this point, DETACH is not yet waiting for a table-level lock; it is waiting for a ShareLock on virtualxid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 4: Try an insert
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;12345&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 01:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#66d9ef"&gt;no&lt;/span&gt; partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; relation &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;found&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: Partition &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; the failing &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;contains&lt;/span&gt; (date_created) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;).
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;12345&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 01:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- The detaching partition can no longer accept inserts, but other partitions can.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- What if we insert directly into the partition? It works fine.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;12345&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 01:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Note: at this point it is still a partition of the partitioned table, not yet a regular table, but it has been marked as unavailable.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- \d+ shows the partition in DETACH PENDING state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) (DETACH PENDING),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzlpartition1_202302 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Commit/rollback the insert session (Session 1)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2 completes immediately
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301 concurrently;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
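&lt;p&gt;The pending state that \d+ displays is also recorded in the catalog. A sketch, assuming PostgreSQL 14 or later, where &lt;code&gt;pg_inherits&lt;/code&gt; has the &lt;code&gt;inhdetachpending&lt;/code&gt; column:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- One row per partition of the parent; true marks a partition in DETACH PENDING
SELECT inhrelid::regclass AS partition_name, inhdetachpending
FROM pg_inherits
WHERE inhparent = &amp;#39;lzlpartition1&amp;#39;::regclass
ORDER BY inhrelid::regclass::text;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A row that still shows true after the ALTER TABLE session has gone away points to an interrupted detach.&lt;/p&gt;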
&lt;p&gt;FINALIZE completes a DETACH CONCURRENTLY that was canceled or interrupted:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;1234&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 01:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: DETACH CONCURRENTLY, manually canceled
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301 concurrently;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;^&lt;/span&gt;CCancel request sent
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: canceling &lt;span style="color:#66d9ef"&gt;statement&lt;/span&gt; due &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; request
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- \d+ shows the partition in DETACH PENDING state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) (DETACH PENDING),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzlpartition1_202302 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- In DETACH PENDING state, queries against the parent table no longer scan this partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;752&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;81&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;38881&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Use FINALIZE to complete the detach
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301 finalize; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check lock status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+------+--------------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3940&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3940&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ShareUpdateExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- 3940, FINALIZE requests ShareUpdateExclusiveLock on the parent table and AccessExclusiveLock on the child table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Since the inserted data happened to be in the detaching partition, it is waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1 ends
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=!&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2 completes immediately
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301 finalize; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Although DETACH requests an AccessExclusiveLock (the strongest, level-8 lock) on the partition being detached, business workloads generally do not read or write child partitions directly. In practice you only need to make sure that long-running transactions on the partitioned table finish quickly. Blocking of later statements that target the detached child table directly is usually not a concern.&lt;/p&gt;
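&lt;p&gt;To spot the long-running transactions mentioned above before running DETACH, a query along these lines may help. It is only a sketch against the standard &lt;code&gt;pg_stat_activity&lt;/code&gt; view: the 10-row limit and the 60-character query preview are arbitrary choices, not something taken from the test above.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Oldest open transactions first; these are the ones DETACH is most likely to wait for
SELECT pid,
       usename,
       state,
       now() - xact_start AS xact_age,   -- how long the transaction has been open
       left(query, 60)    AS query_preview
FROM pg_stat_activity
WHERE xact_start IS NOT NULL             -- only sessions inside a transaction
  AND pid &amp;lt;&amp;gt; pg_backend_pid()          -- skip the session running this query
ORDER BY xact_start
LIMIT 10;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;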
&lt;p&gt;Online DETACH summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The blocking behavior of DETACH CONCURRENTLY is somewhat similar to CIC (CREATE INDEX CONCURRENTLY): it does not block other transactions, but it must itself wait for existing transactions to complete. This wait is not easy to see from lock information alone.&lt;/li&gt;
&lt;li&gt;During DETACH CONCURRENTLY, the partition enters an intermediate DETACH PENDING state. This state is somewhat like an INVISIBLE flag: SQL will no longer find this partition.&lt;/li&gt;
&lt;li&gt;If DETACH PENDING is caused by long-running transactions, end those transactions promptly. If it is caused by an interrupted DETACH CONCURRENTLY, use FINALIZE to complete the detach.&lt;/li&gt;
&lt;/ul&gt;
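&lt;p&gt;The DETACH PENDING state from the summary above can also be checked in the catalog. The following is a minimal sketch, assuming PostgreSQL 14 or later, where &lt;code&gt;pg_inherits&lt;/code&gt; has the &lt;code&gt;inhdetachpending&lt;/code&gt; column. The column aliases are illustrative.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Partitions whose concurrent detach has started but not finished
SELECT inhparent::regclass AS parent_table,
       inhrelid::regclass  AS pending_partition
FROM pg_inherits
WHERE inhdetachpending;

-- If a row comes back because an earlier DETACH CONCURRENTLY was interrupted,
-- complete it with FINALIZE (same statement as in the test above)
ALTER TABLE lzlpartition1 DETACH PARTITION lzlpartition1_202301 FINALIZE;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;An empty result from the first query suggests there is nothing left to finalize, in which case the ALTER TABLE statement is not needed.&lt;/p&gt;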

&lt;h3 class="relative group"&gt;Using Constraints to Reduce ATTACH Time
 &lt;div id="using-constraints-to-reduce-attach-time" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#using-constraints-to-reduce-attach-time" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Partition data overview — prepare to ATTACH a relatively large partition:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; partition, &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; partition;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; partition &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2592001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzlpartition1_202302 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;38881&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note: this 202301 partition has a PARTITION CONSTRAINT:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition1_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.lzlpartition1_202301&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: lzlpartition1 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1_202301_pkey&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id, date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;DETACH the partition:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After DETACH, the PARTITION CONSTRAINT is gone
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition1_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.lzlpartition1_202301&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1_202301_pkey&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id, date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="3"&gt;
&lt;li&gt;ATTACH without adding a CHECK constraint:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 attach partition lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;343&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;498&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because ATTACH must scan the partition to verify that every row falls within the partition range, it took about 343 ms.&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Add a CHECK constraint first, then ATTACH:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; chk_202301 &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;355&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;458&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Adding the CHECK constraint takes roughly as long as the ATTACH without a CHECK (355 ms versus 343 ms), because adding a CHECK constraint also has to scan and validate all of the data.
Once the CHECK constraint is in place, the subsequent ATTACH completes very quickly:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 attach partition lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;480&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
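&lt;p&gt;Before relying on the fast path, it may be worth confirming that the CHECK constraint is really present and validated on the table to be attached. This is a sketch using the standard &lt;code&gt;pg_constraint&lt;/code&gt; catalog. It assumes that only a validated constraint (&lt;code&gt;convalidated = true&lt;/code&gt;) lets ATTACH skip the scan, so a constraint created with NOT VALID would need VALIDATE CONSTRAINT first.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- CHECK constraints on the table that is about to be attached
SELECT conname,
       convalidated,                          -- false for NOT VALID constraints
       pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid = &amp;#39;lzlpartition1_202301&amp;#39;::regclass
  AND contype = &amp;#39;c&amp;#39;;                          -- c = CHECK constraint&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;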
&lt;ol start="5"&gt;
&lt;li&gt;Drop the CHECK constraint:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition1_202301;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.lzlpartition1_202301&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: lzlpartition1 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1_202301_pkey&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id, date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Check&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraints&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;chk_202301&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; (date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note: CHECK CONSTRAINT and PARTITION CONSTRAINT are different concepts, even though their content can be identical. ATTACH uses the CHECK constraint to skip the validation scan but does not merge it into the partition constraint, so both remain on the table. You can explicitly drop this now-redundant CHECK:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; chk_202301;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note also that DROP CONSTRAINT requests an AccessExclusiveLock on the child partition. This is the strongest lock mode and conflicts with every other lock mode, including the one taken by a plain SELECT. If there are active transactions on that child partition, be cautious with DROP CONSTRAINT.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;448243&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;448243&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;444399&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;444399&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#75715e"&gt;-- This is the DROP CONSTRAINT session
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;448243&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
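&lt;p&gt;One way to keep a waiting DROP CONSTRAINT from stalling other sessions is to bound how long it may wait for the lock. The following is a minimal sketch, not part of the original test: lock_timeout is a standard PostgreSQL setting, and the 3-second value is an arbitrary assumption.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Assumption: 3s is an illustrative value; tune it to your workload
BEGIN;
-- SET LOCAL limits the setting to this transaction
SET LOCAL lock_timeout = &amp;#39;3s&amp;#39;;
-- Gives up with an error if the AccessExclusiveLock is not granted within 3s
ALTER TABLE lzlpartition1_202301 DROP CONSTRAINT chk_202301;
COMMIT;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the timeout fires, the ALTER TABLE fails, the transaction is rolled back, and the statement can be retried later instead of sitting in the lock queue.&lt;/p&gt;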
&lt;p&gt;In short:
&lt;strong&gt;Adding a CHECK constraint before ATTACH is worthwhile: it shortens ATTACH execution time, because the data validation has already been completed before ATTACH runs.&lt;/strong&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;The Correct Way to Add Partitions to a Partitioned Table
 &lt;div id="the-correct-way-to-add-partitions-to-a-partitioned-table" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-correct-way-to-add-partitions-to-a-partitioned-table" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;We now know that ATTACH can be executed online, while PARTITION OF, DROP TABLE, and DETACH all request an AccessExclusiveLock, which waits for running transactions and blocks every statement queued behind it.
&lt;strong&gt;It is therefore recommended to use ATTACH to create new partitions. PARTITION OF and DETACH both wait for and block all transactions, while ATTACH is not blocked by read-only or DML transactions.&lt;/strong&gt;
Create the CHECK constraint before ATTACH, and watch for long-running transactions when dropping it afterward.
&lt;strong&gt;The correct way to add a partition to a partitioned table&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- To reduce tedious DDL, use LIKE to create the table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzlpartition1_202303
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;LIKE&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULTS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINTS&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Refer to the PARTITION CONSTRAINT of other partitions, add a CHECK constraint on the table to reduce ATTACH constraint validation time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202303 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; chk_202303 &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Add partition using ATTACH
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION1 attach partition LZLPARTITION1_202303 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Optional. Drop the redundant CHECK constraint before transactions start on the new partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202303 &lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; chk_202303;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
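&lt;p&gt;After running the steps above, it can be useful to confirm the result from the catalogs. The two queries below are a sketch under the assumption that the table names match this example; they only read pg_class, pg_inherits, and pg_constraint and change nothing.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- List every partition of lzlpartition1 with its partition bound
SELECT c.relname,
       pg_get_expr(c.relpartbound, c.oid) AS partition_bound
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = &amp;#39;lzlpartition1&amp;#39;::regclass
ORDER BY c.relname;

-- List CHECK constraints still present on the new partition
-- (chk_202303 appears here until it is dropped)
SELECT conname, pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid = &amp;#39;lzlpartition1_202303&amp;#39;::regclass
  AND contype = &amp;#39;c&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;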

&lt;h3 class="relative group"&gt;Locks on Partition Indexes
 &lt;div id="locks-on-partition-indexes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#locks-on-partition-indexes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Creating/dropping partition indexes during read-only transactions&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When a query transaction is open on the partitioned table, it holds a shared lock (AccessShareLock) on the parent table and on the partition it reads. In that state, CREATE INDEX on lzlpartition1 succeeds (note: without CONCURRENTLY), while DROP INDEX on the same index waits:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1: Start a transaction, read data from the partitioned table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-02 00:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;86401&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Create index, succeeds
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_datecreated &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Drop index, waits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_datecreated;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check locks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+---------------------------+------------+---------------+--------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301_pkey &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;99598&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
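&lt;p&gt;In the output above, the DROP INDEX session (99598) is waiting for an AccessExclusiveLock on the parent table lzlpartition1, which conflicts with the AccessShareLock held by the query session (300371). CREATE INDEX does not request an AccessExclusiveLock on the table, so it was not blocked.
From this example we can conclude:
&lt;strong&gt;Read-only transactions do not block CREATE INDEX, but they do block DROP INDEX.&lt;/strong&gt;&lt;/p&gt;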
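&lt;p&gt;Reading pg_locks by hand works, but the blocker can also be identified directly. The query below is a sketch that uses the built-in pg_blocking_pids() function (available since PostgreSQL 9.6); the 60-character truncation of the query text is an arbitrary choice.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Show each waiting session together with the PIDs that block it
SELECT a.pid,
       pg_blocking_pids(a.pid) AS blocked_by,
       a.wait_event_type,
       a.state,
       left(a.query, 60) AS query
FROM pg_stat_activity a
WHERE cardinality(pg_blocking_pids(a.pid)) &amp;gt; 0;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Run from a third session while DROP INDEX is waiting, this should list the DROP INDEX backend with the query session in blocked_by.&lt;/p&gt;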
&lt;ol start="2"&gt;
&lt;li&gt;Creating/dropping partition indexes during update transactions&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1: Start an update transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 10:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Create partition index, waits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_datecreated &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check lock status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+---------------------------+------------+---------------+--------+------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301_pkey &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;99598&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The CREATE INDEX session (99598) requests a ShareLock on the partitioned parent table, which is not granted; the DML transaction session (300371) holds RowExclusiveLock on the child partition and the parent table.
&lt;img src="https://lastdba.com/img/csdn/9fc4b97314bd.png" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;CREATE INDEX (without CONCURRENTLY) requests ShareLock on the parent table;
read-only transactions request AccessShareLock on the parent and child tables;
update transactions request RowExclusiveLock on the parent and child tables.
It follows that:
AccessShareLock does not conflict with ShareLock, so queries do not block CREATE INDEX (without CONCURRENTLY);
RowExclusiveLock conflicts with ShareLock, so DML blocks CREATE INDEX (without CONCURRENTLY).&lt;/p&gt;
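&lt;p&gt;The conflict between these lock modes can be reproduced without any index DDL by taking the locks explicitly. This is a sketch for a test database only: LOCK TABLE stands in for the UPDATE and for CREATE INDEX, and the 1-second lock_timeout is an arbitrary assumption.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Session 1: hold the lock mode that an UPDATE takes
BEGIN;
LOCK TABLE lzlpartition1 IN ROW EXCLUSIVE MODE;

-- Session 2: request the lock mode that CREATE INDEX takes
BEGIN;
SET LOCAL lock_timeout = &amp;#39;1s&amp;#39;;  -- assumption: short illustrative wait
-- Cannot be granted while session 1 holds RowExclusiveLock; errors after 1s
LOCK TABLE lzlpartition1 IN SHARE MODE;
ROLLBACK;

-- Session 1: release the lock
ROLLBACK;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Replacing ROW EXCLUSIVE with ACCESS SHARE in session 1 lets session 2 acquire its SHARE lock immediately, which matches the read-only case.&lt;/p&gt;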
&lt;ol start="3"&gt;
&lt;li&gt;Creating partitioned indexes with CONCURRENTLY&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Note: You cannot create indexes with CONCURRENTLY on a partitioned table.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; concurrently idx_datecreated &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;A000: cannot &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1&amp;#34;&lt;/span&gt; concurrently
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: DefineIndex, indexcmds.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;665&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A patch tracked at &lt;a href="https://commitfest.postgresql.org/35/2815/" target="_blank" rel="noreferrer"&gt;https://commitfest.postgresql.org/35/2815/&lt;/a&gt; aims to lift this restriction.&lt;/p&gt;
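&lt;p&gt;Until CONCURRENTLY is accepted on the parent, the PostgreSQL documentation describes a workaround built from ON ONLY and ALTER INDEX ... ATTACH PARTITION. The sketch below assumes PostgreSQL 11 or later and reuses this article&amp;#39;s index names; only one partition is shown, and the same two steps would be repeated for every other partition.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- 1. Create the index on the parent only; it does not recurse to the
--    partitions and is marked invalid for now
CREATE INDEX idx_datecreated ON ONLY lzlpartition1 (date_created);

-- 2. Build the index on a partition without blocking writes
CREATE INDEX CONCURRENTLY idx_datecreated_202301
    ON lzlpartition1_202301 (date_created);

-- 3. Attach the partition index to the parent index
ALTER INDEX idx_datecreated ATTACH PARTITION idx_datecreated_202301;

-- Repeat steps 2 and 3 for each remaining partition; the parent index
-- becomes valid once every partition has an attached index&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;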
&lt;p&gt;Currently, you can create indexes with CONCURRENTLY on individual partition child tables:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1: Still using the previous DML transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Create index with CONCURRENTLY on a child table, waits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; concurrently idx_datecreated_202301 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check lock status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+---------------------------+------------+---------------+--------+--------------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301_pkey &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;99598&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ShareUpdateExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With CONCURRENTLY, the requested lock is one level weaker (ShareUpdateExclusiveLock instead of ShareLock) and &lt;strong&gt;no longer conflicts&lt;/strong&gt; with RowExclusiveLock. The locks don&amp;rsquo;t conflict, so why is CONCURRENTLY itself still blocked?&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;it must wait for all existing transactions that could potentially modify or use the index to terminate.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;The official documentation explains that CONCURRENTLY must wait for every existing transaction that could modify or use the index to terminate. In our case, the UPDATE in Session 1 modified the indexed column, so CONCURRENTLY has to wait for that transaction to complete.
&lt;strong&gt;Although the earlier DML transaction keeps CONCURRENTLY from finishing, there&amp;rsquo;s a benefit: CONCURRENTLY does not block subsequent DML statements.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- While CONCURRENTLY has not yet completed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 4: Update a record
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 12:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
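&lt;p&gt;The relation-filtered lock query above cannot show what the CONCURRENTLY build is actually waiting on, because that wait is normally not a relation lock. A minimal sketch for inspecting it, assuming PostgreSQL 9.6 or later (for &lt;code&gt;pg_blocking_pids&lt;/code&gt;) and assuming the build waits on a &lt;code&gt;virtualxid&lt;/code&gt; lock, as it typically does:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Ungranted virtualxid locks: a waiting CREATE INDEX CONCURRENTLY typically shows up here
select locktype, virtualxid, pid, mode, granted
from pg_locks
where locktype = &amp;#39;virtualxid&amp;#39; and not granted;

-- Waiting sessions and the pids that block them
select pid, pg_blocking_pids(pid) as blocked_by, state, query
from pg_stat_activity
where cardinality(pg_blocking_pids(pid)) &amp;gt; 0;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the second query lists the pid of the open DML session as the blocker, ending that transaction should let the index build continue.&lt;/p&gt;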
&lt;p&gt;Summary of partition index locking issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Locking for read-only/read-write/index creation on partitioned tables is similar to regular tables. Just note that transactions acquire locks on both the partition parent table and child tables, so when subsequent blocking chains involve heavier locks, all partitions are affected.&lt;/li&gt;
&lt;li&gt;Read-only transactions do not block CREATE INDEX, but they do block DROP INDEX.&lt;/li&gt;
&lt;li&gt;DML blocks CREATE INDEX and also blocks CREATE INDEX CONCURRENTLY, but CONCURRENTLY does not block DML.&lt;/li&gt;
&lt;li&gt;Although CREATE INDEX on a partitioned table automatically creates indexes on all existing and future partitions, it is not recommended for direct use in production due to blocking issues.&lt;/li&gt;
&lt;li&gt;You cannot use CONCURRENTLY directly on the partition parent table, so you need to create indexes with CONCURRENTLY on each partition child table.&lt;/li&gt;
&lt;li&gt;CONCURRENTLY does not block subsequent transactions, but it is itself blocked by earlier long-running transactions, and a failed or cancelled build leaves an invalid index behind. Watch for long-running transactions.&lt;/li&gt;
&lt;/ul&gt;
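&lt;p&gt;Two checks follow from the points above: look for long-running transactions before starting a CONCURRENTLY build, and look for invalid indexes afterwards. A minimal sketch using only the standard catalogs &lt;code&gt;pg_stat_activity&lt;/code&gt;, &lt;code&gt;pg_index&lt;/code&gt; and &lt;code&gt;pg_class&lt;/code&gt;; what counts as too old a transaction is left to the reader:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Oldest open transactions first; a CONCURRENTLY build may have to wait for these
select pid, state, xact_start, now() - xact_start as xact_age, query
from pg_stat_activity
where xact_start is not null
order by xact_start;

-- Indexes currently marked invalid, for example after a failed CONCURRENTLY build
select c.relname as index_name, i.indrelid::regclass as table_name
from pg_index i
join pg_class c on c.oid = i.indexrelid
where not i.indisvalid;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note that a parent index created with ON ONLY also appears in the second result until all child indexes have been attached, so an entry there is not always a failure.&lt;/p&gt;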

&lt;h3 class="relative group"&gt;The Correct Way to Create Partition Indexes
 &lt;div id="the-correct-way-to-create-partition-indexes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-correct-way-to-create-partition-indexes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Although you cannot create indexes with CONCURRENTLY on a partitioned table, you can create indexes with CONCURRENTLY on partition child tables using the following syntax:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;CREATE INDEX ON ONLY&lt;/code&gt;: Creates an invalid index on the parent table; does not automatically create indexes on child partitions.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CREATE INDEX CONCURRENTLY&lt;/code&gt;: Creates an index with CONCURRENTLY on a child partition.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ALTER INDEX .. ATTACH PARTITION&lt;/code&gt;: Attaches the partition index to the parent index. After all child partition indexes have been attached, the partition parent table index is automatically marked as valid.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, when executing these commands, you still need to pay attention to locking behavior.&lt;/p&gt;
&lt;p&gt;Below we observe the lock requests and blocking behavior of these statements, starting with CREATE INDEX ON ONLY and ending with ATTACH PARTITION:
(the explicit DML transaction in Session 1 is kept open throughout)&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Blocking behavior of CREATE INDEX ON ONLY:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; IDX_DATECREATED &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ONLY&lt;/span&gt; lzlpartition1(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check lock status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;448243&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;448243&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;444399&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;CREATE INDEX ON ONLY requests a ShareLock. ShareLock and RowExclusiveLock conflict with each other. So although the ONLY form itself finishes very quickly once it holds the lock, CREATE INDEX ON ONLY should not be run casually either.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After the DML transaction ends, CREATE INDEX ON ONLY completes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idx_datecreated&amp;#34;&lt;/span&gt; btree (date_created) INVALID&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;CREATE INDEX ON ONLY&lt;/code&gt; creates an invalid index on the partition parent table and does not create indexes on child partitions.&lt;/p&gt;
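&lt;p&gt;One common way to limit the impact of the ShareLock request is to cap how long the statement may wait for it. While the request sits in the lock queue, later DML on the table generally queues behind it, so a short wait is usually preferable to a long one. A minimal sketch, assuming a 2-second budget (an arbitrary value chosen for illustration); if the timeout fires, the statement fails and can be retried later:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Give up instead of queueing behind a long-running DML transaction
set lock_timeout = &amp;#39;2s&amp;#39;;
create index idx_datecreated on only lzlpartition1(date_created);
-- Restore the previous setting for the rest of the session
reset lock_timeout;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;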
&lt;ol start="2"&gt;
&lt;li&gt;Blocking behavior of ATTACH index:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After ONLY index creation completes, start another DML explicit transaction in Session 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;1111&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Create index with CONCURRENTLY on child partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; concurrently idx_datecreated_202302 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- 202302 partition index created
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; concurrently idx_datecreated_202304 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- 202304 partition index created
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; concurrently idx_datecreated_202301 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---- Creating 202301 partition index, waiting&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;CONCURRENTLY waits for transactions that might use the index to complete. Our explicit transaction only inserted into the 202301 partition, so only this partition&amp;rsquo;s CONCURRENTLY index creation hasn&amp;rsquo;t completed.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Complete the DML explicit transaction in Session 1, wait for the index to finish, then start another transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;1111&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:01&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: ATTACH index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_datecreated ATTACH PARTITION idx_datecreated_202302;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- ATTACH successful
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; idx_datecreated
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Partitioned &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.idx_datecreated&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Definition &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+------+--------------+---------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; yes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;btree, &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.lzlpartition1&amp;#34;&lt;/span&gt;, invalid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: idx_datecreated_202302 &lt;span style="color:#75715e"&gt;-- 202302 child partition index has been attached, index still invalid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: btree
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Attach the remaining child partition indexes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_datecreated ATTACH PARTITION idx_datecreated_202301;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- ATTACH successful
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_datecreated ATTACH PARTITION idx_datecreated_202304;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- ATTACH successful
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After all child partition indexes are attached, the parent table index automatically becomes valid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; idx_datecreated
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Partitioned &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.idx_datecreated&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Definition &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+------+--------------+---------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; yes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;btree, &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.lzlpartition1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: idx_datecreated_202301,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_datecreated_202302,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_datecreated_202304
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: btree&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;ATTACH is not blocked by the open DML transaction and completes immediately. From this point on, new partitions created via PARTITION OF will also automatically get the child partition index.&lt;/p&gt;
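&lt;p&gt;The same state can be checked from the catalogs instead of reading the psql description of the index. A minimal sketch, assuming the parent index is named &lt;code&gt;idx_datecreated&lt;/code&gt; as in the session above and is visible on the current &lt;code&gt;search_path&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Child indexes attached to the partitioned index so far
select inhrelid::regclass as attached_index
from pg_inherits
where inhparent = &amp;#39;idx_datecreated&amp;#39;::regclass;

-- Becomes true once every partition has an attached, valid index
select indisvalid
from pg_index
where indexrelid = &amp;#39;idx_datecreated&amp;#39;::regclass;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;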
&lt;p&gt;In summary,&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;CREATE INDEX ON ONLY&lt;/code&gt; requests a &lt;code&gt;ShareLock&lt;/code&gt;, which conflicts with the &lt;code&gt;RowExclusiveLock&lt;/code&gt; requested by DML, so each blocks the other.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CREATE INDEX CONCURRENTLY&lt;/code&gt; requests a &lt;code&gt;ShareUpdateExclusiveLock&lt;/code&gt;, which does not block the &lt;code&gt;RowExclusiveLock&lt;/code&gt; requested by DML. However, &lt;code&gt;CREATE INDEX CONCURRENTLY&lt;/code&gt; needs to wait for DML transactions to complete before it can finish (CONCURRENTLY can acquire the lock but cannot complete).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ALTER INDEX .. ATTACH PARTITION&lt;/code&gt; requests an &lt;code&gt;AccessShareLock&lt;/code&gt;, which is the lightest lock and does not block the &lt;code&gt;RowExclusiveLock&lt;/code&gt; requested by DML.&lt;/li&gt;
&lt;li&gt;Queries request &lt;code&gt;AccessShareLock&lt;/code&gt;, the lightest lock. Unless DDL requests &lt;code&gt;AccessExclusiveLock&lt;/code&gt; (the heaviest lock), blocking does not occur.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, running a plain CREATE INDEX on a partitioned table blocks DML and is not acceptable in production.
&lt;strong&gt;The correct way to create partition indexes&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Step 1: use ONLY to create an invalid index on the partition parent table. Fast, but it conflicts with DML in both directions, so watch for long-running transactions.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; IDX_DATECREATED &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ONLY&lt;/span&gt; lzlpartition1(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Step 2: use CONCURRENTLY to create the index on each child partition (repeat per partition). Slow, but it does not block subsequent DML; watch for long-running DML transactions, which can stall the build or make it fail.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; concurrently idx_datecreated_202302 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Step 3: ATTACH every child index (repeat per partition). Fast, and it does not block DML.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_datecreated ATTACH PARTITION idx_datecreated_202302;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
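&lt;p&gt;With many partitions, the per-partition statements can be generated from the catalog rather than typed by hand. The following is a sketch under several assumptions: it is run from psql 9.6 or later (for &lt;code&gt;\gexec&lt;/code&gt;) outside any transaction block, the table has a single level of partitions, and child index names are derived by swapping the hypothetical prefix &lt;code&gt;lzlpartition1&lt;/code&gt; for &lt;code&gt;idx_datecreated&lt;/code&gt; in each partition name:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Build one CREATE INDEX CONCURRENTLY per leaf partition; \gexec runs each generated row
select format(&amp;#39;create index concurrently %I on %I.%I (date_created)&amp;#39;,
              replace(c.relname, &amp;#39;lzlpartition1&amp;#39;, &amp;#39;idx_datecreated&amp;#39;),
              n.nspname, c.relname)
from pg_inherits i
join pg_class c on c.oid = i.inhrelid
join pg_namespace n on n.oid = c.relnamespace
where i.inhparent = &amp;#39;lzlpartition1&amp;#39;::regclass
  and c.relkind = &amp;#39;r&amp;#39;
\gexec

-- Then attach every generated child index to the parent index
select format(&amp;#39;alter index idx_datecreated attach partition %I.%I&amp;#39;,
              n.nspname,
              replace(c.relname, &amp;#39;lzlpartition1&amp;#39;, &amp;#39;idx_datecreated&amp;#39;))
from pg_inherits i
join pg_class c on c.oid = i.inhrelid
join pg_namespace n on n.oid = c.relnamespace
where i.inhparent = &amp;#39;lzlpartition1&amp;#39;::regclass
  and c.relkind = &amp;#39;r&amp;#39;
\gexec&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Generating statements and running them one at a time fits here because CREATE INDEX CONCURRENTLY cannot run inside a transaction block. If any build fails, drop the invalid child index and rerun that statement before attaching.&lt;/p&gt;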

&lt;h3 class="relative group"&gt;Adding Primary Keys and Unique Indexes to Partitioned Tables
 &lt;div id="adding-primary-keys-and-unique-indexes-to-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#adding-primary-keys-and-unique-indexes-to-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A &amp;ldquo;primary key index&amp;rdquo; is functionally equivalent to &amp;ldquo;unique index + NOT NULL constraint&amp;rdquo; (but there can only be one primary key). Creating unique indexes on partitioned tables can follow the index creation best practices above: ONLY on parent, CONCURRENTLY on children, ATTACH.
However, while primary keys on regular tables support the USING INDEX syntax, partitioned tables currently do not support this:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;ADD&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINT&lt;/span&gt; pk_id_date_created &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;USING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_uniq;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;A000: &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ADD&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;USING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; supported &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; partitioned tables
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: ATExecAddIndexConstraint, tablecmds.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;8032&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In other words, you can get the equivalent of a primary key by adding NOT NULL constraints and ATTACH-ing unique indexes, but the final step of promoting that index to a primary key with USING INDEX does not work.&lt;/p&gt;
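&lt;p&gt;The workaround described above can be sketched as follows. This is a minimal sketch rather than the exact procedure used in this test: it assumes the table has no unique index yet, that the key columns are (id, date_created) as in the primary key used later in this section, and that lzlpartition1_202301 is one of the partitions. The child index name idx_uniq_202301 is hypothetical. Steps 2 and 3 have to be repeated for every partition.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- 1. Create the index on the parent only. It stays invalid until every
--    partition has an attached child index, and no child index is built here.
CREATE UNIQUE INDEX idx_uniq ON ONLY lzlpartition1 (id, date_created);

-- 2. Build the matching index on one partition without blocking writes.
--    CONCURRENTLY cannot run inside a transaction block.
CREATE UNIQUE INDEX CONCURRENTLY idx_uniq_202301
    ON lzlpartition1_202301 (id, date_created);

-- 3. Attach the child index to the parent index.
ALTER INDEX idx_uniq ATTACH PARTITION idx_uniq_202301;

-- 4. Add the NOT NULL half of the primary key semantics.
--    SET NOT NULL takes a strong lock and normally scans the data.
ALTER TABLE lzlpartition1 ALTER COLUMN id SET NOT NULL;
ALTER TABLE lzlpartition1 ALTER COLUMN date_created SET NOT NULL;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The result enforces uniqueness and non-null keys, but the catalog still records no primary key constraint, so tools that look for one will not find it.&lt;/p&gt;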
&lt;p&gt;Now let&amp;rsquo;s look at the blocking behavior of directly adding/dropping primary keys:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Directly dropping a primary key:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;318&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 22:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+----------------------------------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;7715&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; beee680a86e1d12790489e9ab4a4351b &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- Session 2: Drop primary key, waits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; lzlpartition1_pkey;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- Session 3: Observe
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+---------------------------+------------+---------------+-------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301_pkey &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21659&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21659&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;95016&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;95016&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21659&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
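&lt;p&gt;Dropping a primary key requests an AccessExclusiveLock on the parent table. The request waits behind Session 1&amp;rsquo;s AccessShareLock, and once it is queued it blocks every later access to the table.&lt;/p&gt;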
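&lt;p&gt;The same blocking relationship can also be read without joining pg_locks by hand. The query below is a sketch that assumes PostgreSQL 9.6 or later, where pg_blocking_pids() and the wait_event_type column are available; it is not part of the original test.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- List every backend that is waiting on a lock, with the PIDs blocking it.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) &amp;gt; 0;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Run from Session 3 while Session 2 is waiting, this should list Session 2&amp;rsquo;s PID with Session 1&amp;rsquo;s PID in blocked_by.&lt;/p&gt;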
&lt;ol start="2"&gt;
&lt;li&gt;Directly adding a primary key:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1 transaction ends; Session 2&amp;#39;s drop primary key completes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1 starts another read-only transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Add a primary key on the partitioned table, waits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;ADD&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;(id, date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Observe locks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+-------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21659&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;95016&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;95016&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#75715e"&gt;-- Session adding primary key
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21659&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Adding a primary key also requests an AccessExclusiveLock on the parent table, blocking everything.
Building an index across all partitions of a partitioned table is very slow, and adding a primary key blocks all access to the table until the build finishes. Currently, there is no low-impact way to add a primary key to a partitioned table. As a workaround, consider the &amp;ldquo;ATTACH unique index + NOT NULL constraint&amp;rdquo; approach. Otherwise, either schedule a long maintenance window for the applications that use the table and wait for index creation to complete, or use a third-party sync tool to copy the data into a new partitioned table that already has the primary key.&lt;/p&gt;
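&lt;p&gt;If the primary key has to be added directly inside a maintenance window, one common precaution is to cap how long the statement may wait for its lock, so that it does not sit in the lock queue blocking other sessions. The sketch below was not part of the original test, and the 3-second value is an arbitrary assumption.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;BEGIN;
-- Give up if the AccessExclusiveLock cannot be acquired within 3 seconds.
-- SET LOCAL only lasts until the end of this transaction.
SET LOCAL lock_timeout = &amp;#39;3s&amp;#39;;
ALTER TABLE lzlpartition1 ADD PRIMARY KEY (id, date_created);
COMMIT;
-- If the ALTER fails with a lock timeout, issue ROLLBACK and retry later.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This only limits the wait for the lock. Once the lock is granted, it is still held for the whole index build.&lt;/p&gt;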

&lt;h3 class="relative group"&gt;Adding Partitions to HASH Partitioned Tables
 &lt;div id="adding-partitions-to-hash-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#adding-partitions-to-hash-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;If the new number of partitions is an integer multiple of the old number, we can tell which old partition each new partition&amp;rsquo;s data came from. For example, when expanding a 3-partition HASH partitioned table to 6 partitions, the source of each new partition&amp;rsquo;s data is fully determined:



&lt;img src="https://lastdba.com/img/csdn/84a32ff4147c.png" alt="" /&gt;&lt;/p&gt;
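&lt;p&gt;The arithmetic behind this mapping can be checked with a small query. It is only an illustration: plain integers stand in for hash values, and PostgreSQL&amp;rsquo;s real partition hash function is not called.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- For each stand-in hash value, show its remainder under MODULUS 3
-- (old layout) and under MODULUS 6 (new layout).
SELECT h     AS hash_value,
       h % 3 AS old_remainder,
       h % 6 AS new_remainder
FROM generate_series(0, 11) AS h
ORDER BY old_remainder, new_remainder, h;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Every row with old remainder r has new remainder r or r + 3, which is why each old partition splits into exactly two new ones when the partition count doubles.&lt;/p&gt;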
&lt;p&gt;Although this property is good to know, it is of limited use in practice, because new HASH partitions are always populated by plain INSERT. Operationally, going from &amp;ldquo;3→4&amp;rdquo; partitions is no different from &amp;ldquo;3→6&amp;rdquo;.
Mature data sync tools are now widely available. For example, you can use DTS to copy the table into a new table and then switch tables; this results in very short downtime and should be the preferred approach in production.
The rest of this section tests and observes manually expanding a HASH partitioned table to an integer multiple of its partition count:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Partition info:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; tableoid::regclass;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3377&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p3 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3354&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3369&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;DETACH partitions:
Adding 3 more partitions to a 3-partition HASH native partitioned table:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders DETACH PARTITION orders_p1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders DETACH PARTITION orders_p2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders DETACH PARTITION orders_p3;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="3"&gt;
&lt;li&gt;RENAME partitions:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p1 &lt;span style="color:#66d9ef"&gt;RENAME&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; bak_orders_p1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p2 &lt;span style="color:#66d9ef"&gt;RENAME&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; bak_orders_p2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p3 &lt;span style="color:#66d9ef"&gt;RENAME&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; bak_orders_p3;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Create 6 HASH partitions on the original parent table:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p1 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p2 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p3 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p4 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p5 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p6 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="5"&gt;
&lt;li&gt;View partition info:
Note the function used in the partition constraint:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; orders_p1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.orders_p1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+-----------------------+-----------+----------+---------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; order_id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (modulus &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, remainder &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, order_id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
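&lt;p&gt;Work out which new partition each old partition&amp;rsquo;s rows belong in.
For example, rows in the old modulus 3, remainder 0 partition have a hash value that is a multiple of 3, so modulo 6 they can only give remainder 0 or 3. That partition&amp;rsquo;s data therefore needs to be split between the new modulus 6, remainder 0 and remainder 3 partitions:&lt;/p&gt;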
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bak_orders_p1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1776&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bak_orders_p1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1601&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bak_orders_p1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3377&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="6"&gt;
&lt;li&gt;Insert data directly into the partition child tables rather than through the partitioned parent table.
With 3 old hash partitions and 6 new ones, each backup table is read twice with different remainders: rows from old partition 1 go to new partitions 1 and 4, and so on.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; orders_p1 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bak_orders_p1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; orders_p2 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bak_orders_p2 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; orders_p3 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bak_orders_p3 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; orders_p4 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bak_orders_p1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; orders_p5 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bak_orders_p2 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; orders_p6 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bak_orders_p3 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
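&lt;p&gt;The literal OID &lt;code&gt;412053&lt;/code&gt; in the statements above is specific to one database. As a minimal sketch, assuming it is the OID of the partitioned parent table &lt;code&gt;orders&lt;/code&gt; (the table queried in step 7), the parent can be resolved by name with a &lt;code&gt;regclass&lt;/code&gt; cast instead:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Assumption: 412053 is the OID of the partitioned parent table "orders".
-- This query shows the OID that the name resolves to in the current database.
SELECT 'orders'::regclass::oid AS parent_oid;

-- Same filter as above, with the parent resolved by name instead of a literal OID.
SELECT count(*)
FROM bak_orders_p1
WHERE satisfies_hash_partition('orders'::regclass, 6, 0, order_id);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Resolving by name keeps the statement valid if the table is re-created or restored with a different OID.&lt;/p&gt;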
&lt;ol start="7"&gt;
&lt;li&gt;Verify data from 3 old partitions has been inserted into 6 new partitions:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; tableoid::regclass;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;orders_p3 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1665&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;orders_p5 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1678&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1776&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;orders_p6 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1689&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;orders_p4 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1601&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;orders_p2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1691&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
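&lt;p&gt;The per-partition counts already line up with the earlier check: &lt;code&gt;orders_p1&lt;/code&gt; (1776) plus &lt;code&gt;orders_p4&lt;/code&gt; (1601) equals the 3377 rows in &lt;code&gt;bak_orders_p1&lt;/code&gt;. A rough sketch of a whole-table check follows, assuming &lt;code&gt;orders&lt;/code&gt; was empty before step 6 and the three backup tables have not changed since:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- The two columns should be equal if no rows were lost or duplicated.
SELECT (SELECT count(*) FROM bak_orders_p1)
     + (SELECT count(*) FROM bak_orders_p2)
     + (SELECT count(*) FROM bak_orders_p3) AS rows_in_backups,
       (SELECT count(*) FROM orders)        AS rows_in_orders;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;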

&lt;h3 class="relative group"&gt;Changing Column Length on Partitioned Tables Rebuilds Indexes
 &lt;div id="changing-column-length-on-partitioned-tables-rebuilds-indexes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#changing-column-length-on-partitioned-tables-rebuilds-indexes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Modifying a column involves three considerations: table rewrite, index rebuild, and statistics loss.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Changing the column type or reducing the column length rewrites the table.&lt;/li&gt;
&lt;li&gt;Increasing the column length does not rewrite the table; it only causes statistics loss. Changing int4 to int8 is different: it is a type change, so it rewrites the table.&lt;/li&gt;
&lt;li&gt;Increasing the column length does not rebuild indexes, with one exception: on a partitioned table, increasing the length of an indexed column rebuilds the index.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For column modifications, refer to the PostgreSQL apprentice.&lt;/p&gt;
&lt;p&gt;The tests below focus on &lt;em&gt;increasing column length on a partitioned table&lt;/em&gt;. If the column is indexed, the change can block transactions on the partitioned table.
First, a regular table, increasing the length of an indexed column:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create regular table and index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; t111(id int,name varchar(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; t111 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1001&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx111 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t111(name);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Index file relfilenode is 417728
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idx111&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_relation_filepath 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417728&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Increase column length
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; t111 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Index file relfilenode is still 417728, unchanged. Regular table index was NOT rebuilt.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idx111&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_relation_filepath 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417728&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Partitioned table, increasing the length of an indexed column:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create an index on the partitioned table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_name &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1(name);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check the index on one partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition1_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;dbmgr.lzlpartition1_202301&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: lzlpartition1 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1_202301_name_idx&amp;#34;&lt;/span&gt; btree (name)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301_name_idx&amp;#39;&lt;/span&gt;) idx,pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301&amp;#39;&lt;/span&gt;) tbl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tbl 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------+-------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417810&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417800&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Increase the indexed column length — partitioned table index is rebuilt
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301_name_idx&amp;#39;&lt;/span&gt;) idx,pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301&amp;#39;&lt;/span&gt;) tbl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tbl 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------+-------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417814&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417800&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Reduce the indexed column length — partitioned table is rewritten
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;609&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;585&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301_name_idx&amp;#39;&lt;/span&gt;) idx,pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301&amp;#39;&lt;/span&gt;) tbl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tbl 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------+-------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417828&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417825&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Keep the indexed column length the same — partitioned table index is still rebuilt
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301_name_idx&amp;#39;&lt;/span&gt;) idx,pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301&amp;#39;&lt;/span&gt;) tbl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tbl 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------+-------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417834&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417825&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For regular tables, increasing column length only costs the column&amp;rsquo;s statistics (int to bigint is the exception, because it rewrites the table). For partitioned tables, increasing the length of an indexed column also rebuilds the partition index, as the changed index relfilenode above shows. ALTER COLUMN holds an ACCESS EXCLUSIVE lock (the strongest of the eight table-level lock modes) for the whole statement, so the index rebuild extends the time during which all other access to the table is blocked.
Recommendation: drop the index first, modify the column, then rebuild the index using the &amp;ldquo;parent table ONLY + child tables CIC + ATTACH&amp;rdquo; approach.&lt;/p&gt;
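&lt;p&gt;The recommended sequence can be sketched as follows, using the &lt;code&gt;lzlpartition1&lt;/code&gt; table and &lt;code&gt;idx_name&lt;/code&gt; index from the test above. The target length &lt;code&gt;varchar(60)&lt;/code&gt; is only illustrative, and the CONCURRENTLY and ATTACH steps would be repeated for every partition, not just the one shown:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- 1. Drop the partitioned index; the per-partition indexes are dropped with it.
DROP INDEX idx_name;

-- 2. Increase the column length; with no index on the column there is nothing to rebuild.
ALTER TABLE lzlpartition1 ALTER COLUMN name TYPE varchar(60);

-- 3. Create the parent index on the parent only; it stays invalid for now.
CREATE INDEX idx_name ON ONLY lzlpartition1 (name);

-- 4. Build each partition index without blocking writes.
--    CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY lzlpartition1_202301_name_idx
    ON lzlpartition1_202301 (name);

-- 5. Attach each partition index to the parent index.
ALTER INDEX idx_name ATTACH PARTITION lzlpartition1_202301_name_idx;

-- 6. The parent index becomes valid once every partition index is attached.
SELECT indexrelid::regclass AS index_name, indisvalid
FROM pg_index
WHERE indexrelid = 'idx_name'::regclass;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The trade-off is that queries on &lt;code&gt;name&lt;/code&gt; run without an index between steps 1 and 5.&lt;/p&gt;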

&lt;h3 class="relative group"&gt;Partition Table Maintenance Summary
 &lt;div id="partition-table-maintenance-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-table-maintenance-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;PARTITION OF / DROP TABLE / DETACH require ACCESS EXCLUSIVE locks. ATTACH / DETACH CONCURRENTLY are recommended: they take a weaker lock on the parent table and do not block ordinary reads and writes. DETACH CONCURRENTLY has to wait for transactions that are already using the table, so watch for existing long-running transactions.&lt;/li&gt;
&lt;li&gt;Before ATTACH-ing a partition, you can pre-create a CHECK constraint on it that matches the partition bounds. This eliminates the time spent scanning the partition data during ATTACH.&lt;/li&gt;
&lt;li&gt;Currently, CIC (CREATE INDEX CONCURRENTLY) is not supported on partitioned tables. You can create partition indexes using the &amp;ldquo;ONLY on parent + CONCURRENTLY on children + ATTACH index&amp;rdquo; approach to reduce business blocking time.&lt;/li&gt;
&lt;li&gt;Partitioned tables do not support the USING INDEX method for creating primary keys.&lt;/li&gt;
&lt;li&gt;Watch for the exceptional case of column length changes on partitioned tables: increasing the length of an indexed column rebuilds the index while the table is locked.&lt;/li&gt;
&lt;/ul&gt;
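&lt;p&gt;As an illustration of the second point, here is a minimal sketch of attaching a partition with a pre-created constraint. The partition name &lt;code&gt;lzlpartition1_202302&lt;/code&gt;, the constraint name, and the date range are hypothetical; the sketch assumes no existing partition covers that range.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical new partition, created as a standalone table first.
CREATE TABLE lzlpartition1_202302
    (LIKE lzlpartition1 INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

-- CHECK constraint that matches the bounds the partition will get.
ALTER TABLE lzlpartition1_202302
    ADD CONSTRAINT lzlpartition1_202302_bounds
    CHECK (date_created &amp;gt;= '2023-02-01' AND date_created &amp;lt; '2023-03-01');

-- Load data into lzlpartition1_202302 here, before attaching.

-- ATTACH can rely on the constraint instead of scanning the rows to validate them.
ALTER TABLE lzlpartition1
    ATTACH PARTITION lzlpartition1_202302
    FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');

-- The helper constraint is now redundant with the partition constraint.
ALTER TABLE lzlpartition1_202302
    DROP CONSTRAINT lzlpartition1_202302_bounds;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;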

&lt;h2 class="relative group"&gt;Partition Table Optimization
 &lt;div id="partition-table-optimization" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-table-optimization" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Partition Pruning
 &lt;div id="partition-pruning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-pruning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Partition Pruning can improve performance for declarative partitioning and is a very important feature for partitioned table optimization. Without partition pruning, queries would scan all partitions. With partition pruning, the optimizer can filter out partitions that don&amp;rsquo;t need to be accessed through the WHERE condition.



&lt;img src="https://lastdba.com/img/csdn/574daf83f7c1.png" alt="Partition pruning" /&gt;
Partition pruning relies on the PARTITION CONSTRAINT (visible with \d+), which means &lt;strong&gt;queries must include partition key conditions&lt;/strong&gt; for pruning to occur. This constraint differs from regular CHECK constraints — it is automatically created when the partition is created.
Partition pruning is controlled by the &lt;code&gt;enable_partition_pruning&lt;/code&gt; parameter, which defaults to on.&lt;/p&gt;
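&lt;p&gt;As a rough sketch, the bounds and the implicit partition constraint that pruning relies on can also be read from the system catalogs instead of \d+. The query assumes the example table &lt;code&gt;lzlpartition1&lt;/code&gt; used in this article. The column aliases are made up for illustration, and the output depends on how the table was actually defined.&lt;/p&gt;

```sql
-- List each partition of lzlpartition1 with its bound and implicit constraint.
-- Assumption: lzlpartition1 is the partitioned parent table from the examples.
SELECT c.relname                             AS partition_name,
       pg_get_expr(c.relpartbound, c.oid)    AS partition_bound,
       pg_get_partition_constraintdef(c.oid) AS partition_constraint
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'lzlpartition1'::regclass
ORDER BY c.relname;
```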
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Without partition pruning, all partitions are accessed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_partition_pruning&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;864&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;62&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- With partition pruning enabled, partitions that don&amp;#39;t need to be accessed are excluded
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_partition_pruning&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;(According to the official documentation, partitions pruned during plan generation simply do not appear in the plan, as in the second EXPLAIN above. A &amp;ldquo;Subplans Removed&amp;rdquo; line is shown only when pruning happens while the executor initializes the plan, for example for the parameters of a prepared statement that runs with a generic plan.)
&lt;strong&gt;Partition pruning can occur at two stages: during execution plan generation, and during actual execution.&lt;/strong&gt;
Why is execution-time pruning needed? Because some values that decide which partitions can be pruned are known only at execution time. There are two scenarios:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Parameterized Nested Loop Joins: The parameter from the outer side of the
join can be used to determine the minimum set of inner side partitions to
scan.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Initplans: Once an initplan has been executed we can then determine which
partitions match the value from the initplan.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;To simulate run-time pruning, compare the partition key with a value fetched from another table. The optimizer cannot know that value in advance, so it cannot use it to prune partitions during plan generation:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create another table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; x(date_created &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; x &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 09:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Generate execution plan only, don&amp;#39;t execute — no pruning occurred
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; date_created &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; x);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1904&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;68&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1904&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; x (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2260&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;864&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;62&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Execute the SQL — pruning occurred. Notice the &amp;#34;never executed&amp;#34; keyword.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; date_created &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; x);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1904&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;68&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1904&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;680&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;682&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; x (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2260&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;013&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;014&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;029&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;676&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;008&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;652&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; Removed &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; Filter: &lt;span style="color:#ae81ff"&gt;45382&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;864&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (never executed)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;62&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (never executed)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;157&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;732&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Partition Wise Join
 &lt;div id="partition-wise-join" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-wise-join" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Partition wise join can reduce the cost of joining partitioned tables.
Suppose there are two partitioned tables, t1 and t2, each with 3 partitions (p1, p2, p3) and identical partition definitions. Each partition of t1 holds 10 rows and each partition of t2 holds 20 rows:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;&lt;/th&gt;
 &lt;th&gt;t1&lt;/th&gt;
 &lt;th&gt;t2&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;p1&lt;/td&gt;
 &lt;td&gt;10 rows&lt;/td&gt;
 &lt;td&gt;20 rows&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;p2&lt;/td&gt;
 &lt;td&gt;10 rows&lt;/td&gt;
 &lt;td&gt;20 rows&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;p3&lt;/td&gt;
 &lt;td&gt;10 rows&lt;/td&gt;
 &lt;td&gt;20 rows&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;When t1 and t2 are joined:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Normally, all rows from both partitioned tables are gathered before joining, so every row of t1 may be compared with every row of t2. The number of row comparisons would be:
(10+10+10)*(20+20+20)=1800&lt;/li&gt;
&lt;li&gt;With partition wise join, since the partition definitions are identical, only corresponding partitions need to be joined:
t1.p1&amp;lt;=&amp;gt;t2.p1,
t1.p2&amp;lt;=&amp;gt;t2.p2,
t1.p3&amp;lt;=&amp;gt;t2.p3.
The number of row comparisons becomes:
(10*20)*3=600&lt;/li&gt;
&lt;/ul&gt;
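&lt;p&gt;The arithmetic can be checked with a small query. This is only an illustration under a naive nested-loop model in which every candidate pair of rows is compared. The row counts come from the t1/t2 example, while the CTE name &lt;code&gt;parts&lt;/code&gt; and the column names are made up.&lt;/p&gt;

```sql
-- Row counts per partition, taken from the t1/t2 example.
WITH parts(part, t1_rows, t2_rows) AS (
    VALUES ('p1', 10, 20),
           ('p2', 10, 20),
           ('p3', 10, 20)
)
SELECT sum(t1_rows) * sum(t2_rows) AS full_join_comparisons,     -- 30 * 60 = 1800
       sum(t1_rows * t2_rows)      AS partitionwise_comparisons  -- 3 * 200 = 600
FROM parts;
```

&lt;p&gt;With N equally sized partitions the second figure is 1/N of the first, which is why the benefit grows with the partition count.&lt;/p&gt;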
&lt;p&gt;The more partitions there are, the larger the savings from partition wise join.
The &lt;code&gt;enable_partitionwise_join&lt;/code&gt; parameter controls the feature. It defaults to off because considering partition wise joins can use noticeably more CPU time and memory during planning.&lt;/p&gt;
&lt;p&gt;The prerequisites for partition wise join are very strict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The join condition must include the partition key.&lt;/li&gt;
&lt;li&gt;The partition keys must be of the same data type.&lt;/li&gt;
&lt;li&gt;Partitions must correspond one-to-one.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These conditions are strict, and tables that serve different purposes rarely satisfy them. A typical case is two tables that are both RANGE-partitioned by time with the same bounds. A self-join of a partitioned table on its partition key also meets the prerequisites:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Without partition wise join enabled
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; p1.&lt;span style="color:#f92672"&gt;*&lt;/span&gt;,p2.name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 p1,lzlpartition1 p2 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; p1.date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;p2.date_created &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; p2.name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;546&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;9256&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;182252&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;288&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (p1.date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; p2.date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2085&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;46&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;85364&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 p1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45384&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 p1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;765&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39530&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 p1_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;541&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;541&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;427&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;541&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;427&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 p2_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;284&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;227&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301_name_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;227&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 p2_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;95&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;248&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;198&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302_name_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;90&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;198&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 p2_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304_name_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- With partition wise join enabled
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_partitionwise_join &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; p1.&lt;span style="color:#f92672"&gt;*&lt;/span&gt;,p2.name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 p1,lzlpartition1 p2 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; p1.date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;p2.date_created &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; p2.name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;287&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2529&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;83&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;438&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;288&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;287&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1338&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;232&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;288&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (p1_1.date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; p2_1.date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 p1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45384&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;284&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;284&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;227&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 p2_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;284&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;227&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301_name_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;227&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;250&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;99&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1166&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;202&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;288&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (p1_2.date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; p2_2.date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 p1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;765&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39530&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;248&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;52&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;248&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;198&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 p2_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;95&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;248&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;198&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302_name_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;90&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;198&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;288&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (p1_3.date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; p2_3.date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 p1_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 p2_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304_name_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Without partition wise join enabled, the optimizer scans every partition of p2 for rows matching the filter and combines them (Append), builds a hash table from the result, and then Hash Joins it on the partition key against the appended rows of all p1 partitions.
With partition wise join enabled, the optimizer joins only the corresponding partitions of p1 and p2 (the same table accessed twice):
p1_1&amp;lt;=&amp;gt;p2_1 Hash Join,
p1_2&amp;lt;=&amp;gt;p2_2 Hash Join,
p1_3&amp;lt;=&amp;gt;p2_3 Hash Join,
and then combines the results (Append). The estimated total cost drops from 9256.34 to 2529.83.
When there are many partitions, and especially when combined with partition pruning, partition wise join can reduce the work substantially.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Partition Wise Grouping/Aggregation
 &lt;div id="partition-wise-groupingaggregation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-wise-groupingaggregation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When aggregating over a partitioned table, each partition can compute its aggregate independently, so there is no need to gather the rows of all partitions before aggregating. Each partition computes its own aggregation, then the results are collected and returned. This is a complete per-partition aggregation when the GROUP BY clause includes the partition key; otherwise only a partial aggregate can be computed per partition, and it must be finalized afterwards.
Without partition wise grouping, it&amp;rsquo;s essentially &amp;ldquo;&lt;strong&gt;scan all partitions first, then aggregate&lt;/strong&gt;.&amp;rdquo; With partition wise grouping, it&amp;rsquo;s &amp;ldquo;&lt;strong&gt;aggregate per partition first, then combine results&lt;/strong&gt;.&amp;rdquo;&lt;/p&gt;
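&lt;p&gt;The idea of &amp;ldquo;aggregate per partition first, then combine results&amp;rdquo; can be written out by hand as one aggregate per partition, glued together with UNION ALL. This is only a conceptual sketch, not how the planner is implemented. It assumes that the three partitions seen in the join plans earlier are the only partitions of &lt;code&gt;lzlpartition1&lt;/code&gt; and that &lt;code&gt;date_created&lt;/code&gt; is the partition key, so that no group spans two partitions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Assumption: these three partitions cover the whole table and date_created is the partition key.
-- Each branch aggregates a single partition; UNION ALL plays the role of the final Append.
select date_created, min(id), count(*) from lzlpartition1_202301 group by date_created
union all
select date_created, min(id), count(*) from lzlpartition1_202302 group by date_created
union all
select date_created, min(id), count(*) from lzlpartition1_202304 group by date_created
order by 1, 2, 3;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;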
&lt;p&gt;Advantages of partition wise grouping:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When partitions are on foreign servers, the aggregation operator can be pushed down to the foreign server.&lt;/li&gt;
&lt;li&gt;With hash aggregation, the in-memory hash table only needs to hold the groups of one partition rather than of the entire table, reducing memory usage.&lt;/li&gt;
&lt;li&gt;Aggregation algorithms pushed down to individual partitions can better utilize features like indexes and parallelism.&lt;/li&gt;
&lt;li&gt;Fewer data comparisons. Although data scanning is the same, there are fewer data comparisons — for example, data from the last partition does not need to be compared with data from the first partition.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Parameter &lt;code&gt;enable_partitionwise_aggregate&lt;/code&gt;: whether to enable partition wise grouping/aggregation, default is off.&lt;/p&gt;
&lt;p&gt;Partition wise aggregate example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;vacuum&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;) lzlpartition1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Without partition wise aggregate
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_partitionwise_aggregate &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; date_created,&lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;(id),&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; date_created &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10354&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10562&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;89&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;83180&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1.date_created, (&lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;(lzlpartition1.id)), (&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2725&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3557&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;83180&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1.date_created
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2085&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;46&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;85364&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45384&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;765&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39530&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- With partition wise aggregate enabled
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_partitionwise_aggregate &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; date_created,&lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;(id),&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; date_created &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10356&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10564&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;83296&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1.date_created, (&lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;(lzlpartition1.id)), (&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1219&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3548&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;83296&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1219&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1663&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;44387&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1.date_created
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45384&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1061&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;77&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1448&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;86&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;38709&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1_1.date_created
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;765&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39530&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;88&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;88&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;200&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1_2.date_created
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
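&lt;p&gt;Without partition wise aggregate, all partitions are scanned and combined first (Append), and the combined rows are then aggregated once (HashAggregate).
With partition wise aggregate, each partition is aggregated first (HashAggregate), and the per-partition results are then combined (Append). In this small example the estimated total costs of the two plans are nearly identical (10562.89 versus 10564.32).&lt;/p&gt;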
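&lt;p&gt;The partition wise plan can be read as the following hand-written query, which aggregates each partition separately and appends the results. This is only a sketch to illustrate the plan shape, not something the planner generates; it uses the partition names shown in the plan and relies on &lt;code&gt;date_created&lt;/code&gt; being the partition key, so that each &lt;code&gt;date_created&lt;/code&gt; value lives in exactly one partition:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- One aggregate per partition (the three HashAggregate nodes),
-- combined with UNION ALL (the Append node).
-- No outer GROUP BY is needed: a group never spans two partitions.
SELECT date_created, min(id), count(*) FROM lzlpartition1_202301 GROUP BY date_created
UNION ALL
SELECT date_created, min(id), count(*) FROM lzlpartition1_202302 GROUP BY date_created
UNION ALL
SELECT date_created, min(id), count(*) FROM lzlpartition1_202304 GROUP BY date_created
ORDER BY 1, 2, 3;  -- the Sort node on top&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;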
&lt;p&gt;&lt;strong&gt;Partial Aggregation&lt;/strong&gt;
Aggregation can be pushed down to the individual partitions. The per-partition results then fall into two cases. If GROUP BY includes the partition key, every group lives in exactly one partition, so no group appears in more than one partition result. If GROUP BY does not include the partition key, the same group can appear in the results of several partitions.
In the first case, simply appending the per-partition results is sufficient (as in the example above). In the second case, an additional aggregation step (Finalize Aggregate) is needed to merge the per-partition results. Partition wise aggregation on a GROUP BY that does not include the partition key is called partial aggregation.&lt;/p&gt;
&lt;p&gt;Partial aggregation example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- When GROUP BY is not the partition key
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; enable_partitionwise_aggregate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; enable_partitionwise_aggregate 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; id ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Finalize HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2474&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2573&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9900&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1105&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;76&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2377&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19467&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1105&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;76&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1202&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9652&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45384&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;962&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;95&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1059&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9615&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1_1.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;765&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39530&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;75&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;75&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;200&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1_2.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
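&lt;p&gt;When GROUP BY does not include the partition key, aggregation can still be pushed down to each partition (Partial HashAggregate), but a Finalize HashAggregate above the Append is required to merge groups that occur in more than one partition.&lt;/p&gt;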
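&lt;p&gt;Conceptually, the partial/finalize split corresponds to the hand-written query below. It is a sketch for illustration only, using the partition names from the plan; the aliases &lt;code&gt;partial_count&lt;/code&gt;, &lt;code&gt;total_count&lt;/code&gt; and &lt;code&gt;partials&lt;/code&gt; are made up for this example, and the executor merges internal aggregate states rather than running SQL like this:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Finalize step: the same id can come back from several partitions,
-- so the partial counts are summed per id.
SELECT id, sum(partial_count) AS total_count
FROM (
    -- Partial step: one count per id within each partition
    SELECT id, count(*) AS partial_count FROM lzlpartition1_202301 GROUP BY id
    UNION ALL
    SELECT id, count(*) AS partial_count FROM lzlpartition1_202302 GROUP BY id
    UNION ALL
    SELECT id, count(*) AS partial_count FROM lzlpartition1_202304 GROUP BY id
) AS partials
GROUP BY id;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;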
&lt;p&gt;Even without GROUP BY, Partial Aggregate can still occur:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; enable_partitionwise_aggregate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; enable_partitionwise_aggregate 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Finalize &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45384&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;864&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;864&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;765&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39530&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;62&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;max&lt;/span&gt;(date_created) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Finalize &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45384&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;864&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;864&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;765&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39530&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;62&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;GROUP BY is not a precondition for Partial Aggregate. Consider what Partial Aggregate is for: pushing aggregation down to the individual partitions. Aggregates without GROUP BY can be computed the same way, as the two examples above show: each partition is aggregated first (Partial Aggregate), and the per-partition results are then combined in one final step (Finalize Aggregate). With the parameter disabled, the aggregation would instead run once, after all partitions have been scanned.&lt;/p&gt;
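&lt;p&gt;The two-phase idea can be sketched outside the database. The following is a minimal Python model, not PostgreSQL internals: the partition names are taken from the plans above, while the row values and the helper names &lt;code&gt;partial_aggregate&lt;/code&gt; and &lt;code&gt;finalize_aggregate&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
# Toy model of partition-wise aggregation: each "partition" is a plain list.
# Partition names mirror the plans above; the row values are invented.
partitions = {
    "lzlpartition1_202301": [3, 9, 4],
    "lzlpartition1_202302": [7, 1],
    "lzlpartition1_202304": [5],
}


def partial_aggregate(rows):
    # Partial Aggregate: reduce one partition to a small state.
    # (Assumes the partition is non-empty; max() fails on an empty list.)
    return {"count": len(rows), "max": max(rows)}


def finalize_aggregate(states):
    # Finalize Aggregate: combine the per-partition states into one result.
    return {
        "count": sum(state["count"] for state in states),
        "max": max(state["max"] for state in states),
    }


states = [partial_aggregate(rows) for rows in partitions.values()]
result = finalize_aggregate(states)
print(result)
```

&lt;p&gt;Neither phase needs a grouping key: a plain count or max is reduced per partition and then combined, which is the shape of the plans shown above.&lt;/p&gt;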

&lt;h2 class="relative group"&gt;History of Partitioned Tables
 &lt;div id="history-of-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#history-of-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Declarative partitioning has been enhanced over many releases and is now mature. Here&amp;rsquo;s a summary of the declarative partitioning enhancements in each PostgreSQL version:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PG9.6 and earlier&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Only inheritance tables could implement partitioning functionality.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;PG10&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Declarative partitioning supported.&lt;/li&gt;
&lt;li&gt;RANGE and LIST partitioning supported.&lt;/li&gt;
&lt;li&gt;ATTACH/DETACH table partitions supported.&lt;/li&gt;
&lt;li&gt;Partition pruning supported (through constraint exclusion).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;PG11&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Added HASH partition support.&lt;/li&gt;
&lt;li&gt;Support for creating primary keys, foreign keys, indexes, and triggers.&lt;/li&gt;
&lt;li&gt;Support for updating partition key; automatic creation of indexes on partitions.&lt;/li&gt;
&lt;li&gt;Support for DEFAULT partition.&lt;/li&gt;
&lt;li&gt;Support for ATTACH index.&lt;/li&gt;
&lt;li&gt;Support for FOR EACH ROW triggers, automatically created on existing and future child partitions.&lt;/li&gt;
&lt;li&gt;New &lt;code&gt;enable_partition_pruning&lt;/code&gt; parameter; pruning enhancements.&lt;/li&gt;
&lt;li&gt;Support for partitionwise join.&lt;/li&gt;
&lt;li&gt;Support for partitionwise aggregation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;PG12&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enhanced query, insert, pruning, and COPY performance.&lt;/li&gt;
&lt;li&gt;Support for foreign key constraints referencing partitioned tables.&lt;/li&gt;
&lt;li&gt;Support for non-blocking partition ATTACH: &lt;code&gt;ALTER TABLE ... ATTACH PARTITION&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;PG13&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enhanced pruning.&lt;/li&gt;
&lt;li&gt;Enhanced partitionwise join.&lt;/li&gt;
&lt;li&gt;Support for BEFORE triggers.&lt;/li&gt;
&lt;li&gt;Support for publishing partitioned tables; support for subscribing and writing to partitioned tables.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;PG14&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enhanced UPDATE and DELETE performance.&lt;/li&gt;
&lt;li&gt;Support for non-blocking partition DETACH: &lt;code&gt;ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Support for REINDEX on partitioned table indexes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;PG15&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enhanced execution plan generation, reducing generation time with many partitions.&lt;/li&gt;
&lt;li&gt;Enhanced sorting.&lt;/li&gt;
&lt;li&gt;Support for CLUSTER on partitioned tables.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;PG16&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enhanced GENERATED column restrictions: if the parent table has a generated column, child partitions must also include it.&lt;/li&gt;
&lt;li&gt;Enhanced lookup for RANGE and LIST partitions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;《PostgreSQL修炼之道》&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/NW8XOZNq0YlDZvx24H737Q" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/NW8XOZNq0YlDZvx24H737Q&lt;/a&gt;
&lt;a href="https://www.postgresql.org/docs/current/ddl-partitioning.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/ddl-partitioning.html&lt;/a&gt;
&lt;a href="https://www.postgresql.org/docs/current/ddl-inherit.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/ddl-inherit.html&lt;/a&gt;
&lt;a href="https://www.postgresql.org/docs/13/sql-altertable.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/sql-altertable.html&lt;/a&gt;
&lt;a href="https://github.com/postgrespro/pg_pathman" target="_blank" rel="noreferrer"&gt;https://github.com/postgrespro/pg_pathman&lt;/a&gt;
&lt;a href="https://developer.aliyun.com/article/62314" target="_blank" rel="noreferrer"&gt;https://developer.aliyun.com/article/62314&lt;/a&gt;
&lt;a href="https://hevodata.com/learn/postgresql-partitions" target="_blank" rel="noreferrer"&gt;https://hevodata.com/learn/postgresql-partitions&lt;/a&gt;
&lt;a href="https://www.postgresql.fastware.com/postgresql-insider-prt-ove" target="_blank" rel="noreferrer"&gt;https://www.postgresql.fastware.com/postgresql-insider-prt-ove&lt;/a&gt;
&lt;a href="https://www.buckenhofer.com/2021/01/postgresql-partitioning-guide/" target="_blank" rel="noreferrer"&gt;https://www.buckenhofer.com/2021/01/postgresql-partitioning-guide/&lt;/a&gt;
&lt;a href="https://www.depesz.com/2018/05/01/waiting-for-postgresql-11-support-partition-pruning-at-execution-time/" target="_blank" rel="noreferrer"&gt;https://www.depesz.com/2018/05/01/waiting-for-postgresql-11-support-partition-pruning-at-execution-time/&lt;/a&gt;
&lt;a href="https://blog.csdn.net/horses/article/details/86164273" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/horses/article/details/86164273&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.pgsql.tech/article_0_10000102" target="_blank" rel="noreferrer"&gt;http://www.pgsql.tech/article_0_10000102&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://brandur.org/fragments/postgres-partitioning-2022" target="_blank" rel="noreferrer"&gt;https://brandur.org/fragments/postgres-partitioning-2022&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>Vector Database Core Concepts</title><link>https://lastdba.com/en/2024/08/12/vector-database-core-concepts/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/vector-database-core-concepts/</guid><description>&lt;h2 class="relative group"&gt;Vector Database Core Concepts
 &lt;div id="vector-database-core-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vector-database-core-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;A Bit of History
 &lt;div id="a-bit-of-history" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#a-bit-of-history" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The development history of LLMs, from &lt;a href="https://arxiv.org/pdf/2304.13712" target="_blank" rel="noreferrer"&gt;Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond&lt;/a&gt;&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6913a42c261b.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Many people only learned about large models after ChatGPT took off, but in the years before that tipping point, large-model development had already become a clash of titans. Several institutions published many groundbreaking papers. On the corporate side: Google, DeepMind, OpenAI, Meta, and Microsoft; on the academic side: Stanford, Berkeley, CMU, Princeton, and MIT&lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Vector Database Core Concepts
 &lt;div id="vector-database-core-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vector-database-core-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;A Bit of History
 &lt;div id="a-bit-of-history" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#a-bit-of-history" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The development history of LLMs, from &lt;a href="https://arxiv.org/pdf/2304.13712" target="_blank" rel="noreferrer"&gt;Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond&lt;/a&gt;&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6913a42c261b.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Many people only learned about large models after ChatGPT took off, but in the years before that tipping point, large-model development had already become a clash of titans. Several institutions published many groundbreaking papers. On the corporate side: Google, DeepMind, OpenAI, Meta, and Microsoft; on the academic side: Stanford, Berkeley, CMU, Princeton, and MIT&lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;There are three main camps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Google &amp;amp; DeepMind camp — Gemini, Bard&lt;/li&gt;
&lt;li&gt;Microsoft &amp;amp; OpenAI camp — ChatGPT, Bing&lt;/li&gt;
&lt;li&gt;Meta open-source community camp — Llama&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Timeline of recent large model product releases, from &lt;a href="https://arxiv.org/pdf/2303.18223.pdf" target="_blank" rel="noreferrer"&gt;A Survey of Large Language Models&lt;/a&gt;&lt;sup id="fnref:3"&gt;&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref"&gt;3&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/517ff3855241.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Generative AI Basics
 &lt;div id="generative-ai-basics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#generative-ai-basics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;AIGC (Artificial Intelligence Generated Content)&lt;/strong&gt;: Strictly speaking, AIGC is a mode of production in which AI generates content automatically. In a broader sense, AIGC is roughly synonymous with Generative AI: AI that has been trained to have human-like generative and creative capabilities. Drawing on training data and generative models, it can create new text, images, music, video, 3D interactive content, and other forms of content and data, and it even extends to enabling new scientific discoveries and creating new meaning.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;LLM (Large Language Model)&lt;/strong&gt;: An LLM is a language model capable of capturing and processing complex language patterns and semantics; in other words, it can understand and generate human language. GPT-3, ChatGPT, BERT, T5, and ERNIE Bot are typical examples.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NLP (Natural Language Processing)&lt;/strong&gt;: NLP studies how to enable computers to read and understand human language, that is, how to convert natural human language into instructions that computers can process. LLMs are an important part of NLP.&lt;/p&gt;
&lt;p&gt;AIGC has achieved remarkable growth, largely due to Natural Language Processing (NLP), and the biggest driver behind NLP&amp;rsquo;s progress is the Large Language Model (LLM). This year (2024), AIGC is also developing rapidly in areas such as video and audio.&lt;sup id="fnref:4"&gt;&lt;a href="#fn:4" class="footnote-ref" role="doc-noteref"&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;prompt&lt;/strong&gt;: An instruction: natural language given to an AI that describes a task and guides a language model (such as GPT-3 or GPT-4) to generate the corresponding output&lt;sup id="fnref:5"&gt;&lt;a href="#fn:5" class="footnote-ref" role="doc-noteref"&gt;5&lt;/a&gt;&lt;/sup&gt;. (Most readers already know what this is, so there is no need to elaborate.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;embedding&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;Embedding is a method of representing objects (such as text, images, and audio) as points in a continuous vector space, where the positions of these points in space carry semantic meaning for machine learning algorithms.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/3449199f0a3f.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;There is an &lt;a href="https://blog.echen.me/embedding-explorer/#/" target="_blank" rel="noreferrer"&gt;interactive 2D embedding explorer&lt;/a&gt; built on &lt;a href="https://nlp.stanford.edu/projects/glove/" target="_blank" rel="noreferrer"&gt;GloVe&lt;/a&gt; word vectors for English words. It shows natural-language words embedded as 2D vectors:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/97b468b62314.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;RAG
 &lt;div id="rag" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#rag" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;RAG (Retrieval-Augmented Generation) is a two-stage process consisting of document retrieval and large language model (LLM) answer generation. The initial stage leverages dense embeddings to retrieve documents. Depending on the specific use case, this retrieval can be based on various database formats, such as vector databases, summary indexes, tree indexes, and key indexes&lt;sup id="fnref1:5"&gt;&lt;a href="#fn:5" class="footnote-ref" role="doc-noteref"&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6eb500130d41.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://arxiv.org/pdf/2005.11401" target="_blank" rel="noreferrer"&gt;original RAG paper&lt;/a&gt;&lt;sup id="fnref:6"&gt;&lt;a href="#fn:6" class="footnote-ref" role="doc-noteref"&gt;6&lt;/a&gt;&lt;/sup&gt; was published on May 22, 2020, by researchers from Facebook (Meta), University College London, and New York University, proposing a general fine-tuning approach for RAG. RAG includes the following characteristics&lt;sup id="fnref1:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;RAG models combine pre-trained parametric memory with non-parametric (retrieved) memory for language generation&lt;/li&gt;
&lt;li&gt;RAG models generate language that is more specific, diverse, and factual&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6ed7b3a3ae81.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;On March 23, 2023, OpenAI released the &lt;a href="https://github.com/openai/chatgpt-retrieval-plugin" target="_blank" rel="noreferrer"&gt;chatgpt-retrieval-plugin&lt;/a&gt; repository, recommending the use of vector databases in RAG. From that point on, vector databases gained widespread attention in the application domain, riding the wave of large model popularity.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b58b99f55a52.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;What Can Vector Databases Bring to AI?
 &lt;div id="what-can-vector-databases-bring-to-ai" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-can-vector-databases-bring-to-ai" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Vector databases can provide large models with data retrieval and long-term data storage capabilities within RAG&lt;sup id="fnref:7"&gt;&lt;a href="#fn:7" class="footnote-ref" role="doc-noteref"&gt;7&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f35b1bc4881b.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Why use RAG? No words carry more weight than those of the master, OpenAI. The following passage is from the retrieval plugin usage guide released by OpenAI in March 2023&lt;sup id="fnref:8"&gt;&lt;a href="#fn:8" class="footnote-ref" role="doc-noteref"&gt;8&lt;/a&gt;&lt;/sup&gt;, translated by ChatGPT:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The open-source retrieval plugin enables ChatGPT to access personal or organizational information sources (with permission). Users can ask questions or express needs in natural language and obtain the most relevant document snippets from their data sources (such as files, notes, emails, or public documents).&lt;/p&gt;
&lt;p&gt;As an open-source and self-hosted solution, developers can deploy their own version of the plugin and register it with ChatGPT. The plugin leverages OpenAI&amp;rsquo;s embeddings and allows developers to choose a vector database (such as Milvus, Pinecone, Qdrant, Redis, Weaviate, or Zilliz) to index and search documents. Information sources can be synchronized with the database using webhooks.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;In short, OpenAI recommends everyone use vector databases.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/0eBZ4zyX6XjBQO0GqlANnw" target="_blank" rel="noreferrer"&gt;Has the vector database cooled off?&lt;/a&gt; Not only has it not cooled off — RAG has developed to the point of being everywhere today — &lt;a href="https://mp.weixin.qq.com/s/awIInAtPOkZz_s4jg9TO_w" target="_blank" rel="noreferrer"&gt;Has RAG Technology Really Become &amp;ldquo;Commonplace&amp;rdquo;?&lt;/a&gt;. And vector databases, with their high retrieval efficiency, data storage reliability, and other characteristics, are an important part of RAG.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Common Vector Databases
 &lt;div id="common-vector-databases" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#common-vector-databases" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since OpenAI released the retrieval plugin repo, many vector databases have emerged (though some existed before). Several companies have also secured considerable funding&lt;sup id="fnref:9"&gt;&lt;a href="#fn:9" class="footnote-ref" role="doc-noteref"&gt;9&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th style="text-align: left"&gt;Company&lt;/th&gt;
 &lt;th style="text-align: left"&gt;Headquartered in&lt;/th&gt;
 &lt;th style="text-align: left"&gt;Funding&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;Weaviate&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇳🇱 Amsterdam&lt;/td&gt;
 &lt;td style="text-align: left"&gt;$68M Series B&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;Qdrant&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇩🇪 Berlin&lt;/td&gt;
 &lt;td style="text-align: left"&gt;$11M Seed&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;Pinecone&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇺🇸 San Francisco&lt;/td&gt;
 &lt;td style="text-align: left"&gt;$138M Series B&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;Milvus/Zilliz&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇨🇳 / 🇺🇸 Redwood City&lt;/td&gt;
 &lt;td style="text-align: left"&gt;$113M Series B&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;Chroma&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇺🇸 San Francisco&lt;/td&gt;
 &lt;td style="text-align: left"&gt;$20M Seed&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;LanceDB&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇺🇸 San Francisco&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Venture&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;Vespa&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇳🇴 / 🇺🇸 Indianapolis&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Yahoo!&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;Vald&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇯🇵 Tokyo&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Yahoo! Japan&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Vector database release timeline:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/87c1f32c95b1.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/erikbern/ann-benchmarks" target="_blank" rel="noreferrer"&gt;Vector database performance comparison&lt;/a&gt;&lt;sup id="fnref:10"&gt;&lt;a href="#fn:10" class="footnote-ref" role="doc-noteref"&gt;10&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5d6c1d0ba8c2.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Dedicated vector databases generally perform better than traditional databases with vector plugins, for roughly two reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dedicated vector databases are built on vector-specific storage, so they generally outperform general-purpose databases that were not designed for vectors.&lt;/li&gt;
&lt;li&gt;Dedicated vector databases are generally newer (mostly implemented in Go or Rust), making code-level optimization easier.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, this does not mean plugin-based vector databases have no place:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Traditional databases natively support more features, not just similarity computation.&lt;/li&gt;
&lt;li&gt;ACID guarantees: traditional database storage is safer.&lt;/li&gt;
&lt;li&gt;It&amp;rsquo;s easier to manipulate data within a single database.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Vector database feature comparison:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c1f5f45fa343.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The description of &lt;a href="https://github.com/pgvector/pgvector" target="_blank" rel="noreferrer"&gt;pgvector&lt;/a&gt; above is no longer entirely accurate — pgvector now supports HNSW, and the pgvector ecosystem project &lt;a href="https://github.com/timescale/pgvectorscale" target="_blank" rel="noreferrer"&gt;pgvectorscale&lt;/a&gt; also supports DiskANN.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Mathematical Concepts
 &lt;div id="mathematical-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#mathematical-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Mathematics says: &amp;ldquo;I stand on the mountaintop watching you all play.&amp;rdquo;&lt;/em&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Scalar&lt;/strong&gt;
 &lt;div id="scalar" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#scalar" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;A scalar is a single number. It has magnitude but no direction, and the term is generally defined in contrast to vectors.&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Vector&lt;/strong&gt;
 &lt;div id="vector" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vector" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;In Euclidean space, a vector has both magnitude and direction. For example, vector &lt;strong&gt;a&lt;/strong&gt; from point &lt;em&gt;A&lt;/em&gt; to point &lt;em&gt;B&lt;/em&gt; (contains information about both points and direction)&lt;sup id="fnref:11"&gt;&lt;a href="#fn:11" class="footnote-ref" role="doc-noteref"&gt;11&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/fa984d43877f.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Unit Vector&lt;/strong&gt;
 &lt;div id="unit-vector" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#unit-vector" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;A vector with magnitude one is a unit vector. The unit vector in the direction of a nonzero vector equals that vector divided by its Euclidean length&lt;sup id="fnref:12"&gt;&lt;a href="#fn:12" class="footnote-ref" role="doc-noteref"&gt;12&lt;/a&gt;&lt;/sup&gt;:
$$
\hat{\mathbf a} = \frac{\mathbf a}{\lVert \mathbf a \rVert}
$$&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e9563146f301.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;What mathematics calls a &lt;a href="https://en.wikipedia.org/wiki/Unit_vector" target="_blank" rel="noreferrer"&gt;unit vector&lt;/a&gt; is called a &amp;ldquo;normalized vector&amp;rdquo; in pgvector and OpenAI embeddings. (Note: do not confuse this with the mathematical concept of the &lt;a href="https://en.wikipedia.org/wiki/Normal_%28geometry%29" target="_blank" rel="noreferrer"&gt;normal vector&lt;/a&gt;, which is a different concept entirely.)&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Why use unit vectors?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;OpenAI embeddings&amp;rsquo; explanation for using unit vectors&lt;sup id="fnref:13"&gt;&lt;a href="#fn:13" class="footnote-ref" role="doc-noteref"&gt;13&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;OpenAI embeddings are normalized to length 1, which means that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cosine similarity can be computed slightly faster using just a dot product&lt;/li&gt;
&lt;li&gt;Cosine similarity and Euclidean distance will result in the identical rankings&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
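&lt;p&gt;A minimal Python sketch of the two points quoted above, using made-up three-dimensional vectors rather than real embeddings (the helper names &lt;code&gt;normalize&lt;/code&gt;, &lt;code&gt;dot&lt;/code&gt; and &lt;code&gt;euclidean&lt;/code&gt; are illustrative only). Once every vector has length 1, the dot product already is the cosine similarity, and ranking by Euclidean distance gives the same order because ||a - b||^2 = 2 - 2(a·b):&lt;/p&gt;

```python
import math

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Scale v to length 1 (assumes v is not the zero vector)."""
    n = norm(v)
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = normalize([1.0, 2.0, 3.0])
docs = [normalize(v) for v in ([3.0, 2.0, 1.0], [1.0, 2.0, 2.5], [-1.0, 0.5, 0.0])]

# For unit vectors, cosine similarity is just the dot product (largest first).
by_cosine = sorted(range(len(docs)), key=lambda i: -dot(query, docs[i]))
# Sorting by Euclidean distance (smallest first) yields the same ranking.
by_euclid = sorted(range(len(docs)), key=lambda i: euclidean(query, docs[i]))
```

&lt;p&gt;Both &lt;code&gt;by_cosine&lt;/code&gt; and &lt;code&gt;by_euclid&lt;/code&gt; hold document indices ordered from most to least similar; for normalized inputs the two lists coincide.&lt;/p&gt;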
&lt;h4 class="relative group"&gt;&lt;strong&gt;Sparse Vector&lt;/strong&gt;
 &lt;div id="sparse-vector" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sparse-vector" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Sparse vectors are called &amp;ldquo;sparse&amp;rdquo; because the information in the vector is sparsely distributed. Typically, we need to find a few ones (relevant information) among thousands of zeros. Therefore, these vectors can contain many dimensions, usually in the tens of thousands.&lt;/p&gt;
&lt;p&gt;Sparse versus dense vectors: a sparse vector holds a few scattered bits of information, while a dense vector carries information in every dimension, i.e. it is information-dense.&lt;sup id="fnref:14"&gt;&lt;a href="#fn:14" class="footnote-ref" role="doc-noteref"&gt;14&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6d7500917874.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Euclidean Space&lt;/strong&gt;
 &lt;div id="euclidean-space" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#euclidean-space" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Euclidean space is the most fundamental space in mathematics. In modern mathematics, a Euclidean space can have any positive integer number of dimensions n.&lt;/p&gt;
&lt;p&gt;There are other space definitions, such as inner product space and Hilbert space. They differ in mathematical definitions, but in database/real-world contexts, the distinctions are not so fine-grained. The key takeaway is that inner product space, Euclidean space, and Hilbert space can all contain elements such as points, vectors, and inner products — we can simply call them &amp;ldquo;&lt;strong&gt;multi-dimensional spaces&lt;/strong&gt;&amp;rdquo;. For their differences, see &lt;a href="https://zhuanlan.zhihu.com/p/684643954" target="_blank" rel="noreferrer"&gt;A Casual Discussion of Various Spaces in Mathematics&lt;/a&gt;&lt;sup id="fnref:15"&gt;&lt;a href="#fn:15" class="footnote-ref" role="doc-noteref"&gt;15&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a22c1c460ef1.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Euclidean Distance&lt;/strong&gt;
 &lt;div id="euclidean-distance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#euclidean-distance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Euclidean distance is what we usually mean by the distance between two points, i.e., the length of the line segment joining them&lt;sup id="fnref:16"&gt;&lt;a href="#fn:16" class="footnote-ref" role="doc-noteref"&gt;16&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/15196bf76dd3.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;In 2D space, the Euclidean distance between points q and p is:
$$
d(\mathbf p,\mathbf q)=\sqrt{(p_1-q_1)^2+(p_2-q_2)^2}
$$&lt;/p&gt;
&lt;p&gt;In n-dimensional space, the Euclidean distance between points q and p is:
$$
d(\mathbf p,\mathbf q)=\sqrt{(p_1-q_1)^2+(p_2-q_2)^2+\cdots+(p_n-q_n)^2}
$$&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Manhattan Distance (or Taxicab Distance)&lt;/strong&gt;
 &lt;div id="manhattan-distance-or-taxicab-distance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#manhattan-distance-or-taxicab-distance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;$$
d(\mathbf p,\mathbf q)= \sum_{i=1}^n | p_i-q_i|
$$&lt;/p&gt;
&lt;p&gt;Manhattan distance is the sum of the absolute differences of two points across each dimension&lt;sup id="fnref:17"&gt;&lt;a href="#fn:17" class="footnote-ref" role="doc-noteref"&gt;17&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9d81381e5fb5.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;In the figure above, the green line is Euclidean distance; the red, yellow, and blue lines are Manhattan distances.&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Minkowski Distance&lt;/strong&gt;
 &lt;div id="minkowski-distance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#minkowski-distance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;$$
d(\mathbf a,\mathbf b)= \left( \sum_{i=1}^n | a_i-b_i|^p \right)^{1/p}
$$&lt;/p&gt;
&lt;p&gt;The figure below shows, for different values of p, the set of points whose Minkowski distance from the origin is exactly 1&lt;sup id="fnref:18"&gt;&lt;a href="#fn:18" class="footnote-ref" role="doc-noteref"&gt;18&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/020d3a11e478.png" alt="image" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When p=1, it is Manhattan distance, also written as &amp;ldquo;L1 distance&amp;rdquo;&lt;/li&gt;
&lt;li&gt;When p=2, it is Euclidean distance, also written as &amp;ldquo;L2 distance&amp;rdquo;&lt;/li&gt;
&lt;li&gt;For a general value of p, it is the Minkowski distance of order p, also written as &amp;ldquo;Lp distance&amp;rdquo; (p is the exponent here, not the number of dimensions n)&lt;/li&gt;
&lt;/ul&gt;
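&lt;p&gt;The relationship between the three distances can be checked with a short Python sketch. The function name &lt;code&gt;minkowski&lt;/code&gt; and the sample points are assumptions made for illustration:&lt;/p&gt;

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between equal-length vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

p_vec = [1.0, 2.0]
q_vec = [4.0, 6.0]

l1 = minkowski(p_vec, q_vec, 1)  # Manhattan: |1-4| + |2-6| = 7
l2 = minkowski(p_vec, q_vec, 2)  # Euclidean: sqrt(9 + 16) = 5
l3 = minkowski(p_vec, q_vec, 3)  # order 3: lies between the largest gap (4) and l2
```

&lt;p&gt;As p grows, the result shrinks toward the largest per-dimension difference, which is why the unit shapes in the figure move from a diamond toward a square.&lt;/p&gt;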

&lt;h4 class="relative group"&gt;&lt;strong&gt;Cosine Similarity&lt;/strong&gt;
 &lt;div id="cosine-similarity" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cosine-similarity" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The cosine value of the angle between two vectors — also called cosine similarity. Cosine similarity depends only on the angle between the two vectors, not on the vectors&amp;rsquo; lengths&lt;sup id="fnref:19"&gt;&lt;a href="#fn:19" class="footnote-ref" role="doc-noteref"&gt;19&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7c215f476c3e.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The smaller the angle between two vectors, the larger the cosine similarity. Value range: [-1, 1]. cos(0°)=1, cos(90°)=0, cos(180°)=-1.&lt;/p&gt;
&lt;p&gt;Cosine similarity between two vectors is written as:
$$
\cos (\theta)
$$
Expressed in vector form:
$$
\cos (\theta)=\frac{\mathbf a\cdot \mathbf b }{||\mathbf a|| \, ||\mathbf b||}= \frac{ \sum_{i=1}^n a_i b_i}{ \sqrt {\sum_{i=1}^n a_i ^2} \cdot \sqrt {\sum_{i=1}^n b_i ^2}}
$$&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c561d1e0ee46.png" alt="image" /&gt;&lt;/p&gt;
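&lt;p&gt;The formula translates almost directly into code. The following is a plain-Python sketch with hand-picked vectors for the three angles mentioned above (the function name is illustrative):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||); undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity([1.0, 2.0], [10.0, 20.0])  # angle 0
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])         # angle 90 degrees
opposite = cosine_similarity([1.0, 2.0], [-2.0, -4.0])         # angle 180 degrees
```

&lt;p&gt;Note that the first pair differs in length by a factor of ten yet still scores (up to floating-point rounding) 1, which is the length independence described above.&lt;/p&gt;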

&lt;h4 class="relative group"&gt;&lt;strong&gt;Inner Product&lt;/strong&gt;
 &lt;div id="inner-product" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#inner-product" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Also called the dot product, it combines the lengths of two vectors with the angle between them. The inner product equals the product of the two vectors&amp;rsquo; &lt;em&gt;Euclidean lengths (norms)&lt;/em&gt; multiplied by the &lt;em&gt;cosine of the angle between them&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Inner product in 2D space:
$$
\mathbf a\cdot \mathbf b=||\mathbf a|| \, ||\mathbf b|| \, \cos \theta
$$
or
$$
\mathbf a\cdot \mathbf b= a_1 b_1 + a_2 b_2
$$
Inner product in n-dimensional space (&lt;strong&gt;a&lt;/strong&gt;=[a1,a2,···,an], &lt;strong&gt;b&lt;/strong&gt;=[b1,b2,···,bn]):
$$
\mathbf a\cdot \mathbf b=\sum_{i=1}^n a_ib_i= a_1b_1 + a_2b_2 + \cdots + a_nb_n
$$&lt;/p&gt;
&lt;p&gt;Now the following diagram should make sense. Using the formulas above, you can also reverse-engineer what the distance operators mean for n-dimensional vectors.&lt;/p&gt;
&lt;p&gt;They are: Euclidean distance, cosine distance, and inner product&lt;sup id="fnref:20"&gt;&lt;a href="#fn:20" class="footnote-ref" role="doc-noteref"&gt;20&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8b206ce5a7c9.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;All three can describe the similarity between two vectors.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Euclidean distance: contains only distance information between the two vectors&lt;/li&gt;
&lt;li&gt;Cosine distance: contains only angle information between the two vectors&lt;/li&gt;
&lt;li&gt;Inner product: contains both distance information and angle information&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, there are more mathematical models for vector similarity computation, but it depends on whether the vector database supports them.&lt;/p&gt;
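&lt;p&gt;To see how the three measures react differently, the sketch below computes them for one pair of vectors and again after stretching the second vector to ten times its length. The vectors and function names are made up for illustration and are not tied to any particular database:&lt;/p&gt;

```python
import math

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cos(theta); assumes neither vector is all zeros."""
    norms = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / norms

a = [1.0, 2.0, 3.0]
b = [2.0, 0.0, 1.0]
b_scaled = [10.0 * x for x in b]  # same direction as b, ten times longer

# Each entry holds (value for a vs b, value for a vs b_scaled).
measures = {
    "l2": (l2_distance(a, b), l2_distance(a, b_scaled)),
    "cosine": (cosine_distance(a, b), cosine_distance(a, b_scaled)),
    "inner": (inner_product(a, b), inner_product(a, b_scaled)),
}
```

&lt;p&gt;Stretching the vector changes the Euclidean distance and the inner product, while the cosine distance stays the same because the angle has not moved.&lt;/p&gt;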

&lt;h4 class="relative group"&gt;&lt;strong&gt;Jaccard Distance&lt;/strong&gt;
 &lt;div id="jaccard-distance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#jaccard-distance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
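&lt;p&gt;In short: the Jaccard similarity (Jaccard index) is the size of the intersection divided by the size of the union, and the Jaccard distance is one minus that value&lt;sup id="fnref:21"&gt;&lt;a href="#fn:21" class="footnote-ref" role="doc-noteref"&gt;21&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;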
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e5680f0330ab.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Formula:
$$
J(A,B)= \frac{|A\cap B| }{|A \cup B|}
$$&lt;/p&gt;
&lt;p&gt;Expressed in binary vectors, it is the number of positions where both vectors are 1 divided by the number of positions where at least one vector is 1&lt;sup id="fnref:22"&gt;&lt;a href="#fn:22" class="footnote-ref" role="doc-noteref"&gt;22&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ee60ca0304e3.png" alt="image" /&gt;&lt;/p&gt;
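&lt;p&gt;A small Python sketch of both views, sets and 0/1 vectors, with made-up inputs. The function names are illustrative; returning 1.0 for two empty inputs is a convention chosen here, not something the formula dictates:&lt;/p&gt;

```python
def jaccard_similarity(a, b):
    """|A intersect B| / |A union B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a.union(b)
    if not union:
        return 1.0  # convention for two empty sets
    return len(a.intersection(b)) / len(union)

def jaccard_binary(u, v):
    """Jaccard similarity of two equal-length 0/1 vectors."""
    both = sum(1 for x, y in zip(u, v) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(u, v) if x == 1 or y == 1)
    return both / either if either else 1.0

sim_sets = jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"})  # 2 shared / 4 total
sim_bits = jaccard_binary([1, 1, 1, 0], [0, 1, 1, 1])            # same sets as bits
distance = 1.0 - sim_sets                                        # Jaccard distance
```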

&lt;h4 class="relative group"&gt;&lt;strong&gt;Hamming Distance&lt;/strong&gt;
 &lt;div id="hamming-distance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hamming-distance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The number of differing positions between two strings or vectors of equal length&lt;sup id="fnref:23"&gt;&lt;a href="#fn:23" class="footnote-ref" role="doc-noteref"&gt;23&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;ka&lt;strong&gt;rol&lt;/strong&gt;in&amp;rdquo; and &amp;ldquo;ka&lt;strong&gt;thr&lt;/strong&gt;in&amp;rdquo; is 3.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;k&lt;strong&gt;a&lt;/strong&gt;r&lt;strong&gt;ol&lt;/strong&gt;in&amp;rdquo; and &amp;ldquo;k&lt;strong&gt;e&lt;/strong&gt;r&lt;strong&gt;st&lt;/strong&gt;in&amp;rdquo; is 3.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;k&lt;strong&gt;athr&lt;/strong&gt;in&amp;rdquo; and &amp;ldquo;k&lt;strong&gt;erst&lt;/strong&gt;in&amp;rdquo; is 4.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;0000&lt;/strong&gt; and &lt;strong&gt;1111&lt;/strong&gt; is 4.&lt;/li&gt;
&lt;li&gt;2&lt;strong&gt;17&lt;/strong&gt;3&lt;strong&gt;8&lt;/strong&gt;96 and 2&lt;strong&gt;23&lt;/strong&gt;3&lt;strong&gt;7&lt;/strong&gt;96 is 3.&lt;/li&gt;
&lt;/ul&gt;
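&lt;p&gt;The examples above can be reproduced with a few lines of Python. This is a sketch; &lt;code&gt;hamming_distance&lt;/code&gt; is an illustrative name, and the numeric examples are passed as strings so they are compared digit by digit:&lt;/p&gt;

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

examples = [
    ("karolin", "kathrin"),
    ("karolin", "kerstin"),
    ("kathrin", "kerstin"),
    ("0000", "1111"),
    ("2173896", "2233796"),
]
distances = [hamming_distance(a, b) for a, b in examples]
```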
&lt;p&gt;Illustration&lt;sup id="fnref:24"&gt;&lt;a href="#fn:24" class="footnote-ref" role="doc-noteref"&gt;24&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2df7e926cb52.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Delaunay Triangulation&lt;/strong&gt;
 &lt;div id="delaunay-triangulation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#delaunay-triangulation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Delaunay triangulation operates on a set of points in a plane. It subdivides the convex hull of the points into triangles such that the circumcircle of each triangle contains no point of the set in its interior. This maximizes the minimum angle among all triangles and so tends to avoid producing skinny triangles&lt;sup id="fnref:25"&gt;&lt;a href="#fn:25" class="footnote-ref" role="doc-noteref"&gt;25&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Does NOT satisfy &amp;ldquo;the circumcircle of each triangle contains no point from the set&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/61a1cd01f71f.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;DOES satisfy &amp;ldquo;the circumcircle of each triangle contains no point from the set&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/dfbb3c28e6e3.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;For example, triangulating a point set:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b5d7da74a98a.png" alt="img" /&gt;&lt;/p&gt;
&lt;p&gt;A valid triangulation:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b13e838ade76.png" alt="img" /&gt;&lt;/p&gt;
&lt;p&gt;Delaunay triangulation is not actually an algorithm — it merely defines what a &amp;ldquo;good&amp;rdquo; triangular mesh looks like. Its excellent properties are the empty-circle property and the maximized-minimum-angle property. These two properties avoid the creation of skinny triangles and make Delaunay triangulation widely applicable.&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Voronoi Diagram&lt;/strong&gt;
 &lt;div id="voronoi-diagram" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#voronoi-diagram" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Delaunay triangulation is a triangulation of a discrete point set P in general position, and it corresponds to the dual graph of P&amp;rsquo;s Voronoi diagram. The circumcenters of Delaunay triangles are the vertices of the Voronoi diagram. In 2D, Voronoi vertices are connected by edges, which can be derived from the adjacency relationships of Delaunay triangles: if two triangles share an edge in the Delaunay triangulation, their circumcenters should be connected by an edge in the Voronoi tessellation&lt;sup id="fnref:26"&gt;&lt;a href="#fn:26" class="footnote-ref" role="doc-noteref"&gt;26&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ea403a88c609.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The key property of a Voronoi diagram is: &lt;em&gt;every point within a region is at least as close to that region&amp;rsquo;s centroid as to any other centroid&lt;/em&gt;.
$$
R_k=\{x \in X \mid d(x,P_k) \le d(x,P_j) \; \text{for all } j \neq k\}
$$
Rk is the region that belongs to centroid Pk, d(x,Pk) is the distance from a point x to that centroid, and d(x,Pj) is the distance from x to any other centroid Pj.&lt;/p&gt;
&lt;p&gt;Due to different ways of computing the distance d, Voronoi diagrams can take on different appearances&lt;sup id="fnref:27"&gt;&lt;a href="#fn:27" class="footnote-ref" role="doc-noteref"&gt;27&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d7f64ffda7c7.png" alt="image" /&gt;&lt;/p&gt;
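&lt;p&gt;In code, deciding which Voronoi region a point falls into is simply a nearest-centroid lookup. The sketch below uses Euclidean distance via the standard library function &lt;code&gt;math.dist&lt;/code&gt;; the centroids, the query points and the name &lt;code&gt;nearest_site&lt;/code&gt; are hypothetical:&lt;/p&gt;

```python
import math

def nearest_site(x, sites):
    """Index k of the centroid P_k closest to x, i.e. the region R_k containing x."""
    return min(range(len(sites)), key=lambda k: math.dist(x, sites[k]))

sites = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]   # hypothetical centroids P_0, P_1, P_2
points = [(1.0, 1.0), (3.0, 0.5), (0.5, 3.5), (2.5, 1.0)]

# Region index for each query point.
cells = [nearest_site(x, sites) for x in points]
```

&lt;p&gt;Replacing &lt;code&gt;math.dist&lt;/code&gt; with another distance function, such as Manhattan distance, changes which region some points land in, which is why the diagrams above look different.&lt;/p&gt;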

&lt;h2 class="relative group"&gt;Vector Database Indexes
 &lt;div id="vector-database-indexes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vector-database-indexes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Nearest Neighbor Search
 &lt;div id="nearest-neighbor-search" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#nearest-neighbor-search" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;ENN (Exact Nearest Neighbor)&lt;/strong&gt;: Finding the point or vector closest to a query point in a given dataset. This method guarantees the highest accuracy, but as the dataset size increases, the computational cost rises sharply because it requires evaluating the distance between the query point and every point in the dataset.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ANN (Approximate Nearest Neighbor)&lt;/strong&gt;: To improve efficiency, approximately finding the nearest point to the query point at the cost of some accuracy. This method is implemented through various algorithms and can significantly reduce computational cost, especially effective when dealing with large-scale datasets.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KNN (K-Nearest Neighbors)&lt;/strong&gt;: A commonly used machine learning algorithm that works by finding the K nearest neighbors to a given query point in the dataset.&lt;/p&gt;
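&lt;p&gt;For reference, an exact K-nearest-neighbor search can be sketched in a few lines of Python. This is the brute-force baseline that ANN indexes try to approximate; the function name and the tiny dataset are made up for illustration:&lt;/p&gt;

```python
import heapq
import math

def exact_knn(query, dataset, k):
    """Indices of the k vectors in dataset closest to query (Euclidean distance).

    Every vector is compared with the query, so the cost grows linearly
    with the size of the dataset.
    """
    return heapq.nsmallest(k, range(len(dataset)),
                           key=lambda i: math.dist(query, dataset[i]))

dataset = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2), (-3.0, 2.0)]
neighbors = exact_knn((1.0, 1.0), dataset, k=2)
```

&lt;p&gt;An ANN index answers the same question while looking at only a fraction of the dataset, so it may occasionally return a slightly different set of neighbors.&lt;/p&gt;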

&lt;h3 class="relative group"&gt;Index Evaluation Criteria
 &lt;div id="index-evaluation-criteria" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#index-evaluation-criteria" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Evaluating the quality of an index always depends on the specific data model, but in general, it includes the following points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Query time&lt;/strong&gt;: Query speed is critical, especially important in large model contexts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Query quality&lt;/strong&gt;: ANN queries won&amp;rsquo;t always return perfectly accurate results, but the query quality must not deviate too much. Query quality has many metrics, the most common being recall.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory consumption&lt;/strong&gt;: The memory consumed by the query index — searching in memory is clearly faster than searching on disk.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Training time&lt;/strong&gt;: Some search methods require training to reach a good state.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Write time&lt;/strong&gt;: The impact on the index when writing vectors, including all maintenance.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Most of these metrics are straightforward. Here we&amp;rsquo;ll focus on &lt;em&gt;query quality&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;In ANN search, results are not always exact. When searching a set of elements, the concepts include: the query scope (retrieved elements), all correct elements (relevant elements), the returned correct elements (true positives), and the returned incorrect elements (false positives)&lt;sup id="fnref:28"&gt;&lt;a href="#fn:28" class="footnote-ref" role="doc-noteref"&gt;28&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7712556dface.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;TP = True positive; FP = False positive; TN = True negative; FN = False negative&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Accuracy&lt;/strong&gt;:
$$
Accuracy=\frac{TP+TN}{TP+FP+TN+FN}
$$
or:
$$
Accuracy=\frac{\text{all correct elements}}{\text{all elements}}
$$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Precision&lt;/strong&gt;:
$$
Precision=\frac{TP}{TP+FP}
$$
or:
$$
Precision=\frac{\text{retrieved correct elements}}{\text{all retrieved elements}}
$$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Recall&lt;/strong&gt;:
$$
Recall=\frac{TP}{TP+FN}
$$
or:
$$
Recall=\frac{\text{retrieved correct elements}}{\text{all correct elements}}
$$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;F-measure&lt;/strong&gt;: The harmonic mean of precision and recall
$$
F_1=2 \cdot \frac{precision \cdot recall}{precision+recall}
$$&lt;/p&gt;
&lt;p&gt;Example: Consider a computer program designed to identify dogs (and related elements) in digital photos. When processing a photo containing ten cats and twelve dogs, the program identifies eight dogs. Among the eight identified as dogs, only five are actually dogs (true positives), while the other three are cats (false positives). Seven dogs were missed (false negatives), and seven cats were correctly excluded (true negatives). For this program:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Accuracy = (5+7)/(10+12) = 12/22 (true positives plus true negatives / all elements)&lt;/li&gt;
&lt;li&gt;Precision = 5/8 (true positives / all retrieved elements)&lt;/li&gt;
&lt;li&gt;Recall = 5/12 (true positives / all correct elements)&lt;/li&gt;
&lt;li&gt;F-measure = 2*[(5/8)*(5/12)]/[(5/8)+(5/12)] = 1/2&lt;/li&gt;
&lt;/ul&gt;
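&lt;p&gt;The four formulas can be checked against the dog photo example with a short Python sketch (the function name is illustrative; the counts are the ones given in the example):&lt;/p&gt;

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# 5 dogs found, 3 cats mistaken for dogs, 7 cats correctly excluded, 7 dogs missed.
metrics = classification_metrics(tp=5, fp=3, tn=7, fn=7)
```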

&lt;h3 class="relative group"&gt;Locality-Sensitive Hashing (LSH)
 &lt;div id="locality-sensitive-hashing-lsh" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#locality-sensitive-hashing-lsh" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;LSH is a method for narrowing the search scope by converting data vectors into hash values while preserving information about their similarity.&lt;/p&gt;

&lt;h4 class="relative group"&gt;LSH Construction
 &lt;div id="lsh-construction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lsh-construction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;LSH has many implementations. Here we introduce the more traditional one. This traditional LSH implementation consists of three parts&lt;sup id="fnref1:22"&gt;&lt;a href="#fn:22" class="footnote-ref" role="doc-noteref"&gt;22&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Shingling&lt;/strong&gt;: Encode the original text into vectors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MinHashing&lt;/strong&gt;: Convert the vectors into a special representation called a signature, used for comparing similarity between them.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LSH function&lt;/strong&gt;: Hash the signatures into different buckets. If a pair of vectors&amp;rsquo; signatures fall into the same bucket at least once, they are considered candidates.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 class="relative group"&gt;Shingling
 &lt;div id="shingling" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shingling" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Shingling is a method of embedding (in my personal opinion). Shingling splits natural-language text into overlapping runs of k consecutive tokens (k-grams) and keeps each distinct run only once&lt;sup id="fnref2:22"&gt;&lt;a href="#fn:22" class="footnote-ref" role="doc-noteref"&gt;22&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/45d8beeaced2.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;At this point, we have a set of tokens based on k-grams. The next step is to convert them into vectors.&lt;/p&gt;
&lt;p&gt;Start with an all-zero vector whose length equals the size of the token set (the vocabulary of all distinct tokens). Then set the position corresponding to each token that occurs in the text to 1:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/288f0f1f9f2c.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The final result is a very long vector containing only 0s and 1s, where the vector&amp;rsquo;s information captures the semantics of a sentence.&lt;/p&gt;
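&lt;p&gt;The two steps can be sketched in Python as follows. This sketch assumes character-level shingles with k=3 and builds the vocabulary from just two made-up sentences; the function names are illustrative:&lt;/p&gt;

```python
def shingles(text, k):
    """Set of all k-character shingles (k-grams) in text; duplicates collapse."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def one_hot(shingle_set, vocabulary):
    """0/1 vector with a 1 at the position of every shingle that occurs."""
    return [1 if s in shingle_set else 0 for s in vocabulary]

docs = ["learning data science", "learning data mining"]  # hypothetical inputs
sets = [shingles(d, k=3) for d in docs]

# The vocabulary is the sorted union of all shingles seen in any document.
vocabulary = sorted(set().union(*sets))
vectors = [one_hot(s, vocabulary) for s in sets]
```

&lt;p&gt;With a realistic corpus the vocabulary grows to tens of thousands of shingles, so almost every position of each vector is 0.&lt;/p&gt;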

&lt;h4 class="relative group"&gt;MinHashing
 &lt;div id="minhashing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#minhashing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Since the vector dimensionality is extremely high, comparing the one-hot encoded vectors directly is very inefficient. We need to convert the sparse vectors into dense vectors. In LSH this process is called MinHashing, and the converted vector is called a MinHashing signature.&lt;/p&gt;
&lt;p&gt;MinHashing can be a bit tricky for beginners at first, but once you grasp it, you&amp;rsquo;ll find it very simple.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;MinHashing is a hash function that permutes the components of an input vector and then returns the first index where the permuted vector component equals 1.&lt;/p&gt;
&lt;/blockquote&gt;&lt;ol&gt;
&lt;li&gt;First, apply a permutation: rearrange the components of a vector.&lt;/li&gt;
&lt;li&gt;Return the index of the first element that equals 1 after permutation.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;p&gt;u1 vector (0,0,1,1,0): after the first random permutation, the corresponding index is 0; after the second random permutation, the corresponding index is 0&lt;sup id="fnref:29"&gt;&lt;a href="#fn:29" class="footnote-ref" role="doc-noteref"&gt;29&lt;/a&gt;&lt;/sup&gt;. u1&amp;rsquo;s MinHashing signature is (0,0).&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c5997f710719.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;In practice, multiple minhash values can be used to approximately compute the Jaccard similarity between vectors. The more minhash values used, the more accurate the approximation.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9c95377842ce.png" alt="image" /&gt;&lt;/p&gt;
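&lt;p&gt;A sketch of MinHashing in plain Python, under a few assumptions: permutations are generated with the standard &lt;code&gt;random&lt;/code&gt; module, positions are counted from 0, and the two 200-dimensional vectors are synthetic with a known Jaccard similarity of 30/90. The fraction of signature positions on which two vectors agree serves as the estimate:&lt;/p&gt;

```python
import random

def minhash_signature(vector, permutations):
    """One value per permutation: the first position (in permuted order) holding a 1.

    Assumes the vector contains at least one 1.
    """
    signature = []
    for perm in permutations:
        # perm[j] is the original index that lands at position j after shuffling.
        signature.append(next(j for j, idx in enumerate(perm) if vector[idx] == 1))
    return signature

def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where two signatures agree."""
    return sum(1 for x, y in zip(sig_a, sig_b) if x == y) / len(sig_a)

rng = random.Random(42)                 # fixed seed so the sketch is repeatable
dim, num_perm = 200, 400                # hypothetical sizes
u = [1] * 60 + [0] * 140                # ones at positions 0..59
v = [0] * 30 + [1] * 60 + [0] * 110     # ones at positions 30..89

permutations = []
for _ in range(num_perm):
    perm = list(range(dim))
    rng.shuffle(perm)
    permutations.append(perm)

sig_u = minhash_signature(u, permutations)
sig_v = minhash_signature(v, permutations)
estimate = estimate_jaccard(sig_u, sig_v)   # true Jaccard similarity is 30 / 90
```

&lt;p&gt;Each signature has only 400 entries regardless of how long the original vectors are, and raising &lt;code&gt;num_perm&lt;/code&gt; tightens the estimate at the cost of more computation.&lt;/p&gt;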

&lt;h4 class="relative group"&gt;LSH Function
 &lt;div id="lsh-function" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lsh-function" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Even after converting sparse vectors into dense vectors, the dense vectors can still have high dimensionality, making direct retrieval inefficient.&lt;/p&gt;
&lt;p&gt;We can improve query efficiency using hash tables. However, note that using a completely random hash algorithm easily places nearby vectors into different hash buckets. We need a hash algorithm that places nearby vectors into the &lt;em&gt;same&lt;/em&gt; hash bucket — this is LSH: Locality-Sensitive Hashing.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The LSH mechanism builds a hash table consisting of several parts which puts a pair of signatures into the same bucket if they have at least one corresponding part.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;The concept of locality-sensitive hashing is also simple: split the signature into bands, compute hash values for each sub-signature band, and designate those with colliding sub-hash values as candidates.&lt;/p&gt;
&lt;p&gt;The following example is easy to understand — read through it:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f95a92163fc8.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Thinking in terms of extremes: b=1 means no banding at all — direct hashing, completely defeating the purpose of LSH. b=number of signature elements means one band per element, i.e., one hash value per element — this can achieve relatively accurate approximate comparison, but it imposes a massive burden on computation and memory.&lt;/p&gt;
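&lt;p&gt;The banding step can be sketched in Python as below. Instead of computing an explicit hash value, this sketch uses each band (as a tuple) directly as a dictionary key, which has the same effect of grouping identical sub-signatures into one bucket. The signatures and the name &lt;code&gt;lsh_candidates&lt;/code&gt; are hypothetical:&lt;/p&gt;

```python
from collections import defaultdict
from itertools import combinations

def lsh_candidates(signatures, bands):
    """Pairs of ids whose signatures share at least one identical band.

    signatures maps an id to a signature whose length is divisible by bands.
    """
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)      # a fresh hash table for every band
        for key, sig in signatures.items():
            rows = len(sig) // bands
            chunk = tuple(sig[band * rows:(band + 1) * rows])
            # Using the tuple as the key plays the role of hashing this band.
            buckets[chunk].append(key)
        for ids in buckets.values():
            candidates.update(combinations(sorted(ids), 2))
    return candidates

signatures = {                           # hypothetical 6-value MinHash signatures
    "a": [1, 4, 2, 7, 3, 5],
    "b": [1, 4, 9, 8, 3, 6],
    "c": [0, 2, 6, 1, 8, 8],
}
pairs = lsh_candidates(signatures, bands=3)
```

&lt;p&gt;Here &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; become a candidate pair because their first band (1, 4) is identical, even though the full signatures differ. With &lt;code&gt;bands=1&lt;/code&gt; the whole signature would have to match and no pair would be returned.&lt;/p&gt;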

&lt;h4 class="relative group"&gt;LSH Parameters and Error Rate
 &lt;div id="lsh-parameters-and-error-rate" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lsh-parameters-and-error-rate" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The probability that a pair of vectors becomes a candidate pair directly affects recall. This probability is given by the formula below, where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;s represents similarity&lt;/li&gt;
&lt;li&gt;b represents the number of bands&lt;/li&gt;
&lt;li&gt;r represents the number of rows per band&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/11d2d0ff3d63.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;If we plot P against s using the formula, the relationship between vector similarity and candidate probability is as follows:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d9a7f7f7c269.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5ce91d839ace.png" alt="image" /&gt;&lt;/p&gt;
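&lt;p&gt;For a signature of fixed length, the larger the number of bands b (and so the fewer rows r per band), the lower the similarity at which a pair is likely to become a candidate.&lt;/p&gt;
&lt;p&gt;At the same time, adjusting b and r changes P, and P in turn determines the number of false positives (FP) and false negatives (FN).&lt;/p&gt;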
&lt;p&gt;For example, returning more candidates naturally leads to more false positives — i.e., returning non-similar &amp;ldquo;candidate pairs.&amp;rdquo; This is an inevitable consequence of modifying the parameter b.&lt;/p&gt;
&lt;p&gt;TP = True positive; FP = False positive; TN = True negative; FN = False negative&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/888c9b18576f.png" alt="image" /&gt;&lt;/p&gt;
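&lt;p&gt;To get a feel for the trade-off, the sketch below evaluates the candidate probability for two ways of splitting a 100-value signature. It assumes the standard banding formula P = 1 - (1 - s^r)^b, which is taken here to be the one shown in the figure above; the parameter choices are arbitrary examples:&lt;/p&gt;

```python
def candidate_probability(s, b, r):
    """P(pair becomes a candidate) = 1 - (1 - s**r)**b.

    s: similarity of the pair, b: number of bands, r: rows per band.
    """
    return 1.0 - (1.0 - s ** r) ** b

similarities = (0.3, 0.6, 0.9)

# The same signature length (b * r = 100) split in two different ways.
few_bands = [round(candidate_probability(s, b=10, r=10), 3) for s in similarities]
many_bands = [round(candidate_probability(s, b=25, r=4), 3) for s in similarities]
```

&lt;p&gt;Comparing the two lists shows that more bands raise the candidate probability at every similarity level, which catches more true neighbors but also lets more dissimilar pairs through as false positives.&lt;/p&gt;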
&lt;p&gt;LSH struggles with high-dimensional data: more dimensions require longer signatures and more computation to maintain good search quality. In such cases, other indexes are recommended.&lt;/p&gt;

&lt;h4 class="relative group"&gt;More
 &lt;div id="more" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#more" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;There are two more articles I haven&amp;rsquo;t finished digesting — they seem to be related to binary vectors and Euclidean distance:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://towardsdatascience.com/similarity-search-part-6-random-projections-with-lsh-forest-f2e9b31dcc47" target="_blank" rel="noreferrer"&gt;https://towardsdatascience.com/similarity-search-part-6-random-projections-with-lsh-forest-f2e9b31dcc47&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://towardsdatascience.com/similarity-search-part-7-lsh-compositions-1b2ae8239aca" target="_blank" rel="noreferrer"&gt;https://towardsdatascience.com/similarity-search-part-7-lsh-compositions-1b2ae8239aca&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;HNSW Index
 &lt;div id="hnsw-index" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-index" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;HNSW (Hierarchical Navigable Small World) is a multi-layer, graph-based approximate nearest neighbor algorithm and is currently one of the most popular vector index algorithms.&lt;/p&gt;
&lt;p&gt;At a high level, HNSW is based on the &lt;a href="https://en.wikipedia.org/wiki/Small-world_network" target="_blank" rel="noreferrer"&gt;Small World Theory&lt;/a&gt;. The Small World Theory originally stems from the &lt;a href="https://en.wikipedia.org/wiki/Six_degrees_of_separation" target="_blank" rel="noreferrer"&gt;Six Degrees of Separation&lt;/a&gt; idea in social psychology: any two people on Earth can be connected through at most six steps of social connections, that is, through at most five intermediaries. The idea was later supported by experimental and empirical evidence and extended to networks other than social ones. Note that the small-world effect is an observed phenomenon, not a proven law.&lt;/p&gt;
&lt;p&gt;In short, the Small World Theory says that &amp;ldquo;&lt;em&gt;the connection between two entities is actually very short&lt;/em&gt;.&amp;rdquo; What HNSW does is build connections between elements so that any element can be reached in a small number of hops.&lt;/p&gt;

&lt;h4 class="relative group"&gt;HNSW Index Construction
 &lt;div id="hnsw-index-construction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-index-construction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Let&amp;rsquo;s look at the HNSW paper&amp;rsquo;s algorithm for constructing HNSW graph layers&lt;sup id="fnref:30"&gt;&lt;a href="#fn:30" class="footnote-ref" role="doc-noteref"&gt;30&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/da93d451c90f.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Several elements in the construction algorithm are important:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;M&lt;/strong&gt; is the number of edges (connections) created for each newly inserted node.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mmax&lt;/strong&gt; is the maximum number of edges per node. If new nodes keep being inserted nearby, the edge count of existing neighboring nodes could grow without bound, wasting computational resources during search. When inserting a new node pushes an existing neighbor&amp;rsquo;s edge count above Mmax, that neighbor&amp;rsquo;s connections must be shrunk.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;efConstruction&lt;/strong&gt; is the size of the dynamic list of nearest-neighbor candidates kept while inserting a node. The M neighbors are chosen from this list.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Construction illustration&lt;sup id="fnref:31"&gt;&lt;a href="#fn:31" class="footnote-ref" role="doc-noteref"&gt;31&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/79c887052aca.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Steps for HNSW node insertion (without shrink connection)&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When a new node is inserted, it is first assigned a random maximum layer l. Starting from the entry point at the top layer, greedily find the nearest node in each layer above l (ef = 1) and use it as the entry point for the next layer down.&lt;/li&gt;
&lt;li&gt;Perform node insertion at layer l (e.g., l=2). Search that layer for the &lt;em&gt;efConstruction&lt;/em&gt; nearest candidates, select M of them, and connect them to the new node. At this point, 1 new node has been added with M edges connected to it.&lt;/li&gt;
&lt;li&gt;Repeat step 2 on each lower layer, using the candidates just found as entry points, until reaching the bottom layer (layer0).&lt;/li&gt;
&lt;/ol&gt;
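&lt;p&gt;How a node gets its layer is easy to overlook. In the HNSW paper the layer is drawn as floor(-ln(unif(0,1)) * mL), and mL = 1/ln(M) is the normalization the paper suggests. The sketch below follows that rule; the function name &lt;code&gt;assign_level&lt;/code&gt; and the sample size are illustrative assumptions, not part of any library.&lt;/p&gt;

```python
import math
import random


def assign_level(m, rng):
    """Draw the maximum layer for a new node: floor(-ln(u) * mL), mL = 1 / ln(M)."""
    m_l = 1.0 / math.log(m)
    u = 1.0 - rng.random()  # u lies in (0, 1], which avoids log(0)
    return int(-math.log(u) * m_l)


# Most nodes stay on layer 0; each higher layer holds roughly 1/M of the layer below.
rng = random.Random(0)
levels = [assign_level(16, rng) for _ in range(10000)]
print({lvl: levels.count(lvl) for lvl in sorted(set(levels))})
```

&lt;p&gt;The exponentially shrinking layer sizes are what give the upper layers their role as a sparse &amp;ldquo;express network&amp;rdquo; above the dense bottom layer.&lt;/p&gt;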

&lt;h4 class="relative group"&gt;HNSW Heuristic Neighbor Selection
 &lt;div id="hnsw-heuristic-neighbor-selection" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-heuristic-neighbor-selection" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The basic HNSW construction has another problem: if two clusters are relatively far apart, they will almost never be connected, because the basic algorithm only links a new node to its nearest neighbors among the &lt;em&gt;efConstruction&lt;/em&gt; candidates.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://arxiv.org/pdf/1603.09320" target="_blank" rel="noreferrer"&gt;HNSW original paper&lt;/a&gt; not only proposed the basic HNSW construction algorithm but also introduced a heuristic algorithm for solving the isolated cluster problem:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0ccf3de6e6ec.png" alt="image" /&gt;&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Fig.2 Heuristic for selecting graph neighbors for two isolated clusters. A new element is inserted on the boundary of cluster 1. All the element&amp;rsquo;s nearest neighbors belong to cluster 1, thus missing the Delaunay triangulation edges between the clusters. However, the heuristic selects element e2 from cluster 2, so if the inserted element is closer to e2 than to any other element from cluster 1, global connectivity is maintained.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;&lt;strong&gt;&amp;ldquo;The heuristic algorithm not only considers the nearest distance between nodes in the graph but also considers connectivity between different regions of the graph.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As shown below, when adding node X, the heuristic algorithm should be applied here — establishing connectivity with cluster A, rather than simply adding to the nearest neighbor nodes:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b7393a1117ad.png" alt="image" /&gt;&lt;/p&gt;
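&lt;p&gt;The core rule of the heuristic can be sketched in a few lines: a candidate is accepted only if it is closer to the new element than to every neighbor already selected. This is a simplified reading of the paper&amp;rsquo;s heuristic (it leaves out the optional extendCandidates and keepPrunedConnections switches), and the coordinates and function names are made up for illustration.&lt;/p&gt;

```python
import math


def l2(p, q):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def select_simple(q, candidates, m):
    """Baseline: just take the m candidates nearest to q."""
    return sorted(candidates, key=lambda e: l2(e, q))[:m]


def select_heuristic(q, candidates, m):
    """Accept a candidate only if it is closer to q than to every selected neighbor."""
    selected = []
    for e in sorted(candidates, key=lambda e: l2(e, q)):
        if len(selected) == m:
            break
        d_q = l2(e, q)
        if all(l2(e, r) > d_q for r in selected):
            selected.append(e)
    return selected


q = (0.0, 0.0)                                     # new element on the edge of cluster 1
cluster_1 = [(1.0, 0.0), (1.1, -0.1), (1.2, 0.1)]  # tight group next to q
e2 = (-3.0, 0.0)                                   # lone element from cluster 2
candidates = cluster_1 + [e2]

print(select_simple(q, candidates, 2))      # both neighbors come from cluster 1
print(select_heuristic(q, candidates, 2))   # one neighbor per direction, e2 included
```

&lt;p&gt;With the simple rule both edges go into cluster 1; with the heuristic the second edge reaches e2, which is what keeps the two clusters connected.&lt;/p&gt;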

&lt;h4 class="relative group"&gt;HNSW Index Search
 &lt;div id="hnsw-index-search" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-index-search" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The main logic of HNSW&amp;rsquo;s KNN search method as described in the &lt;a href="https://arxiv.org/pdf/1603.09320" target="_blank" rel="noreferrer"&gt;HNSW original paper&lt;/a&gt; consists of the following two algorithms:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b4bbc841673b.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b7ec9e80d0b9.png" alt="image" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Algorithm 2 appears slightly more complex, but its logic is simple: it finds the &lt;strong&gt;ef&lt;/strong&gt; nearest neighbors of &lt;strong&gt;q&lt;/strong&gt; within one layer. It keeps adding candidate nodes to the result set W, compares distances, and evicts the farthest node whenever W grows beyond ef, so the returned W holds the ef nearest nodes found for q at that layer.&lt;/li&gt;
&lt;li&gt;Algorithm 5 returns the K nearest neighbors of q. It calls Algorithm 2 once per layer. In the for loop over the non-bottom layers the input parameter is ef=1, so each of those layers only yields the single nearest node, which becomes the entry point (ep) for the next layer. The bottom layer (lc=0) is searched with the full ef, and the K nearest nodes of the resulting set W are returned.&lt;/li&gt;
&lt;/ul&gt;
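&lt;p&gt;The single-layer search (Algorithm 2) can be sketched with two heaps: a min-heap of candidates still to expand and a max-heap W of the best ef results so far. The toy graph below (ten nodes on a line, each linked to its direct neighbors) and the names &lt;code&gt;search_layer&lt;/code&gt;, &lt;code&gt;graph&lt;/code&gt; and &lt;code&gt;dist&lt;/code&gt; are assumptions for illustration only.&lt;/p&gt;

```python
import heapq


def search_layer(graph, dist, q, entry_points, ef):
    """Return up to ef nodes nearest to q within one layer, closest first."""
    visited = set(entry_points)
    candidates = [(dist(q, e), e) for e in entry_points]  # min-heap: nearest first
    heapq.heapify(candidates)
    results = [(-d, e) for d, e in candidates]            # max-heap via negated distance
    heapq.heapify(results)
    while candidates:
        d_c, c = heapq.heappop(candidates)
        if d_c > -results[0][0]:
            break  # nearest open candidate is farther than the worst result
        for e in graph[c]:
            if e in visited:
                continue
            visited.add(e)
            d_e = dist(q, e)
            if ef > len(results) or -results[0][0] > d_e:
                heapq.heappush(candidates, (d_e, e))
                heapq.heappush(results, (-d_e, e))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the farthest result
    return [e for _, e in sorted((-nd, e) for nd, e in results)]


# Toy layer: nodes 0..9 on a line, each connected to its direct neighbors.
graph = {i: [j for j in (i - 1, i + 1) if j in range(10)] for i in range(10)}


def dist(q, node):
    return abs(q - node)


print(search_layer(graph, dist, 6.2, [0], 3))  # ef=3, as on the bottom layer
print(search_layer(graph, dist, 6.2, [0], 1))  # ef=1, as on the upper layers
```

&lt;p&gt;Calling it with ef=1 reproduces the greedy descent used on the upper layers, while a larger ef widens the search on the bottom layer.&lt;/p&gt;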
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/3154b27761ee.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;HNSW Complexity
 &lt;div id="hnsw-complexity" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-complexity" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The expected number of HNSW layers grows as O(log(N)).&lt;/p&gt;
&lt;p&gt;Search complexity: Complexity can be rigorously evaluated in a Delaunay graph, with the average complexity being O(log(N)) (for non-Delaunay graphs, such as graphs with heuristic neighbor selection, the paper does not provide a specific complexity formula).&lt;/p&gt;
&lt;p&gt;Construction complexity: HNSW is constructed by iteratively inserting all elements, with average complexity O(N∙log(N)).&lt;/p&gt;

&lt;h4 class="relative group"&gt;HNSW Index Parameters
 &lt;div id="hnsw-index-parameters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-index-parameters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Generally, HNSW indexes for vector data have several adjustable parameters that affect index construction speed, recall, etc. Different databases may have slightly different parameters. Here we use pgvector&amp;rsquo;s HNSW parameters as an example:&lt;/p&gt;
&lt;p&gt;Index construction parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;m&lt;/strong&gt;: Maximum number of connections per vector in each layer, default 16. It plays the role of M (and Mmax) in the paper.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ef_construction&lt;/strong&gt;: Size of the dynamic candidate list used during index construction, default 64. Equivalent to efConstruction in the paper.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Index search parameters:&lt;/p&gt;
&lt;ul&gt;
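&lt;li&gt;&lt;strong&gt;hnsw.ef_search&lt;/strong&gt;: Size of the dynamic candidate list used during search, default 40 (equivalent to ef in the paper&amp;rsquo;s search algorithm, not efConstruction). It should be greater than or equal to the query&amp;rsquo;s LIMIT, because by default a scan returns at most ef_search rows.&lt;/li&gt;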
&lt;/ul&gt;
&lt;p&gt;Impact of adjusting ef_construction on creation time and recall during index construction&lt;sup id="fnref1:20"&gt;&lt;a href="#fn:20" class="footnote-ref" role="doc-noteref"&gt;20&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/44a379598309.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Increasing ef_construction improves recall but extends index creation time. After ef_construction=256, index construction time increases noticeably but recall improvement is not obvious.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ebd3ed59025b.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Increasing m also improves recall and extends index creation time. After m=36, index construction time increases noticeably but recall improvement is not obvious.&lt;/p&gt;
&lt;p&gt;Similarly, increasing hnsw.ef_search improves recall at the cost of performance.&lt;/p&gt;

&lt;h3 class="relative group"&gt;IVFFlat Index
 &lt;div id="ivfflat-index" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ivfflat-index" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
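&lt;p&gt;IVFFlat stands for Inverted File with Flat storage: the vectors in each list are stored as-is, without compression. (What&amp;rsquo;s the relationship with &amp;ldquo;invert&amp;rdquo;? Do all indexes that can&amp;rsquo;t be categorized get called inverted? Most likely the name borrows from the inverted lists of text retrieval: each centroid maps to the list of vectors assigned to it.) The core concept of the IVFFlat index is based on the Voronoi diagram:&lt;/p&gt;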
&lt;blockquote&gt;&lt;p&gt;The key property of a Voronoi diagram is: &lt;em&gt;the distance from a centroid to any point within its region is smaller than the distance from that point to any other centroid&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;This property is expressed in formula form:
$$
R_k = \lbrace x \in X \mid d(x, P_k) \le d(x, P_j) \; \mathrm{for\ all} \; j \neq k \rbrace
$$
R_k is the region (cell) belonging to centroid P_k, d(x, P_k) is the distance from a point x to that centroid, and d(x, P_j) is the distance from x to any other centroid P_j.&lt;/p&gt;
&lt;p&gt;Using this concept, we can partition many vectors into regions by setting centroids, and then use the Voronoi diagram property to roughly find nearby points.&lt;/p&gt;

&lt;h4 class="relative group"&gt;IVFFlat Index Construction
 &lt;div id="ivfflat-index-construction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ivfflat-index-construction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Let&amp;rsquo;s reduce high-dimensional space to 2D for understanding IVFFlat index construction&lt;sup id="fnref:32"&gt;&lt;a href="#fn:32" class="footnote-ref" role="doc-noteref"&gt;32&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;For example, the following set of X marks represents points (or vectors). Suppose we have three centroids:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/370e9bbd8bac.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The three centroids partition the space into 3 Voronoi cells, and every point is assigned to the cell of its nearest centroid:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8e9e1aed5c5b.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;IVFFlat Index Search
 &lt;div id="ivfflat-index-search" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ivfflat-index-search" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Now there is a query node. Compute its distance to all centroids, find the nearest centroid, and the cell containing that centroid is the region to search next. Finally, within that region, find the neighboring nodes&lt;sup id="fnref:33"&gt;&lt;a href="#fn:33" class="footnote-ref" role="doc-noteref"&gt;33&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/bad429ed41be.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Boundary Problem&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;The above search path has a boundary problem. When the query is near a region boundary, if the true nearest node is in another region, the algorithm of &amp;ldquo;only searching for neighboring nodes within the region&amp;rdquo; will not find the true nearest neighbor.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9298bd73b504.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The boundary problem is fundamentally because:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Voronoi diagram only guarantees that the distance from a node to its own region&amp;rsquo;s centroid is smaller than the distance to other centroids, but it does NOT guarantee that the distance from a node to other nodes in its own region is smaller than the distance to nodes in other regions.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This problem can be mitigated by increasing the number of regions searched. For example, increasing the number of regions searched from 1 to 3:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9e9493428d53.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The number of regions to search is usually exposed as a parameter in databases, such as &lt;code&gt;ivfflat.probes&lt;/code&gt; in pgvector.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;IVFFlat Search Summary&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Compute the distance from the query node to all centroids and rank the centroids from nearest to farthest.&lt;/li&gt;
&lt;li&gt;Based on the input parameter for the number of cells to query (e.g., probes), search for neighboring points in the nearest &lt;code&gt;probes&lt;/code&gt; cells.&lt;/li&gt;
&lt;/ol&gt;
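&lt;p&gt;The two steps above, and the boundary problem, fit into a small sketch. It assumes the centroids are already given (a real index would train them with k-means), and the helper names &lt;code&gt;build_ivf&lt;/code&gt; and &lt;code&gt;ivf_search&lt;/code&gt; are hypothetical, not pgvector or Faiss APIs.&lt;/p&gt;

```python
import math


def l2(p, q):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rank_centroids(centroids, x):
    """Indices of all centroids, ordered from nearest to farthest from x."""
    return sorted(range(len(centroids)), key=lambda i: l2(centroids[i], x))


def build_ivf(centroids, points):
    """Assign every point to the list (cell) of its nearest centroid."""
    lists = [[] for _ in centroids]
    for p in points:
        lists[rank_centroids(centroids, p)[0]].append(p)
    return lists


def ivf_search(centroids, lists, query, k, probes):
    """Scan only the `probes` nearest cells and return the k closest points found."""
    found = []
    for i in rank_centroids(centroids, query)[:probes]:
        found.extend(lists[i])
    return sorted(found, key=lambda p: l2(p, query))[:k]


centroids = [(0.0, 0.0), (10.0, 0.0)]   # fixed by hand for this example
points = [(1.0, 0.0), (4.9, 0.0), (7.0, 0.0), (9.0, 0.0)]
lists = build_ivf(centroids, points)

query = (5.05, 0.0)                     # just inside the right-hand cell
print(ivf_search(centroids, lists, query, k=1, probes=1))  # misses (4.9, 0.0)
print(ivf_search(centroids, lists, query, k=1, probes=2))  # finds it
```

&lt;p&gt;The query sits just across the boundary from its true nearest neighbor, so one probe returns a farther point and two probes recover the correct one.&lt;/p&gt;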

&lt;h4 class="relative group"&gt;IVFFlat Index Parameters
 &lt;div id="ivfflat-index-parameters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ivfflat-index-parameters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Similarly, vector databases that support IVFFlat indexes generally have at least two parameters: the number of lists and the number of probes (&lt;code&gt;lists&lt;/code&gt; and &lt;code&gt;probes&lt;/code&gt; in pgvector, &lt;code&gt;nlist&lt;/code&gt; and &lt;code&gt;nprobe&lt;/code&gt; in Faiss). These parameters affect index search performance and recall. Here we use Faiss parameters as an example&lt;sup id="fnref1:32"&gt;&lt;a href="#fn:32" class="footnote-ref" role="doc-noteref"&gt;32&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;nlist&lt;/strong&gt;: Number of regions to construct. Increasing nlist increases the time to search for the nearest centroid but reduces the time to search for nodes within a region.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;nprobe&lt;/strong&gt;: Number of regions to search. Increasing nprobe increases the number of regions searched, which obviously reduces search performance but improves recall.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Theoretically, for nlist, it&amp;rsquo;s best to test specifically against the structure of the vector data and the database type — increasing nlist does not always reduce response time. For nprobe, increasing nprobe definitely reduces search performance and improves recall, but making nprobe too large is meaningless and goes against the original intent of ANN.&lt;/p&gt;
&lt;p&gt;The following is from Pinecone&amp;rsquo;s performance testing of the Faiss IVFFlat index:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5ca487b356dd.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;PQ Product Quantization
 &lt;div id="pq-product-quantization" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pq-product-quantization" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;One million dense vectors may require gigabytes of memory, and real-world datasets are often far larger than that. Left unmanaged, similarity search can require enormous amounts of memory, yet RAM is limited. Memory usage grows with both vector dimensionality and the number of vectors.&lt;/p&gt;
&lt;p&gt;Product Quantization (PQ) aims to reduce memory usage and can also improve query speed (because the amount of computation is reduced). PQ is a lossy compression method, which leads to reduced vector retrieval accuracy, but this is acceptable within ANN requirements.&lt;/p&gt;
&lt;p&gt;PQ&amp;rsquo;s algorithm logic is slightly more complex than other algorithms. I strongly recommend this article: &lt;a href="https://towardsdatascience.com/similarity-search-product-quantization-b2a1a6397701" target="_blank" rel="noreferrer"&gt;Similarity Search, Part 2: Product Quantization&lt;/a&gt;&lt;sup id="fnref:34"&gt;&lt;a href="#fn:34" class="footnote-ref" role="doc-noteref"&gt;34&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h4 class="relative group"&gt;PQ Construction
 &lt;div id="pq-construction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pq-construction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/56b112821338.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Step description:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Subvectors&lt;/strong&gt;: Split the original high-dimensional vector into n low-dimensional sub-vectors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Codebook&lt;/strong&gt;: Run the k-means algorithm (or another clustering algorithm) separately on the sub-vectors at &lt;em&gt;each&lt;/em&gt; position, collected across all vectors, producing n different Voronoi diagrams, one per subspace. These Voronoi diagrams are the codebooks (assuming each Voronoi diagram has k centroids).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clustering&lt;/strong&gt;: Place each of the n sub-vectors into the already-clustered Voronoi diagram of its subspace and find its nearest centroid.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Quantized vectors&lt;/strong&gt;: Take these n nearest centroids as the new vector, the quantized vector.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reproduction values&lt;/strong&gt;: Take the &lt;em&gt;nearest centroid index&lt;/em&gt; for each of the n subspaces as new values; the combined new values are called the PQ code.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Step 5, reproduction values, in detail:&lt;/p&gt;
&lt;p&gt;Based on the n sub-vectors and the k centroids in each subspace, we obtain an n×k centroid matrix. Taking the index of the nearest centroid for each sub-vector gives the PQ code.&lt;/p&gt;
&lt;p&gt;(btw: to be rigorous, all element indices in the diagram below should start from 1, not 0.)&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/fc2938307d7e.png" alt="image" /&gt;&lt;/p&gt;
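&lt;p&gt;The new PQ code is equivalent to a lossy-compressed new vector (reproduction value) of the original vector. Because the codes themselves are only centroid indices, distances are then computed on the centroids that the codes point to, typically with the L2 distance, rather than on the original vectors.&lt;/p&gt;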
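&lt;p&gt;As a minimal sketch of steps 1 and 3 to 5, the code below encodes one vector against codebooks that are assumed to be trained already (step 2 is skipped). The tiny hand-written codebooks, with n = 2 subspaces and k = 2 centroids each, and the helper names are illustrative only; indices are 0-based, as is usual in Python.&lt;/p&gt;

```python
import math


def l2(p, q):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def split(vec, n):
    """Step 1: cut a vector into n equal-length sub-vectors."""
    d = len(vec) // n
    return [vec[i * d:(i + 1) * d] for i in range(n)]


def pq_encode(vec, codebooks):
    """Steps 3 and 5: index of the nearest centroid in each subspace (the PQ code)."""
    code = []
    for sub, book in zip(split(vec, len(codebooks)), codebooks):
        code.append(min(range(len(book)), key=lambda j: l2(book[j], sub)))
    return code


def pq_decode(code, codebooks):
    """Step 4: concatenate the selected centroids into the quantized vector."""
    out = []
    for idx, book in zip(code, codebooks):
        out.extend(book[idx])
    return out


# One codebook per subspace; in practice each comes from k-means on that subspace.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (5.0, 5.0)],
]
vec = [0.9, 1.2, 4.0, 4.5]
code = pq_encode(vec, codebooks)
approx = pq_decode(code, codebooks)
print(code, approx)
```

&lt;p&gt;The four floats of the original vector are replaced by two small integers, and decoding shows how much precision the compression gives up.&lt;/p&gt;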

&lt;h4 class="relative group"&gt;PQ Retrieval
 &lt;div id="pq-retrieval" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pq-retrieval" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Based on the PQ original paper&lt;sup id="fnref:35"&gt;&lt;a href="#fn:35" class="footnote-ref" role="doc-noteref"&gt;35&lt;/a&gt;&lt;/sup&gt;, there are two PQ retrieval modes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Symmetric mode&lt;/strong&gt;: The distance between vector x and vector y is approximated by the distance between their centroids q(x) and q(y). In other words, the distance between two vectors can be approximated by the distance between their PQ codes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Asymmetric mode&lt;/strong&gt;: The distance between vector x and vector y is approximated by the distance from x to the centroid q(y). In other words, the distance between two vectors can be computed using the original query vector value and the other vector&amp;rsquo;s PQ code.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d3704a6f01b5.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Clearly, the distance accuracy differs between the two modes:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9697622aad6e.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The figure above shows the distance accuracy between two vectors under different modes, with 8 subspaces and 256 centroids. It can be seen that the asymmetric mode has higher accuracy than the symmetric mode.&lt;/p&gt;
&lt;p&gt;When comparing distances between two vectors, the symmetric and asymmetric distance computation models are quite useful. However, in the scenario of finding PQ approximate vectors, there are some differences — especially the symmetric mode, where distortion can be quite severe:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The symmetric mode&amp;rsquo;s query speed is very fast because the code table has already been computed and preserved during the PQ construction process. You only need to first compute the query vector x&amp;rsquo;s PQ code via the code table (minimal computation), then reverse-lookup the code table to get the corresponding sub-code-table — all vectors in this sub-code-table are approximate vectors at equal distance. This method requires extremely little computation — just a direct table lookup.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The symmetric mode&amp;rsquo;s distortion is relatively severe (the two figures above don&amp;rsquo;t fully capture it — imagine it as a Voronoi diagram where one cell contains multiple vectors, and you&amp;rsquo;ll realize how severe the symmetric distortion can be). The asymmetric mode can &lt;em&gt;slightly&lt;/em&gt; alleviate this problem. In asymmetric mode, first compute the PQ code of vector x, then similarly reverse-lookup the code table to get the corresponding sub-code-table, then compute distances between vector x and the vectors in this sub-code-table to obtain KNN. Its computational cost is n×km (n = number of subspaces, km ≈ total vector count / centroid count).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/25945e785366.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Asymmetric mode requires finding the centroid via the PQ code, then searching for KNN within the subspace where the centroid resides. The distance between the query vector x and an existing vector y is approximated by the distance between x and y&amp;rsquo;s centroid.&lt;/p&gt;
&lt;p&gt;PQ asymmetric retrieval&lt;sup id="fnref1:34"&gt;&lt;a href="#fn:34" class="footnote-ref" role="doc-noteref"&gt;34&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ba66cc8da1b9.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f08ffd4fc669.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Steps of PQ asymmetric retrieval:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Split the query vector into multiple sub-vectors.&lt;/li&gt;
&lt;li&gt;In each subspace, compute the distance from the query sub-vector to every centroid in the centroid matrix, giving a distance table.&lt;/li&gt;
&lt;li&gt;For each stored vector, use its PQ code to look up the distance to its centroid in each subspace. The query itself is not quantized, which is what makes this mode asymmetric.&lt;/li&gt;
&lt;li&gt;Sum the per-subspace distances to get the approximate distance between the query vector and the stored vector. The distances are computed independently in each subspace and then summed.&lt;/li&gt;
&lt;/ol&gt;
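&lt;p&gt;A sketch of asymmetric distance computation, assuming squared L2 distance and the same kind of hand-written codebooks as a trained PQ would provide: the query keeps its original values, one distance table is built per query, and every stored vector is scored by table lookups on its PQ code. All names and values here are hypothetical.&lt;/p&gt;

```python
def split(vec, n):
    """Cut a vector into n equal-length sub-vectors."""
    d = len(vec) // n
    return [vec[i * d:(i + 1) * d] for i in range(n)]


def sq_l2(p, q):
    """Squared Euclidean distance; enough for ranking and additive across subspaces."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def distance_table(query, codebooks):
    """table[i][j] = squared distance from query sub-vector i to centroid j of subspace i."""
    subs = split(query, len(codebooks))
    return [[sq_l2(sub, c) for c in book] for sub, book in zip(subs, codebooks)]


def adc(table, code):
    """Approximate squared distance to a stored vector: one lookup per subspace, summed."""
    return sum(table[i][idx] for i, idx in enumerate(code))


codebooks = [[(0, 0), (1, 1)], [(0, 0), (5, 5)]]
stored_codes = {"a": [1, 1], "b": [0, 1], "c": [0, 0]}   # PQ codes of stored vectors

query = [1, 1, 5, 5]                     # not quantized
table = distance_table(query, codebooks)
ranked = sorted(stored_codes, key=lambda name: adc(table, stored_codes[name]))
print(table, ranked)
```

&lt;p&gt;The table costs one pass over the centroids per query, after which each stored vector needs only as many additions as there are subspaces.&lt;/p&gt;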
&lt;p&gt;As mentioned earlier, asymmetric mode&amp;rsquo;s approximate distance computation is slightly better than symmetric mode, but in some scenarios, the asymmetric distance can still deviate significantly from the actual distance:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6213fe57a4fc.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;This is easier to understand from the figure above. Within the same cell, the distance between the farthest vector and the centroid can differ significantly from the distance between the closest vector and the centroid. Computing only the partial distance to the centroid cannot capture this difference.&lt;/p&gt;

&lt;h4 class="relative group"&gt;PQ Parameters and Their Impact
 &lt;div id="pq-parameters-and-their-impact" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pq-parameters-and-their-impact" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;PQ has at least two parameters that significantly affect performance and memory: the number of subspaces m and the number of centroids per subspace k.&lt;/p&gt;
&lt;p&gt;Recall:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The product quantizer is parametrized by the number of subvectors m and the number of quantizers per subvector k*, producing a code of length m × log2 k*.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;With m subspaces, each having k* centroids, the length (in bits) of a PQ code is&lt;sup id="fnref1:35"&gt;&lt;a href="#fn:35" class="footnote-ref" role="doc-noteref"&gt;35&lt;/a&gt;&lt;/sup&gt;:
$$
\mathrm{code\ length\ (bits)} = m \cdot \log_2 k^*
$$&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c89aa21160f6.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The more subspaces m, the higher the recall; the longer the PQ code, the higher the recall. Longer PQ code essentially means more centroids. Note that the specific values here are based on the paper&amp;rsquo;s dataset.&lt;/p&gt;
&lt;p&gt;Memory and complexity:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/06d80385f539.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;k represents the number of cluster centroids, D represents dimension, m represents the number of subspaces. k* represents centroids within a subspace, D* represents dimensions within a subspace.&lt;/p&gt;
&lt;p&gt;For example, with k=2048, D=128, m=8, the complexity is as follows&lt;sup id="fnref:36"&gt;&lt;a href="#fn:36" class="footnote-ref" role="doc-noteref"&gt;36&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th style="text-align: left"&gt;Operation&lt;/th&gt;
 &lt;th style="text-align: center"&gt;Memory and complexity&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;k-means&lt;/td&gt;
 &lt;td style="text-align: center"&gt;kD = 2048×128 = 262144&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;PQ&lt;/td&gt;
 &lt;td style="text-align: center"&gt;m × k* × D* = k^(1/m) × D = 2048^(1/8) × 128 ≈ 332&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;It can be seen that PQ significantly reduces the quantizer&amp;rsquo;s memory and complexity compared with plain k-means.&lt;/p&gt;
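&lt;p&gt;The arithmetic in the table, together with the code-length formula, can be checked with a few lines. The sketch assumes k* = k^(1/m) and D* = D/m, as in the table; the function names are made up.&lt;/p&gt;

```python
import math


def kmeans_cost(k, d):
    """Plain k-means: k centroids of full dimension D."""
    return k * d


def pq_cost(k, d, m):
    """PQ: m subspaces, each with k* = k^(1/m) centroids of dimension D* = D/m."""
    k_star = k ** (1.0 / m)
    d_star = d / m
    return m * k_star * d_star


def code_length_bits(m, k_star):
    """Length of one PQ code in bits: m * log2(k*)."""
    return m * math.log2(k_star)


print(kmeans_cost(2048, 128))            # k * D
print(round(pq_cost(2048, 128, 8)))      # m * k* * D*, rounded
print(code_length_bits(8, 256))          # 8 subspaces with 256 centroids each
```

&lt;p&gt;With 8 subspaces and 256 centroids per subspace, the configuration used in the accuracy figure earlier, each vector is stored in 64 bits.&lt;/p&gt;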

&lt;h3 class="relative group"&gt;DiskANN &amp;amp; Vamana
 &lt;div id="diskann--vamana" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#diskann--vamana" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://suhasjs.github.io/files/diskann_neurips19.pdf" target="_blank" rel="noreferrer"&gt;DiskANN original paper&lt;/a&gt; Abstract&lt;sup id="fnref:37"&gt;&lt;a href="#fn:37" class="footnote-ref" role="doc-noteref"&gt;37&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Current state-of-the-art approximate nearest neighbor search (ANNS) algorithms generate indices that must be stored in main memory for fast high-recall search. This makes them expensive and limits the size of the dataset. We present a new graph-based indexing and search system called DiskANN that can index, store, and search a billion point database on a single workstation with just 64GB RAM and an inexpensive solid-state drive (SSD).&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;At the time (the paper was published in 2019), state-of-the-art ANN algorithms all relied on RAM for high recall and performance. This approach was not only expensive but also limited dataset size. DiskANN requires only 64GB RAM and an affordable SSD.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Vamana Construction
 &lt;div id="vamana-construction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vamana-construction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Vamana iteratively builds a directed graph in which each node represents a data point in the vector space. It starts from a random graph in which every node is given R randomly chosen out-neighbors, so the initial graph is well connected but its edges say nothing about proximity. The graph is then refined point by point: a greedy search from a fixed entry point collects candidate neighbors for the point, and a pruning step keeps at most R of them. Pruning discards candidates that are already well covered by a closer neighbor, and the parameter alpha controls how aggressive this is; a larger alpha retains more long-range edges, which shorten graph traversal&lt;sup id="fnref1:37"&gt;&lt;a href="#fn:37" class="footnote-ref" role="doc-noteref"&gt;37&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/29f57b420d43.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The figure shows the graph over 200 two-dimensional points as it evolves across two passes. The first pass prunes edges aggressively, which also removes the long edges that keep paths short; the second pass raises alpha to relax the pruning condition, so long edges are added back&lt;sup id="fnref:38"&gt;&lt;a href="#fn:38" class="footnote-ref" role="doc-noteref"&gt;38&lt;/a&gt;&lt;/sup&gt;. This is only the rough idea; see the paper for the exact algorithm.&lt;/p&gt;
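&lt;p&gt;The effect of alpha on pruning can be illustrated with a minimal sketch modeled on the pruning rule described in the DiskANN paper. It is a simplification rather than the reference implementation: the caller is assumed to pass the already-merged candidate set, and &lt;code&gt;robust_prune&lt;/code&gt;, &lt;code&gt;points&lt;/code&gt;, and &lt;code&gt;max_degree&lt;/code&gt; are illustrative names.&lt;/p&gt;

```python
import math


def robust_prune(p, candidates, points, alpha, max_degree):
    """Pick at most max_degree out-neighbors for p from candidates.

    Repeatedly keep the candidate closest to p, then drop every remaining
    candidate q that the kept point already covers, i.e. for which
    alpha * d(kept, q) is no larger than d(p, q).
    """
    def dist(a, b):
        return math.dist(points[a], points[b])

    remaining = set(candidates) - {p}
    kept = []
    while remaining and max_degree > len(kept):
        # Closest remaining candidate (node id breaks ties deterministically).
        best = min(remaining, key=lambda q: (dist(p, q), q))
        kept.append(best)
        # Survivors are the candidates that best does not cover.
        remaining = {q for q in remaining
                     if q != best and alpha * dist(best, q) > dist(p, q)}
    return kept


if __name__ == "__main__":
    # Four collinear points; node 3 is the far one.
    points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (4.0, 0.0)}
    print(robust_prune(0, [1, 2, 3], points, alpha=1.0, max_degree=3))
    print(robust_prune(0, [1, 2, 3], points, alpha=1.5, max_degree=3))
```

&lt;p&gt;On these four collinear points, alpha=1 keeps only the nearest neighbor, while alpha=1.5 also keeps the edge to the far point, which is the kind of long-range edge the second pass restores.&lt;/p&gt;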

&lt;h4 class="relative group"&gt;The DiskANN Algorithm
 &lt;div id="the-diskann-algorithm" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-diskann-algorithm" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;From the paper&amp;rsquo;s &amp;ldquo;The DiskANN Index Design&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The high-level idea is simple: run Vamana on a dataset P and store the resulting graph on an SSD. At search time, whenever Algorithm 1 requires the out-neighbors of a point p, we simply fetch this information from the SSD. However, note that just storing the vector data for a billion points in 100 dimensions would far exceed the RAM on a workstation! This raises two questions: how do we build a graph over a billion points, and how do we do distance comparisons between the query point and points in our candidate list at search time in Algorithm 1, if we cannot even store the vector data?&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;In short: run Vamana on the dataset and store the resulting graph on SSD. At billion-point scale, two problems must be solved:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;em&gt;How to index such a large-scale dataset with limited memory resources?&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;
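&lt;p&gt;&lt;strong&gt;k-means + Vamana stacking algorithm&lt;/strong&gt;: First, use k-means to partition the data into k clusters, then assign each point to its i nearest clusters; i=2 is usually sufficient. Build an in-memory Vamana index for each cluster, and finally merge the k indexes into one graph by taking the union of their edges. Because each point belongs to more than one cluster, the per-cluster graphs overlap, which gives the merged graph enough connectivity for search to succeed.&lt;/p&gt;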
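&lt;p&gt;The overlapping assignment and the final merge can be sketched as follows. This is only an illustration under stated assumptions: the centroids are taken as given (k-means itself is not shown), each per-cluster graph is represented as a dict from node id to a set of out-neighbors, and both function names are hypothetical.&lt;/p&gt;

```python
import math


def assign_to_nearest_clusters(points, centroids, overlap=2):
    """Return one list of point indexes per centroid.

    Every point is placed in its `overlap` closest clusters, so neighboring
    clusters share points and the per-cluster graphs can later be merged.
    """
    shards = [[] for _ in centroids]
    for idx, point in enumerate(points):
        order = sorted(range(len(centroids)),
                       key=lambda c: math.dist(point, centroids[c]))
        for c in order[:overlap]:
            shards[c].append(idx)
    return shards


def merge_graphs(shard_graphs):
    """Union the out-neighbor sets of per-cluster graphs (dict: node to set)."""
    merged = {}
    for graph in shard_graphs:
        for node, neighbors in graph.items():
            merged.setdefault(node, set()).update(neighbors)
    return merged


if __name__ == "__main__":
    points = [(0, 0), (1, 0), (5, 0), (9, 0), (10, 0)]
    centroids = [(0, 0), (5, 0), (10, 0)]
    print(assign_to_nearest_clusters(points, centroids, overlap=2))
    print(merge_graphs([{0: {1}}, {0: {2}, 1: {0}}]))
```

&lt;p&gt;With an overlap of 2, each cluster holds roughly 2N/k points, which is what lets every per-cluster index be built in memory.&lt;/p&gt;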
&lt;ol start="2"&gt;
&lt;li&gt;&lt;em&gt;If the original data cannot be loaded into memory, how to compute distances during search?&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Keep compressed vectors (e.g., PQ codes) in main memory and use them for distance computations during search; the full-precision vectors stay on SSD with the graph.&lt;/p&gt;
&lt;p&gt;If index data is stored on SSD, disk access count and disk read/write requests must be minimized to ensure low search latency; at the same time, lossy compression reduces recall. Therefore, the DiskANN paper proposes three optimization strategies:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Beam Search&lt;/strong&gt;: Simply put, preload neighbor information. When searching for point p, if p&amp;rsquo;s neighbors are not in memory, they must be loaded from disk. Since the time required for a small number of random SSD accesses is roughly the same as the time for a single SSD sector access, the neighbor information of W unvisited points can be loaded in one batch. W should not be set too large or too small. Setting W too large wastes computational resources and SSD bandwidth, while setting it too small increases search latency.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Caching Frequently Visited Vertices&lt;/strong&gt;: Aims to reduce disk access count. Cache all points within C hops from the starting point in memory. The value of C is best set between 3 and 4.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Implicit Re-Ranking Using Full-Precision Vectors&lt;/strong&gt;: Since PQ is lossy compression, PQ-based distances only approximate the actual distances, so the top candidates ranked by PQ distance can differ from the true nearest neighbors. To close this gap, DiskANN stores each point&amp;rsquo;s full-precision vector on disk next to its neighbor list. Because both can fit in the same disk sector, reading a point&amp;rsquo;s neighbors during beam search also fetches its full-precision vector at no extra I/O cost, and the final candidates are re-ranked by exact distance.&lt;/li&gt;
&lt;/ol&gt;
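&lt;p&gt;The beam search idea from the list above can be sketched in memory. This is a simplified model, not DiskANN&amp;rsquo;s implementation: the dict &lt;code&gt;graph&lt;/code&gt; stands in for adjacency lists on SSD, each loop iteration stands in for one batched read of up to &lt;code&gt;beam_width&lt;/code&gt; (W) neighborhoods, and exact distances are used where DiskANN would use PQ distances. All names are illustrative.&lt;/p&gt;

```python
import math


def beam_search(graph, vectors, start, query, k, list_size, beam_width):
    """Greedy graph search that expands up to beam_width nodes per round.

    Returns (ids of the k closest candidates, number of batched rounds).
    """
    def dist(node):
        return math.dist(vectors[node], query)

    candidates = {start}
    visited = set()
    rounds = 0
    while candidates - visited:
        # Pick the closest unvisited candidates; they are fetched together.
        frontier = sorted(candidates - visited,
                          key=lambda n: (dist(n), n))[:beam_width]
        rounds += 1  # models one batch of simulated disk reads
        for node in frontier:
            visited.add(node)
            candidates.update(graph[node])
        # Keep only the list_size closest candidates.
        candidates = set(sorted(candidates,
                                key=lambda n: (dist(n), n))[:list_size])
    return sorted(candidates, key=lambda n: (dist(n), n))[:k], rounds


if __name__ == "__main__":
    # Seven points on a line; node i links to the next two nodes.
    vectors = {i: (float(i),) for i in range(7)}
    graph = {i: list(range(i + 1, min(i + 3, 7))) for i in range(7)}
    for width in (1, 2):
        print(width, beam_search(graph, vectors, 0, (6.0,), 1, 4, width))
```

&lt;p&gt;On this toy graph both settings find the same nearest neighbor, but W=2 needs fewer rounds than W=1, which mirrors the trade-off described above: a wider beam costs more reads per round in exchange for fewer round trips.&lt;/p&gt;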
&lt;p&gt;According to the paper&amp;rsquo;s experiments, DiskANN outperforms IVF and HNSW in both search efficiency and recall:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e293f1c74241.png" alt="image" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;blockquote&gt;&lt;p&gt;Original article (Chinese): &lt;a href="https://lastdba.com/2024/08/12/%E5%90%91%E9%87%8F%E6%95%B0%E6%8D%AE%E5%BA%93%EF%BC%9A%E4%BB%8E0%E5%88%B0original-paper/" target="_blank" rel="noreferrer"&gt;向量数据库相关概念&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/pdf/2304.13712" target="_blank" rel="noreferrer"&gt;Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond&lt;/a&gt;&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;Chih-Hao Liu &lt;a href="https://tomohiroliu22.medium.com/66%E5%80%8B%E5%A4%A7%E5%9E%8B%E8%AA%9E%E8%A8%80%E6%A8%A1%E5%9E%8Bllm%E7%B6%93%E5%85%B8%E8%AB%96%E6%96%87-0fcdab74e822" target="_blank" rel="noreferrer"&gt;66 Classic LLM Papers&lt;/a&gt;&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:3"&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/pdf/2303.18223.pdf" target="_blank" rel="noreferrer"&gt;A Survey of Large Language Models&lt;/a&gt;&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:4"&gt;
&lt;p&gt;&lt;a href="https://juejin.cn/post/7346233811212386345" target="_blank" rel="noreferrer"&gt;AI, AGI, AIGC, NLP, LLM, ChatGPT, and Related Concepts Explained in One Article&lt;/a&gt;&amp;#160;&lt;a href="#fnref:4" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:5"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Prompt_engineering" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Prompt_engineering&lt;/a&gt;&amp;#160;&lt;a href="#fnref:5" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:5" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:6"&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/pdf/2005.11401" target="_blank" rel="noreferrer"&gt;RAG original paper&lt;/a&gt;&amp;#160;&lt;a href="#fnref:6" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:7"&gt;
&lt;p&gt;Jonathan Katz pgconfdev2024 &lt;a href="https://www.pgevents.ca/events/pgconfdev2024/sessions/session/1/slides/42/pgconfdev-2024-vectors.pdf" target="_blank" rel="noreferrer"&gt;Vectors: How to better support a nasty data type&lt;/a&gt;&amp;#160;&lt;a href="#fnref:7" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:8"&gt;
&lt;p&gt;OpenAI recommends using vector databases &lt;a href="https://openai.com/index/chatgpt-plugins/" target="_blank" rel="noreferrer"&gt;https://openai.com/index/chatgpt-plugins/&lt;/a&gt;&amp;#160;&lt;a href="#fnref:8" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:9"&gt;
&lt;p&gt;&lt;a href="https://thedataquarry.com/posts/vector-db-1/" target="_blank" rel="noreferrer"&gt;Vector databases (1): What makes each one different?&lt;/a&gt;&amp;#160;&lt;a href="#fnref:9" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:10"&gt;
&lt;p&gt;&lt;a href="https://github.com/erikbern/ann-benchmarks" target="_blank" rel="noreferrer"&gt;Vector database performance comparison&lt;/a&gt;&amp;#160;&lt;a href="#fnref:10" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:11"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Vector_%28mathematics_and_physics%29" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Vector_(mathematics_and_physics)&lt;/a&gt;&amp;#160;&lt;a href="#fnref:11" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:12"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Unit_vector" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Unit_vector&lt;/a&gt;&amp;#160;&lt;a href="#fnref:12" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:13"&gt;
&lt;p&gt;OpenAI on unit vector usage &lt;a href="https://platform.openai.com/docs/guides/embeddings/frequently-asked-questions" target="_blank" rel="noreferrer"&gt;https://platform.openai.com/docs/guides/embeddings/frequently-asked-questions&lt;/a&gt;&amp;#160;&lt;a href="#fnref:13" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:14"&gt;
&lt;p&gt;Pinecone Natural Language Processing for Semantic Search &lt;a href="https://www.pinecone.io/learn/series/nlp/dense-vector-embeddings-nlp/" target="_blank" rel="noreferrer"&gt;https://www.pinecone.io/learn/series/nlp/dense-vector-embeddings-nlp/&lt;/a&gt;&amp;#160;&lt;a href="#fnref:14" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:15"&gt;
&lt;p&gt;Yao Yuan &lt;a href="https://zhuanlan.zhihu.com/p/684643954" target="_blank" rel="noreferrer"&gt;A Casual Discussion of Various Spaces in Mathematics&lt;/a&gt;&amp;#160;&lt;a href="#fnref:15" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:16"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Euclidean_distance" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Euclidean_distance&lt;/a&gt;&amp;#160;&lt;a href="#fnref:16" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:17"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Taxicab_geometry" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Taxicab_geometry&lt;/a&gt;&amp;#160;&lt;a href="#fnref:17" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:18"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Minkowski_distance" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Minkowski_distance&lt;/a&gt;&amp;#160;&lt;a href="#fnref:18" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:19"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Sine_and_cosine" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Sine_and_cosine&lt;/a&gt;&amp;#160;&lt;a href="#fnref:19" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:20"&gt;
&lt;p&gt;Jonathan Katz pgconfeu2023 &lt;a href="https://www.postgresql.eu/events/pgconfeu2023/sessions/session/4592/slides/435/pgconfeu2023_vectors.pdf" target="_blank" rel="noreferrer"&gt;Vectors are the new JSON&lt;/a&gt;&amp;#160;&lt;a href="#fnref:20" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:20" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:21"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Jaccard_index" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Jaccard_index&lt;/a&gt;&amp;#160;&lt;a href="#fnref:21" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:22"&gt;
&lt;p&gt;Vyacheslav Efimov &lt;a href="https://towardsdatascience.com/similarity-search-part-5-locality-sensitive-hashing-lsh-76ae4b388203" target="_blank" rel="noreferrer"&gt;Similarity Search, Part 5: Locality Sensitive Hashing (LSH)&lt;/a&gt;&amp;#160;&lt;a href="#fnref:22" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:22" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref2:22" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:23"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Hamming_distance" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Hamming_distance&lt;/a&gt;&amp;#160;&lt;a href="#fnref:23" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:24"&gt;
&lt;p&gt;Vyacheslav Efimov &lt;a href="https://towardsdatascience.com/similarity-search-part-6-random-projections-with-lsh-forest-f2e9b31dcc47" target="_blank" rel="noreferrer"&gt;Similarity Search, Part 6: Random Projections with LSH Forest&lt;/a&gt;&amp;#160;&lt;a href="#fnref:24" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:25"&gt;
&lt;p&gt;earthwjl &lt;a href="https://www.jianshu.com/p/172749e6116a" target="_blank" rel="noreferrer"&gt;Delaunay Triangulation Study Notes&lt;/a&gt;&amp;#160;&lt;a href="#fnref:25" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:26"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Delaunay_triangulation" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Delaunay_triangulation&lt;/a&gt;&amp;#160;&lt;a href="#fnref:26" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:27"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Voronoi_diagram" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Voronoi_diagram&lt;/a&gt;&amp;#160;&lt;a href="#fnref:27" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:28"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Precision_and_recall" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Precision_and_recall&lt;/a&gt;&amp;#160;&lt;a href="#fnref:28" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:29"&gt;
&lt;p&gt;Jianshu &lt;a href="https://www.jianshu.com/p/d4368c8f40cb" target="_blank" rel="noreferrer"&gt;LSH (Locality Sensitive Hashing) Algorithm&lt;/a&gt;&amp;#160;&lt;a href="#fnref:29" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:30"&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/pdf/1603.09320" target="_blank" rel="noreferrer"&gt;HNSW Original Paper&lt;/a&gt;&amp;#160;&lt;a href="#fnref:30" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:31"&gt;
&lt;p&gt;Vyacheslav Efimov &lt;a href="https://towardsdatascience.com/similarity-search-part-4-hierarchical-navigable-small-world-hnsw-2aad4fe87d37" target="_blank" rel="noreferrer"&gt;Similarity Search, Part 4: Hierarchical Navigable Small World (HNSW)&lt;/a&gt;&amp;#160;&lt;a href="#fnref:31" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:32"&gt;
&lt;p&gt;&lt;a href="https://www.pinecone.io/learn/series/faiss/vector-indexes/" target="_blank" rel="noreferrer"&gt;https://www.pinecone.io/learn/series/faiss/vector-indexes/&lt;/a&gt;&amp;#160;&lt;a href="#fnref:32" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:32" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:33"&gt;
&lt;p&gt;Vyacheslav Efimov &lt;a href="https://towardsdatascience.com/similarity-search-knn-inverted-file-index-7cab80cc0e79" target="_blank" rel="noreferrer"&gt;Similarity Search, Part 1: kNN &amp;amp; Inverted File Index&lt;/a&gt;&amp;#160;&lt;a href="#fnref:33" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:34"&gt;
&lt;p&gt;Vyacheslav Efimov &lt;a href="https://towardsdatascience.com/similarity-search-product-quantization-b2a1a6397701" target="_blank" rel="noreferrer"&gt;Similarity Search, Part 2: Product Quantization&lt;/a&gt;&amp;#160;&lt;a href="#fnref:34" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:34" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:35"&gt;
&lt;p&gt;&lt;a href="https://inria.hal.science/file/index/docid/514462/filename/paper_hal.pdf" target="_blank" rel="noreferrer"&gt;PQ Original Paper&lt;/a&gt;&amp;#160;&lt;a href="#fnref:35" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:35" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:36"&gt;
&lt;p&gt;Pinecone Faiss Manual &lt;a href="https://www.pinecone.io/learn/series/faiss/product-quantization/" target="_blank" rel="noreferrer"&gt;https://www.pinecone.io/learn/series/faiss/product-quantization/&lt;/a&gt;&amp;#160;&lt;a href="#fnref:36" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:37"&gt;
&lt;p&gt;&lt;a href="https://suhasjs.github.io/files/diskann_neurips19.pdf" target="_blank" rel="noreferrer"&gt;DiskANN Original Paper&lt;/a&gt;&amp;#160;&lt;a href="#fnref:37" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:37" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:38"&gt;
&lt;p&gt;DiskANN, A Disk-based ANNS Solution with High Recall and High QPS on Billion-scale Dataset &lt;a href="https://milvus.io/blog/2021-09-24-diskann.md" target="_blank" rel="noreferrer"&gt;https://milvus.io/blog/2021-09-24-diskann.md&lt;/a&gt;&amp;#160;&lt;a href="#fnref:38" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item></channel></rss>